Identifying Previously Active, Now Inactive Users
Write a PostgreSQL query to identify users who have not posted any original tweets in the last 30 days (from '2023-03-15') but had posted at least one original tweet before that 30-day period. Only consider 'active' users. The output should include user ID, username, their last tweet date, the number of days since that last tweet (from '2023-03-15'), and their total count of original tweets. Order by days since last tweet (descending).
Related Concepts
Hint
Let's define the "current date" as '2023-03-15'::DATE and the "30-day cutoff date" as '2023-03-15'::DATE - INTERVAL '30 days' which is '2023-02-13'. A user is inactive in the last 30 days if their last tweet was *before* '2023-02-13' + INTERVAL '1 day' = '2023-02-14'.
- User Tweet Summary CTE:
- Start by joining
UserswithTweets(LEFT JOINto keep all users initially, then filter). - Filter for
Users.account_status = 'active'andTweets.is_retweet = FALSE. - For each user, calculate:
- Their overall last tweet date (
MAX(tweet_date)). - Their total number of original tweets (
COUNT(tweet_id)). - The date of their last tweet *within* the last 30 days (e.g.,
MAX(CASE WHEN tweet_date >= '2023-02-14' THEN tweet_date ELSE NULL END)). Let's call thislast_tweet_in_30_days. - The date of their last tweet *before* the 30-day window (e.g.,
MAX(CASE WHEN tweet_date < '2023-02-14' THEN tweet_date ELSE NULL END)). Let's call thislast_tweet_before_30_days.
- Their overall last tweet date (
- Group by
user_idandusername.
- Start by joining
- Filtering for Inactive Users:
- From the CTE above, select users where:
last_tweet_in_30_days IS NULL(meaning they had no tweets in the last 30 days).last_tweet_before_30_days IS NOT NULL(meaning they *did* have at least one tweet before the 30-day window).
- The
last_tweet_datefor the output will belast_tweet_before_30_daysfrom the CTE, as this is their actual last tweet if they are inactive in the recent period.
- From the CTE above, select users where:
- Calculate Output Fields:
days_since_last_tweet:'2023-03-15'::DATE - last_tweet_date(which is thelast_tweet_before_30_days).total_tweets: This was already calculated in the first CTE.
- Final Selection and Ordering: Select the required columns (
user_id,username,last_tweet_date,days_since_last_tweet,total_tweets) and order them.
Alternative Hint: Another approach could be to calculate each user's overall last tweet date and total tweets. Then, in an outer query or using HAVING, filter these users: their overall last tweet date must be before the 30-day cutoff (< '2023-02-14'), AND they must have a COUNT of tweets > 0 (ensuring they tweeted at least once).
Solution Approach (PostgreSQL)
Imagine you're trying to find Twitter users who used to tweet but have gone quiet recently:
- Define "Recent": First, we set a "current date" (e.g., March 15, 2023). "Last 30 days" means the period from February 14, 2023, to March 15, 2023.
- Check Each User's Tweets: For every active user:
- Find the date of their very last (original, not retweet) post.
- Count how many (original) posts they've ever made.
- Filter for the "Quiet Ones": Now, we only keep users if:
- Their very last post was before February 14, 2023 (meaning they haven't posted anything in the recent 30-day window).
- AND they have made at least one post sometime in the past (so we know they were active before).
- Calculate Days Silent: For these quiet users, calculate how many days it's been from our "current date" (March 15, 2023) back to their last post date.
- Show Results: Display their ID, username, when they last tweeted, how many days ago that was, and their total tweet count. Sort this list to see who has been silent the longest.
Conceptual Query Structure (using CTEs)
WITH UserTweetSummary AS (
-- Aggregate tweet data for each active user, focusing on original tweets
SELECT
u.user_id,
u.username,
MAX(t.tweet_date) AS overall_last_tweet_date,
COUNT(t.tweet_id) AS total_original_tweets
FROM
Users u
LEFT JOIN
Tweets t ON u.user_id = t.user_id AND t.is_retweet = FALSE
WHERE
u.account_status = 'active'
GROUP BY
u.user_id, u.username
)
-- Identify inactive users based on the criteria
SELECT
uts.user_id,
uts.username,
uts.overall_last_tweet_date AS last_tweet_date,
('2023-03-15'::DATE - uts.overall_last_tweet_date) AS days_since_last_tweet,
uts.total_original_tweets AS total_tweets
FROM
UserTweetSummary uts
WHERE
uts.total_original_tweets > 0 -- Must have tweeted at least once
AND uts.overall_last_tweet_date < '2023-03-15'::DATE - INTERVAL '30 days' -- Last tweet was before the 30-day window started
-- (i.e., last tweet_date < '2023-02-13')
-- More precisely, if analysis date is '2023-03-15', then 30 days prior is '2023-02-13'.
-- "Haven't posted in last 30 days" means their last post date is strictly less than ('2023-03-15' - 29 days) = '2023-02-14'.
-- So, overall_last_tweet_date < '2023-02-14'
ORDER BY
days_since_last_tweet DESC;
-- Corrected WHERE clause for inactivity:
-- overall_last_tweet_date < ('2023-03-15'::DATE - INTERVAL '29 days') -- which is < '2023-02-14'
(This is a structural example; a detailed solution would refine date logic and conditions.)
Explanation Notes:
UserTweetSummary CTE):
- Joins
UserswithTweets. ALEFT JOINis used to include all users, and conditions on tweets (is_retweet = FALSE) are applied in theONclause or a subsequentWHEREif grouping on user attributes only. - Filters for
Users.account_status = 'active'. - Calculates
MAX(t.tweet_date)to get the most recent original tweet date for each user. - Calculates
COUNT(t.tweet_id)to get the total number of original tweets for each user. - Groups by user details to get these aggregates per user.
- Selects from the
UserTweetSummaryCTE. - The crucial
WHEREclause conditions:uts.total_original_tweets > 0: Ensures the user has tweeted at least once in their lifetime.uts.overall_last_tweet_date < ('2023-03-15'::DATE - INTERVAL '29 days'): This checks if the user's last tweet date is before the start of the 30-day window. If "today" is 2023-03-15, the 30-day window is [2023-02-14, 2023-03-15]. So, "haven't posted in the last 30 days" means theiroverall_last_tweet_datemust be strictly less than 2023-02-14.
- Calculates
days_since_last_tweetas('2023-03-15'::DATE - uts.overall_last_tweet_date). - Orders the results by
days_since_last_tweet DESC.
Key Logic Points for a Full Solution:
- Defining the 30-day window: If the analysis date is '2023-03-15', the "last 30 days" includes dates from '2023-02-14' up to '2023-03-15'. A user is "inactive in the last 30 days" if their latest tweet occurred *before* '2023-02-14'.
- "Previously active" condition: This is met if their
overall_last_tweet_date(which must be < '2023-02-14') is not NULL, implying they have tweeted at least once. Thetotal_original_tweets > 0condition also ensures this. - Filtering
is_retweet = FALSEis important to only consider original content. - Filtering
account_status = 'active'ensures we don't target suspended users.
Take it Further! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!