Ask Claude about this

User Retention Analysis

Problem Statement

As a Data Analyst at Twitter (X), you're working with the User Growth and Retention team to identify users who might be at risk of churning. The team wants to create a re-engagement campaign targeting users who were previously active but have become inactive recently. This analysis is crucial for understanding user behavior patterns and implementing proactive retention strategies.

Your task: Identify users who haven't posted any tweets in the last 30 days (as of '2023-03-15') but had posted at least one tweet before that 30-day period. Include their last tweet date, days since last tweet, and total tweets.

Business Context

  • Twitter's growth team uses inactive user identification for targeted re-engagement campaigns.
  • Understanding when users become inactive helps optimize retention strategies.
  • This analysis helps prioritize limited marketing resources on recoverable users.
  • Inactive user patterns can inform product improvements for user engagement.

Key Requirements

Core Logic: (Current Date Assumption for Analysis: 2023-03-15)

  • Identify users who haven't tweeted in the last 30 days (i.e., no tweets on or after '2023-02-14').
  • The identified users must have had at least one tweet before this 30-day cutoff (i.e., at least one tweet before '2023-02-14').
  • Exclude users who never tweeted at all.
  • Consider only users with account_status = 'active'.
  • Consider only original tweets (is_retweet = FALSE).

Output Specification:

  • user_id: User identifier.
  • username: User's display name.
  • last_tweet_date: Date of their most recent tweet (must be before '2023-02-14').
  • days_since_last_tweet: Number of days from '2023-03-15' to their last_tweet_date.
  • total_tweets: Total number of original tweets by the user.
  • Order results by days_since_last_tweet DESC (most inactive first).

Data Quality Considerations:

  • The solution must inherently handle users with no tweets or only tweets within the last 30 days.
  • Assume tweet_date >= user join_date for valid tweets.
  • Assume date calculations are based on the provided '2023-03-15' current date.

Database Schema & Sample Data

Assume the following schema and sample data for Users and Tweets tables (key fields shown):

Table: Users

CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    join_date DATE NOT NULL,
    account_status VARCHAR(20) DEFAULT 'active'
    -- ... other columns
);
user_id username join_date account_status
1 john_doe 2022-12-01 active
2 jane_smith 2022-12-05 active
3 bob_brown 2022-12-10 active
4 sara_white 2022-12-15 active
5 mike_black 2022-12-20 active
6 lisa_green 2022-12-25 active
7 tom_blue 2023-01-01 active
8 anna_red 2023-01-05 suspended

Table: Tweets

CREATE TABLE Tweets (
    tweet_id INT PRIMARY KEY,
    user_id INT NOT NULL,
    tweet_date DATE NOT NULL,
    content TEXT NOT NULL,
    is_retweet BOOLEAN DEFAULT FALSE,
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
    -- ... other columns
);
tweet_id user_id tweet_date content is_retweet
101 1 2023-01-05 "Hello Twitter!" FALSE
102 2 2023-01-10 "Just joined, excited!" FALSE
103 3 2023-01-15 "My first tweet" FALSE
104 1 2023-02-05 "Another day, another tweet" FALSE
105 4 2023-01-20 "New here, saying hi!" FALSE
106 2 2023-02-15 "Beautiful day today" FALSE
107 5 2023-01-25 "Testing Twitter out" FALSE
108 6 2023-02-20 "Loving the new features!" FALSE
109 1 2023-02-25 "Weekend vibes" FALSE
110 6 2023-02-28 "Almost March already!" FALSE
111 7 2023-01-30 "Finally posting something" FALSE
112 2 2023-03-01 "March is here!" FALSE
113 1 2023-03-02 "Spring is coming" FALSE

Expected Output (Analysis Date: 2023-03-15):

The 30-day cutoff date is '2023-02-14' (i.e., 2023-03-15 minus 30 days is 2023-02-13, so tweets must be before this date, meaning on or before 2023-02-12. *Correction:* "Last 30 days" means from 2023-02-14 to 2023-03-15 inclusive. So "haven't posted in last 30 days" means last post was *before* 2023-02-14).

user_id username last_tweet_date days_since_last_tweet total_tweets
7 tom_blue 2023-01-30 44 1
5 mike_black 2023-01-25 49 1
4 sara_white 2023-01-20 54 1
3 bob_brown 2023-01-15 59 1

Note: Users 1, 2, 6 are excluded because they tweeted on/after '2023-02-14'. User 8 is excluded due to 'suspended' status. All listed users had at least one tweet before '2023-02-14' and no tweets on or after '2023-02-14'. Total tweets are all original tweets by the user.

Identifying Previously Active, Now Inactive Users

ADVANCED

Write a PostgreSQL query to identify users who have not posted any original tweets in the last 30 days (from '2023-03-15') but had posted at least one original tweet before that 30-day period. Only consider 'active' users. The output should include user ID, username, their last tweet date, the number of days since that last tweet (from '2023-03-15'), and their total count of original tweets. Order by days since last tweet (descending).

Take it Further! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!

Nerchuko Academy · Free DS Interview Prep