Active Customer MoM Growth (PostgreSQL)
Write a PostgreSQL query to calculate the month-over-month percentage change in the number of unique active customers. A customer is defined as active if their subscription status is 'active' on the last day of the analyzed month, based on the Subscriptions table.
Related Concepts
Hint
In PostgreSQL, you can use GENERATE_SERIES to create your month sequence and specific date arithmetic for finding the last day of the month.
- Use
GENERATE_SERIESto get the first day of each month in your data's range. - Calculate the last day of each generated month. A common PostgreSQL way is
(month_start_date + INTERVAL '1 month' - INTERVAL '1 day')::DATE. - Join
Subscriptionsto these months. A user is active if:- Their
start_dateis on or before the month's last day. - Their
end_dateis NULL OR theirend_dateis on or after the month's last day. - Their
statusis 'active'.
- Their
- Count distinct active users per month.
- Use
LAG()to get the previous month's count for the percentage change calculation. Remember to cast for division if necessary.
Solution (PostgreSQL)
Think of checking Netflix accounts at the very end of each month (like midnight on January 31st):
To see who was an "active subscriber" then, you'd look for accounts that:
- Had started their subscription on or before that moment (e.g., Jan 31st).
- Their subscription either hadn't ended yet OR was scheduled to end on or after that moment.
- And, crucially, their current subscription status was officially
'active'.
We count how many unique people fit this description for January. Then we repeat for February, March, etc. Finally, we compare this month's count to last month's to see the growth percentage.
PostgreSQL Query
-- Determine the overall date range for generating months
WITH DateRange AS (
SELECT
DATE_TRUNC('month', MIN(start_date))::DATE
AS min_month,
DATE_TRUNC('month', MAX(COALESCE(end_date,
start_date,
CURRENT_DATE)))::DATE
AS max_month
FROM
Subscriptions
),
-- Generate a series of all months within the data range
AllMonths (month_start_date) AS (
SELECT
GENERATE_SERIES(
(SELECT min_month FROM DateRange),
(SELECT max_month FROM DateRange),
'1 month'::INTERVAL
)::DATE
),
-- Calculate active customers for the last day of each month
MonthlyActiveCustomers (report_month, active_customers) AS (
SELECT
am.month_start_date,
COUNT(DISTINCT s.user_id)
FROM
AllMonths am
LEFT JOIN -- Use LEFT JOIN to include months with no active users
Subscriptions s
ON s.start_date <=
(am.month_start_date +
INTERVAL '1 month' -
INTERVAL '1 day')::DATE
-- ^ Started on or before EOM
AND (
COALESCE(s.end_date, '9999-12-31'::DATE) >=
(am.month_start_date +
INTERVAL '1 month' -
INTERVAL '1 day')::DATE
-- ^ Ends on or after EOM (or ongoing)
)
AND s.status = 'active' -- Status is 'active'
GROUP BY
am.month_start_date
),
-- Add previous month's active customer count
CustomerGrowth (
report_month,
active_customers,
prev_month_active_customers
) AS (
SELECT
report_month,
active_customers,
LAG(active_customers, 1)
OVER (ORDER BY report_month)
-- ^ LAG will be NULL for the first month
FROM
MonthlyActiveCustomers
)
-- Final calculation of percentage change
SELECT
cg.report_month AS month,
cg.active_customers,
CASE
WHEN cg.prev_month_active_customers IS NULL
OR cg.prev_month_active_customers = 0
THEN
NULL::NUMERIC
ELSE
ROUND(
(
(cg.active_customers - cg.prev_month_active_customers)
* 100.0
) / cg.prev_month_active_customers,
2
)
END AS percent_change
FROM
CustomerGrowth cg
ORDER BY
cg.month;
Explanation of the PostgreSQL Query:
DateRange CTE:
- Purpose: Determines the minimum and maximum month to generate a series for. This makes
GENERATE_SERIESmore efficient by not generating unnecessary dates. DATE_TRUNC('month', ...): Gets the first day of the month.COALESCE(end_date, start_date, CURRENT_DATE): Ensures that ongoing subscriptions or subscriptions without an end date are considered up to the current date if that's the latest activity.
AllMonths CTE:
- Purpose: Generates a series of the first day of each month between
min_monthandmax_monthfrom theDateRangeCTE. GENERATE_SERIES(start, stop, interval): PostgreSQL function to create a series of values. Here, it generates timestamps at 1-month intervals, which are then cast toDATE.
MonthlyActiveCustomers CTE:
- Purpose: For each month from
AllMonths, this counts unique users active on the last day of that month. LEFT JOIN: Changed fromJOINtoLEFT JOIN. This is important to ensure that all months generated byAllMonthsare present in the result, even if there were zero active customers in a particular month. IfJOINwere used, months with no matching active subscriptions would be omitted.- Key Logic (PostgreSQL specific for EOM):
- The last day of the month (EOM) for
am.month_start_dateis calculated as(am.month_start_date + INTERVAL '1 month' - INTERVAL '1 day')::DATE. - A subscription (
s) is active on EOM if:s.start_date <= EOM_current_month(COALESCE(s.end_date, '9999-12-31'::DATE) >= EOM_current_month)(Using'9999-12-31'::DATEfor a far future date, explicitly cast to DATE).s.status = 'active'.
- The last day of the month (EOM) for
COUNT(DISTINCT s.user_id): Counts unique active users. If a month has no active users due to theLEFT JOIN,COUNT(s.user_id)will correctly return 0.
CustomerGrowth CTE:
- Purpose: Aligns current month's active customer count with the previous month's count using
LAG(). LAG(active_customers, 1) OVER (ORDER BY report_month): Fetchesactive_customersfrom the preceding row (ordered by month). ReturnsNULLfor the first month.
SELECT Statement:
- Purpose: Computes the MoM percentage change.
- Formula:
((Current - Previous) * 100.0 / Previous). The100.0ensures floating-point division. CASEstatement handles:- The first month (
prev_month_active_customersisNULL). - Months where
prev_month_active_customersis0(to prevent division by zero). In these cases,percent_changeisNULL. Explicitly casting toNUMERIC(NULL::NUMERIC) ensures the column type is consistent.
- The first month (
ROUND(..., 2): Rounds the percentage to two decimal places.
PostgreSQL Specifics: The use of GENERATE_SERIES is idiomatic PostgreSQL for creating date sequences. The interval arithmetic (am.month_start_date + INTERVAL '1 month' - INTERVAL '1 day')::DATE is a common way to get the last day of the month. Explicit casting (e.g., ::DATE, ::NUMERIC) is good practice in PostgreSQL for clarity and type safety.
The sample data for Subscriptions and the logic for "active on the last day of the month" produce the expected output:
- Jan 2022 (EOM: Jan 31): Users 101 (sub #1) and 102 (sub #2) are active.
status='active',start_date <= EOM,end_dateis NULL or>= EOM. Count = 2. - Feb 2022 (EOM: Feb 28): Users 101 (sub #1), 102 (sub #2), 104 (sub #4), 105 (sub #5) are active. User 103's subscription ended Feb 10. Count = 4.
- Mar 2022 (EOM: Mar 31):
- User 101: Subscription #1 ended Mar 20 (due to record #8 if it updates sub #1). New subscription #9 started Mar 25 and is active. So, User 101 is active.
- User 102: Subscription #2 is still active.
- User 104: Subscription #4 is still active.
- User 105: Subscription #5 is still active.
- User 107: Subscription #7 started Mar 10 and is active.
start_date,end_date, andstatusfor the last day of each month.
Your Turn! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!