User Email Domain Distribution
EASY
Write a PostgreSQL query to count the number of users for each email domain. The results should list the email domain and the corresponding user count, sorted in descending order by count, then alphabetically by domain for ties.
Related Concepts
String Manipulation (SPLIT_PART) Aggregate Functions (COUNT) Grouping (GROUP BY) Ordering (ORDER BY) SQL Aliases
Hint
- Use `SPLIT_PART(email, '@', 2)` to extract the domain part of the email and give it an alias, like `email_domain`.
- In PostgreSQL, you can conveniently use this alias directly in the `GROUP BY` clause.
- Count the users in each group using `COUNT(*)`.
- Order the results by the count descending, then by the domain ascending.
Solution (PostgreSQL)
Your task is to see how many users come from 'gmail.com', how many from 'yahoo.com', and so on.
- For each email (e.g., 'john.doe@gmail.com'), you need to snip off everything before and including the '@' symbol. This leaves you with just the domain ('gmail.com').
- Once you have the domain for every user, you group all the 'gmail.com' users together, all the 'yahoo.com' users together, etc.
- Then, you simply count how many users are in each group.
- Finally, you list these domains from the one with the most users down to the one with the fewest.
PostgreSQL Query
-- Count users per email domain
SELECT
SPLIT_PART(email, '@', 2) AS email_domain,
COUNT(user_id) AS user_count
FROM
Users
GROUP BY
email_domain
ORDER BY
user_count DESC,
email_domain ASC;
Explanation of the PostgreSQL Query:
1. Extracting the Email Domain:
SPLIT_PART(email, '@', 2) AS email_domain: This function splits theemailstring at the'@'symbol and returns the second part (the domain), which is then aliased asemail_domainfor the final output.
2. Counting Users per Domain:
COUNT(user_id) AS user_count: This aggregate function counts the number of non-nulluser_idvalues for each group. This is aliased asuser_count.
3. Grouping Results:
GROUP BY email_domain: This clause groups all rows that have the same extractedemail_domain. TheCOUNT(user_id)aggregate function then operates on each of these distinct groups. PostgreSQL is flexible and allows you to use the alias from the `SELECT` statement here, which makes the query cleaner.
4. Ordering Results:
ORDER BY user_count DESC, email_domain ASC: This sorts the final grouped results. It primarily sorts by theuser_countin descending order. For any domains with the same count, it sorts them secondarily by theemail_domainname alphabetically.
Note on Portability: While using the `email_domain` alias in the `GROUP BY` clause is a convenient feature of PostgreSQL, it is not supported by all SQL databases (like SQL Server or Oracle). For maximum portability, you would need to repeat the entire expression: GROUP BY SPLIT_PART(email, '@', 2). However, for a query specifically targeting PostgreSQL, the approach above is perfectly valid and more readable.
Your Turn! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!