In the previous lesson, we introduced window functions and explored ROW_NUMBER(). Now we'll dive deeper into the family of ranking functions that SQL offers: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. Each has a distinct purpose and understanding when to use each one is crucial for effective data analysis.
All four functions assign a numeric value to rows based on ordering, but they handle ties (equal values) differently. Let's explore each one.
ROW_NUMBER() assigns a unique sequential number to each row, even if values are identical. It treats ties as different rows.
Syntax:
ROW_NUMBER() OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
Example: Ranking Transactions
SELECT
customer_id,
amount,
payment_date,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY amount DESC
) AS payment_rank
FROM
payment
WHERE
customer_id IN (1, 2, 3)
ORDER BY
customer_id,
payment_rank;
Output Sample:
customer_id | amount | payment_date | payment_rank
1 | 11.99 | 2005-08-01 | 1
1 | 11.99 | 2005-07-08 | 2
1 | 10.99 | 2005-06-19 | 3
2 | 11.99 | 2005-08-02 | 1
2 | 10.99 | 2005-07-09 | 2
3 | 9.99 | 2005-08-03 | 1
Key Point: Even though the first two customer 1 payments have identical amounts (11.99), they receive different row numbers (1 and 2).
RANK() assigns the same rank to rows with identical ordering values, but leaves gaps in the numbering sequence. If two rows tie for rank 1, the next rank is 3 (skipping 2).
Syntax:
RANK() OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
Example: Ranking Payments by Amount
SELECT
customer_id,
amount,
payment_date,
RANK() OVER (
PARTITION BY customer_id
ORDER BY amount DESC
) AS payment_rank
FROM
payment
WHERE
customer_id IN (1, 2, 3)
ORDER BY
customer_id,
payment_rank;
Output Sample:
customer_id | amount | payment_date | payment_rank
1 | 11.99 | 2005-08-01 | 1
1 | 11.99 | 2005-07-08 | 1
1 | 10.99 | 2005-06-19 | 3
2 | 11.99 | 2005-08-02 | 1
2 | 10.99 | 2005-07-09 | 2
3 | 9.99 | 2005-08-03 | 1
Key Point: Both customer 1 payments of 11.99 receive rank 1, and the next payment gets rank 3 (not 2). This is useful when you want to identify ties but preserve ranking position in the full dataset.
DENSE_RANK() is similar to RANK() but doesn't skip numbers. If two rows tie for rank 1, the next rank is 2 (not 3).
Syntax:
DENSE_RANK() OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
Example: Dense Ranking Payment Amounts
SELECT
customer_id,
amount,
payment_date,
DENSE_RANK() OVER (
PARTITION BY customer_id
ORDER BY amount DESC
) AS payment_rank
FROM
payment
WHERE
customer_id IN (1, 2, 3)
ORDER BY
customer_id,
payment_rank;
Output Sample:
customer_id | amount | payment_date | payment_rank
1 | 11.99 | 2005-08-01 | 1
1 | 11.99 | 2005-07-08 | 1
1 | 10.99 | 2005-06-19 | 2
2 | 11.99 | 2005-08-02 | 1
2 | 10.99 | 2005-07-09 | 2
3 | 9.99 | 2005-08-03 | 1
Key Point: Both customer 1 payments of 11.99 receive rank 1, and the next distinct amount gets rank 2. No gaps in the ranking sequence. This is ideal when you want to identify distinct groups without gaps.
NTILE(n) divides the partition into n groups (buckets) and assigns each row a bucket number. This is useful for percentile analysis and bucketing data into quartiles, tertiles, etc.
Syntax:
NTILE(number_of_buckets) OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
)
Example: Quartile Analysis
SELECT
customer_id,
amount,
payment_date,
NTILE(4) OVER (
PARTITION BY customer_id
ORDER BY amount DESC
) AS quartile
FROM
payment
WHERE
customer_id IN (1, 2, 3)
ORDER BY
customer_id,
quartile;
Output Sample:
customer_id | amount | payment_date | quartile
1 | 11.99 | 2005-08-01 | 1
1 | 11.99 | 2005-07-08 | 2
1 | 10.99 | 2005-06-19 | 3
2 | 11.99 | 2005-08-02 | 1
2 | 10.99 | 2005-07-09 | 2
3 | 9.99 | 2005-08-03 | 1
Key Point: Rows are distributed into 4 quartiles. This is extremely useful for percentile analysis—identifying top 25% (quartile 1), next 25% (quartile 2), etc.
Let's see all four functions applied to the same data:
SELECT
customer_id,
amount,
row_number() OVER (ORDER BY amount DESC) AS row_num,
rank() OVER (ORDER BY amount DESC) AS rnk,
dense_rank() OVER (ORDER BY amount DESC) AS dense_rnk,
ntile(3) OVER (ORDER BY amount DESC) AS tertile
FROM
payment
LIMIT 10;
Output Sample:
customer_id | amount | row_num | rnk | dense_rnk | tertile
1 | 11.99 | 1 | 1 | 1 | 1
1 | 11.99 | 2 | 1 | 1 | 1
2 | 11.99 | 3 | 1 | 1 | 1
5 | 10.99 | 4 | 4 | 2 | 1
6 | 10.99 | 5 | 4 | 2 | 1
3 | 9.99 | 6 | 6 | 3 | 2
4 | 9.99 | 7 | 6 | 3 | 2
7 | 8.99 | 8 | 8 | 4 | 3
8 | 8.99 | 9 | 8 | 4 | 3
9 | 7.99 | 10 | 10 | 5 | 3
Observations:
row_number: Always unique, no gapsrank: Groups ties but creates gaps (1, 1, 1, 4, 4, 6, 6, 8, 8, 10)dense_rank: Groups ties without gaps (1, 1, 1, 2, 2, 3, 3, 4, 4, 5)ntile(3): Distributes into 3 groups based on orderingGet the highest-paying customer per rental month:
WITH ranked_payments AS (
SELECT
customer_id,
amount,
DATE_TRUNC('month', payment_date) AS month,
ROW_NUMBER() OVER (
PARTITION BY DATE_TRUNC('month', payment_date)
ORDER BY amount DESC
) AS rank
FROM
payment
)
SELECT
customer_id,
amount,
month
FROM
ranked_payments
WHERE
rank = 1
ORDER BY
month DESC;
Categorize films by rental frequency:
WITH rental_counts AS (
SELECT
film_id,
COUNT(*) AS rental_count,
DENSE_RANK() OVER (
ORDER BY COUNT(*) DESC
) AS popularity_tier
FROM
rental r
JOIN inventory i ON r.inventory_id = i.inventory_id
GROUP BY
film_id
)
SELECT
film_id,
rental_count,
CASE
WHEN popularity_tier = 1 THEN 'Blockbuster'
WHEN popularity_tier <= 3 THEN 'Popular'
WHEN popularity_tier <= 10 THEN 'Standard'
ELSE 'Niche'
END AS popularity_category
FROM
rental_counts
LIMIT 20;
Segment customers into spending quartiles:
WITH customer_spending AS (
SELECT
customer_id,
SUM(amount) AS total_spent,
NTILE(4) OVER (ORDER BY SUM(amount)) AS spending_quartile
FROM
payment
GROUP BY
customer_id
)
SELECT
spending_quartile,
COUNT(*) AS customer_count,
MIN(total_spent) AS low_amount,
MAX(total_spent) AS high_amount
FROM
customer_spending
GROUP BY
spending_quartile
ORDER BY
spending_quartile;
| Function | Use Case | Handles Ties |
|---|---|---|
ROW_NUMBER | Need unique sequential numbers; don't care about ties | No (all unique) |
RANK | Need to identify position but account for ties; gaps are OK | Yes (with gaps) |
DENSE_RANK | Need tier identification without position gaps | Yes (no gaps) |
NTILE | Need percentile/quartile/bucket analysis | Distributes into groups |
OVER clause.In the next lesson, we'll explore advanced window function concepts including window frames, partitioning strategies, and other analytical functions like LAG, LEAD, FIRST_VALUE, and LAST_VALUE.