SQL code copied to buffer
RU PT FR

Lesson 7.1: Window Functions for Advanced Data Analysis

Window functions are one of the most powerful features in SQL for performing complex analytical calculations. Unlike aggregate functions that collapse multiple rows into a single result, window functions allow you to perform calculations across a set of rows that are related to the current row—all while preserving the individual rows in your result set.

This lesson introduces the fundamental concepts of window functions and demonstrates how they can transform your data analysis capabilities.

What Are Window Functions?

A window function performs a calculation across a set of table rows that are somehow related to the current row. This set of rows is called a "window" or "window frame." The key difference from regular aggregate functions is that window functions do not cause rows to be grouped into a single output row—each row retains its identity.

Think of it like looking through a moving window as you scan through your data. For each row, you can see and calculate values based on related rows around it, but each row still appears separately in the result.

Key characteristics:

  • Window functions operate on a set of rows defined by the OVER clause
  • They return a value for each row in the result set
  • They do not reduce the number of rows returned by the query
  • They can be used for ranking, aggregation, and analytical operations

Basic Syntax

The general syntax for a window function is:

window_function_name(expression) OVER (
    [PARTITION BY partition_expression]
    [ORDER BY sort_expression]
    [window_frame_clause]
)

Components:

  • window_function_name: The function to apply (e.g., ROW_NUMBER, SUM, AVG)
  • OVER clause: Defines the window of rows for the function
  • PARTITION BY (optional): Divides the result set into partitions (groups)
  • ORDER BY (optional): Defines the order of rows within each partition
  • window_frame_clause (optional): Further refines which rows are included in the window

Your First Window Function: ROW_NUMBER()

Let's start with one of the most commonly used window functions: ROW_NUMBER(). This function assigns a unique sequential number to each row within a partition.

Example 1: Numbering All Payments

SELECT
    payment_id,
    customer_id,
    amount,
    payment_date,
    ROW_NUMBER() OVER (ORDER BY payment_date) AS row_num
FROM
    payment
LIMIT 10;

This query assigns a sequential number to each payment ordered by payment date. The OVER (ORDER BY payment_date) clause tells SQL to:

  1. Order all rows by payment_date
  2. Assign row numbers starting from 1

Example 2: Numbering Within Groups Using PARTITION BY

The real power of window functions comes when you use PARTITION BY to create separate windows for different groups:

SELECT
    customer_id,
    amount,
    payment_date,
    ROW_NUMBER() OVER (
        PARTITION BY customer_id 
        ORDER BY payment_date
    ) AS payment_number
FROM
    payment
WHERE
    customer_id IN (1, 2, 3)
ORDER BY
    customer_id, 
    payment_date;

Here's what happens:

  • PARTITION BY customer_id creates a separate window for each customer
  • Within each customer's window, rows are ordered by payment_date
  • ROW_NUMBER() starts counting from 1 for each new customer
  • This allows you to see the 1st, 2nd, 3rd payment for each customer

Visualization:

Customer 1:     Customer 2:     Customer 3:
Row 1 ----\     Row 1 ----\     Row 1 ----\
Row 2 -----\    Row 2 -----\    Row 2 -----\
Row 3 ------\   Row 3 ------\   Row 3 ------\
   ...           ...             ...

Each customer has their own independent row numbering.

Practical Applications

Finding the Most Recent Transaction

Window functions make it easy to identify the most recent record in each group:

WITH numbered_payments AS (
    SELECT
        customer_id,
        amount,
        payment_date,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY payment_date DESC
        ) AS recency_rank
    FROM
        payment
)
SELECT
    customer_id,
    amount,
    payment_date
FROM
    numbered_payments
WHERE
    recency_rank = 1
ORDER BY
    customer_id
LIMIT 10;

This query finds the most recent payment for each customer by:

  1. Numbering payments for each customer in descending date order
  2. Filtering for recency_rank = 1 (the most recent)

Comparing Each Row to Aggregate Values

Window functions can also perform aggregations while keeping individual rows:

SELECT
    customer_id,
    amount,
    payment_date,
    SUM(amount) OVER (PARTITION BY customer_id) AS total_spent,
    AVG(amount) OVER (PARTITION BY customer_id) AS avg_payment,
    amount - AVG(amount) OVER (PARTITION BY customer_id) AS diff_from_avg
FROM
    payment
WHERE
    customer_id IN (1, 2, 3)
ORDER BY
    customer_id,
    payment_date;

For each payment, this query shows:

  • The individual payment amount
  • The total amount this customer has spent (across all their payments)
  • The average payment amount for this customer
  • How much this specific payment differs from their average

Notice how regular aggregate functions would require a GROUP BY and collapse the rows, but window functions let you keep all the detail while adding aggregated context.

Window Functions vs. GROUP BY

It's important to understand the difference:

GROUP BY (Aggregate Functions):

SELECT
    customer_id,
    COUNT(*) AS payment_count,
    SUM(amount) AS total_amount
FROM
    payment
GROUP BY
    customer_id;

Result: One row per customer

Window Functions:

SELECT
    customer_id,
    payment_id,
    amount,
    COUNT(*) OVER (PARTITION BY customer_id) AS payment_count,
    SUM(amount) OVER (PARTITION BY customer_id) AS total_amount
FROM
    payment;

Result: Every payment row preserved, with aggregate values added as additional columns

Key Takeaways

  • Window functions perform calculations across related rows while maintaining all individual rows in the result set.
  • The OVER clause is essential and defines the window of rows for the function to operate on.
  • PARTITION BY divides the data into groups, with the window function applied separately to each group.
  • ORDER BY within the OVER clause determines the order of rows for the function (crucial for functions like ROW_NUMBER()).
  • Window functions are perfect for ranking, running totals, moving averages, and comparing individual values to group aggregates.
  • Unlike GROUP BY, window functions do not collapse rows—they add calculated columns to your existing data.

In the next lessons, we will explore more window functions like RANK(), DENSE_RANK(), NTILE(), and dive deeper into window frames and advanced analytical calculations.