Compare Successive Rows within the same table 

Compare Successive Rows within the same table 

(how to compare successive rows within the same table)

How to Use LEAD in SQL

The LEAD function in SQL is a window function that allows you to access the value of a column in the next row within the same result set, based on a specified ordering. It's particularly useful when you want to compare values from successive rows.

Syntax

LEAD(column_name, offset, default_value) OVER (
    PARTITION BY partition_column
    ORDER BY order_column
)
  • column_name: The column to fetch the value from the next row.
  • offset: (Optional) Specifies how many rows ahead to look. Default is 1.
  • default_value: (Optional) A value to return if there’s no next row. Default is NULL.
  • PARTITION BY: (Optional) Divides the rows into partitions to reset the window function for each partition.
  • ORDER BY: Defines the order of rows to determine what the "next row" is.

Example Use Cases

1. Comparing Successive Rows

Suppose you have a table sales:

id
sale_date
sales_amount
1
2023-01-01
100
2
2023-01-02
120
3
2023-01-03
150

To calculate the difference between the current day's sales and the next day's sales:

SELECT
    id,
    sale_date,
    sales_amount,
    LEAD(sales_amount, 1) OVER (ORDER BY sale_date) AS next_sales,
    LEAD(sales_amount, 1) OVER (ORDER BY sale_date) - sales_amount AS sales_difference
FROM sales;

Output:

id
sale_date
sales_amount
next_sales
sales_difference
1
2023-01-01
100
120
20
2
2023-01-02
120
150
30
3
2023-01-03
150
NULL
NULL

2. Filling Missing Data

Suppose a students table contains scores and you want to fill missing future scores with a default value (e.g., 0):

student_id
exam_date
score
1
2023-01-01
85
2
2023-01-02
NULL
3
2023-01-03
90

You can use LEAD with a default value:


SELECT
    student_id,
    exam_date,
    score,
    LEAD(score, 1, 0) OVER (ORDER BY exam_date) AS next_score
FROM students;

Output:

student_id
exam_date
score
next_score
1
2023-01-01
85
NULL
2
2023-01-02
NULL
90
3
2023-01-03
90
0

3. Using Partitions

If you want to compare data within groups (e.g., by region), use PARTITION BY.

Table sales_by_region:

region
sale_date
sales_amount
East
2023-01-01
100
East
2023-01-02
120
West
2023-01-01
200
West
2023-01-02
250

To calculate next day's sales within each region:

sql
Copy code
SELECT
    region,
    sale_date,
    sales_amount,
    LEAD(sales_amount) OVER (PARTITION BY region ORDER BY sale_date) AS next_sales
FROM sales_by_region;

Output:

region
sale_date
sales_amount
next_sales
East
2023-01-01
100
120
East
2023-01-02
120
NULL
West
2023-01-01
200
250
West
2023-01-02
250
NULL

Key Points

  1. LEAD is for Next Rows: Fetches the value from the next row based on the specified order.
  2. Use PARTITION BY for Groups: Processes rows within each group independently.
  3. Custom Defaults: Specify a default value when there’s no next row.
  4. Performance: Ensure proper indexing for large datasets when using ORDER BY.

When to Use LEAD

  • Trend Analysis: Compare successive values (e.g., stock prices, sales trends).
  • Filling Gaps: Predict or backfill missing values.
  • Event Durations: Calculate time differences between events.
  • Comparing Groups: Analyze changes within partitions like regions or categories.

This function is powerful for row-based comparisons in SQL, making it easier to solve many real-world problems.