(how to compare successive rows within the same table)
How to Use LEAD
in SQL
The LEAD
function in SQL is a window function that allows you to access the value of a column in the next row within the same result set, based on a specified ordering. It's particularly useful when you want to compare values from successive rows.
Syntax
LEAD(column_name, offset, default_value) OVER (
PARTITION BY partition_column
ORDER BY order_column
)
column_name
: The column to fetch the value from the next row.offset
: (Optional) Specifies how many rows ahead to look. Default is 1.default_value
: (Optional) A value to return if there’s no next row. Default isNULL
.PARTITION BY
: (Optional) Divides the rows into partitions to reset the window function for each partition.ORDER BY
: Defines the order of rows to determine what the "next row" is.
Example Use Cases
1. Comparing Successive Rows
Suppose you have a table sales
:
id | sale_date | sales_amount |
1 | 2023-01-01 | 100 |
2 | 2023-01-02 | 120 |
3 | 2023-01-03 | 150 |
To calculate the difference between the current day's sales and the next day's sales:
SELECT
id,
sale_date,
sales_amount,
LEAD(sales_amount, 1) OVER (ORDER BY sale_date) AS next_sales,
LEAD(sales_amount, 1) OVER (ORDER BY sale_date) - sales_amount AS sales_difference
FROM sales;
Output:
id | sale_date | sales_amount | next_sales | sales_difference |
1 | 2023-01-01 | 100 | 120 | 20 |
2 | 2023-01-02 | 120 | 150 | 30 |
3 | 2023-01-03 | 150 | NULL | NULL |
2. Filling Missing Data
Suppose a students
table contains scores and you want to fill missing future scores with a default value (e.g., 0
):
student_id | exam_date | score |
1 | 2023-01-01 | 85 |
2 | 2023-01-02 | NULL |
3 | 2023-01-03 | 90 |
You can use LEAD
with a default value:
SELECT
student_id,
exam_date,
score,
LEAD(score, 1, 0) OVER (ORDER BY exam_date) AS next_score
FROM students;
Output:
student_id | exam_date | score | next_score |
1 | 2023-01-01 | 85 | NULL |
2 | 2023-01-02 | NULL | 90 |
3 | 2023-01-03 | 90 | 0 |
3. Using Partitions
If you want to compare data within groups (e.g., by region
), use PARTITION BY
.
Table sales_by_region
:
region | sale_date | sales_amount |
East | 2023-01-01 | 100 |
East | 2023-01-02 | 120 |
West | 2023-01-01 | 200 |
West | 2023-01-02 | 250 |
To calculate next day's sales within each region:
sql
Copy code
SELECT
region,
sale_date,
sales_amount,
LEAD(sales_amount) OVER (PARTITION BY region ORDER BY sale_date) AS next_sales
FROM sales_by_region;
Output:
region | sale_date | sales_amount | next_sales |
East | 2023-01-01 | 100 | 120 |
East | 2023-01-02 | 120 | NULL |
West | 2023-01-01 | 200 | 250 |
West | 2023-01-02 | 250 | NULL |
Key Points
LEAD
is for Next Rows: Fetches the value from the next row based on the specified order.- Use
PARTITION BY
for Groups: Processes rows within each group independently. - Custom Defaults: Specify a default value when there’s no next row.
- Performance: Ensure proper indexing for large datasets when using
ORDER BY
.
When to Use LEAD
- Trend Analysis: Compare successive values (e.g., stock prices, sales trends).
- Filling Gaps: Predict or backfill missing values.
- Event Durations: Calculate time differences between events.
- Comparing Groups: Analyze changes within partitions like regions or categories.
This function is powerful for row-based comparisons in SQL, making it easier to solve many real-world problems.