Advanced analytical functions like ROW_NUMBER(), RANK(), LAG(), LEAD(), and aggregate window functions with OVER() clause.
Introduction
Query Logical Processing
Window function syntax
Supplementary NOTE
Complete Window Functions Reference - Categorized
Overview
Window functions can be grouped into several categories based on their purpose and behavior. All can use the OVER()clause with PARTITION BY and ORDER BY.
Category 1: Aggregate Window Functions
Purpose: Perform calculations across rows and return aggregated values while preserving individual rows.
Basic Aggregates
- SUM()
- Calculates sum of values
- Example: Running totals, cumulative sums
SUM(amount) OVER (PARTITION BY customer_id ORDER BY date)- AVG()
- Calculates average of values
- Example: Moving averages, mean calculations
AVG(salary) OVER (PARTITION BY department ORDER BY hire_date)- MIN()
- Finds minimum value
- Example: Lowest price in category, earliest date
MIN(price) OVER (PARTITION BY category)- MAX()
- Finds maximum value
- Example: Highest score, latest date, peak values
MAX(revenue) OVER (PARTITION BY region ORDER BY month)- COUNT()
- Counts rows (or non-NULL values)
- Example: Running count, rows in partition
COUNT(*) OVER (PARTITION BY status ORDER BY created_date)
Statistical Aggregates
- STDDEV() / STDDEV_POP() / STDDEV_SAMP()
- Standard deviation (population or sample)
- Example: Measure variability within groups
STDDEV(test_score) OVER (PARTITION BY class_id)- VARIANCE() / VAR_POP() / VAR_SAMP()
- Variance (population or sample)
- Example: Data spread analysis
VARIANCE(response_time) OVER (PARTITION BY server_id)
Advanced Aggregates
- GROUP_CONCAT() (MySQL specific)
- Concatenates values into a single string
- Example: Combine multiple values with separator
GROUP_CONCAT(product_name ORDER BY price) OVER (PARTITION BY category_id)- STRING_AGG() (PostgreSQL, SQL Server)
- Similar to GROUP_CONCAT, concatenates strings
- Example: Build comma-separated lists
STRING_AGG(name, ', ') OVER (PARTITION BY department)- JSON_ARRAYAGG() / JSON_OBJECTAGG() (MySQL 8.0+)
- Aggregates values into JSON arrays or objects
- Example: Create JSON structures from rows
JSON_ARRAYAGG(product_id) OVER (PARTITION BY category)
Category 2: Ranking Window Functions
Purpose: Assign ranks or row numbers based on ordering, with different tie-handling strategies.
- ROW_NUMBER()
- Assigns unique sequential integers
- Ties get different numbers (arbitrary order)
- Example: Number rows 1, 2, 3, 4...
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)- Result: 1, 2, 3, 4, 5...
- RANK()
- Assigns ranking with gaps after ties
- Ties get same rank, next rank skips numbers
- Example: Olympic medals (1st, 1st, 3rd, 4th)
RANK() OVER (PARTITION BY category ORDER BY score DESC)- Result: 1, 2, 2, 4, 5... (notice gap from 2 to 4)
- DENSE_RANK()
- Assigns ranking without gaps
- Ties get same rank, next rank continues sequentially
- Example: Grade rankings (A, A, B, C)
DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC)- Result: 1, 2, 2, 3, 4... (no gap, continues from 2 to 3)
- NTILE(n)
- Divides rows into n equal buckets/groups
- Example: Quartiles (4 groups), deciles (10 groups)
NTILE(4) OVER (ORDER BY salary)creates quartiles- Use case: Segment customers into equal-sized groups
- PERCENT_RANK()
- Calculates relative rank as percentage (0 to 1)
- Formula: (rank - 1) / (total rows - 1)
- Example: Percentile rankings
PERCENT_RANK() OVER (ORDER BY exam_score)- Result: 0.0, 0.25, 0.5, 0.75, 1.0
- CUME_DIST()
- Cumulative distribution (percentage of rows ≤ current row)
- Formula: (rows ≤ current) / (total rows)
- Example: "This score beats X% of all scores"
CUME_DIST() OVER (ORDER BY salary)- Result: 0.2, 0.4, 0.6, 0.8, 1.0
Category 3: Value/Offset Window Functions
Purpose: Access values from other rows relative to the current row.
- LAG(column, offset, default)
- Accesses value from previous row(s)
- offset: how many rows back (default = 1)
- default: value if no previous row exists
- Example: Compare to previous month
LAG(revenue, 1, 0) OVER (ORDER BY month)gets previous month's revenue- LEAD(column, offset, default)
- Accesses value from next row(s)
- offset: how many rows forward (default = 1)
- default: value if no next row exists
- Example: Compare to next quarter
LEAD(sales, 1) OVER (ORDER BY quarter)gets next quarter's sales- FIRST_VALUE(column)
- Returns first value in the window frame
- Example: Compare each row to the first/earliest value
FIRST_VALUE(price) OVER (PARTITION BY product ORDER BY date)- Gets the initial/opening price
- LAST_VALUE(column)
- Returns last value in the window frame
- CAUTION: Default frame is up to current row, not entire partition
- Must specify:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING - Example: Compare to most recent/final value
LAST_VALUE(stock_price) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)- NTH_VALUE(column, n)
- Returns the nth value in the window frame
- Example: Get 2nd place, 3rd highest, etc.
NTH_VALUE(salary, 2) OVER (PARTITION BY dept ORDER BY salary DESC)- Gets the 2nd highest salary
Category 4: Distribution Functions
Purpose: Analyze data distribution and percentiles.
- PERCENTILE_CONT(percentile) (PostgreSQL, SQL Server, Oracle)
- Continuous percentile (interpolates between values)
- Returns exact percentile even if value doesn't exist
- Example: Median (50th percentile), 95th percentile
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary)calculates median- PERCENTILE_DISC(percentile) (PostgreSQL, SQL Server, Oracle)
- Discrete percentile (returns actual value from dataset)
- Returns closest actual value at or above percentile
- Example: True median from dataset
PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY price)finds 75th percentile value
Comparison Table: Ranking Functions
Function | Handles Ties | Gaps After Ties | Sequential | Use Case |
ROW_NUMBER() | No (arbitrary order) | N/A | Yes (1,2,3,4) | Unique numbering, pagination |
RANK() | Yes (same rank) | Yes | No (1,2,2,4) | Competition ranking, top-N with ties |
DENSE_RANK() | Yes (same rank) | No | Yes (1,2,2,3) | Grade levels, continuous rankings |
NTILE(n) | Distributes evenly | N/A | Yes | Equal-sized buckets, quartiles |
Visual Example:
Scores: 95, 90, 90, 85, 80
ROW_NUMBER(): 1, 2, 3, 4, 5 (arbitrary for ties)
RANK(): 1, 2, 2, 4, 5 (gap after tie)
DENSE_RANK(): 1, 2, 2, 3, 4 (no gap)
NTILE(2): 1, 1, 1, 2, 2 (split into 2 groups)
Comparison Table: Value Functions
Function | Direction | Offset | Use Case |
LAG() | Backward (previous) | Configurable | Period-over-period change, previous value |
LEAD() | Forward (next) | Configurable | Future comparison, next value |
FIRST_VALUE() | Beginning of frame | Fixed | Compare to baseline, initial value |
LAST_VALUE() | End of frame | Fixed | Compare to final, most recent value |
NTH_VALUE() | Specific position | Specified by n | Get 2nd place, 3rd item, etc. |
Window Frame Specification
All window functions can use window frames to control which rows are included:
Syntax:
function() OVER (
PARTITION BY column
ORDER BY column
{ROWS | RANGE} BETWEEN frame_start AND frame_end
)
Frame Boundaries:
UNBOUNDED PRECEDING- From the start of partitionn PRECEDING- n rows/values before currentCURRENT ROW- The current rown FOLLOWING- n rows/values after currentUNBOUNDED FOLLOWING- To the end of partition
Frame Types:
- ROWS: Physical row count
- RANGE: Logical value-based (considers ORDER BY values)
Common Frames:
-- Running total (default with ORDER BY)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Entire partition (default without ORDER BY)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-- Last 3 rows (moving window)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- Centered window
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
-- Last 7 days of data
RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
Function Availability by Database
Function | MySQL 8.0+ | PostgreSQL | SQL Server | Oracle | SQLite 3.25+ |
SUM/AVG/MIN/MAX/COUNT | ✅ | ✅ | ✅ | ✅ | ✅ |
ROW_NUMBER/RANK/DENSE_RANK | ✅ | ✅ | ✅ | ✅ | ✅ |
NTILE | ✅ | ✅ | ✅ | ✅ | ✅ |
LAG/LEAD | ✅ | ✅ | ✅ | ✅ | ✅ |
FIRST_VALUE/LAST_VALUE | ✅ | ✅ | ✅ | ✅ | ✅ |
NTH_VALUE | ✅ | ✅ | ✅ | ✅ | ✅ |
PERCENT_RANK/CUME_DIST | ✅ | ✅ | ✅ | ✅ | ✅ |
PERCENTILE_CONT/DISC | ❌ | ✅ | ✅ | ✅ | ❌ |
STDDEV/VARIANCE | ✅ | ✅ | ✅ | ✅ | ❌ |
GROUP_CONCAT | ✅ (MySQL) | ❌ | ❌ | ❌ | ✅ |
STRING_AGG | ❌ | ✅ | ✅ | ❌ | ✅ |
Quick Selection Guide
When to Use Each Category:
Aggregate Functions - When you need:
- Running totals or cumulative sums
- Moving averages or rolling calculations
- Comparisons to group totals/averages
- Statistical analysis within groups
Ranking Functions - When you need:
- Top N per category
- Percentile rankings
- Position within ordered groups
- Equal-sized buckets/segments
Value Functions - When you need:
- Period-over-period comparisons (LAG/LEAD)
- Growth rates (current vs previous)
- First/last values in a sequence
- Access to specific positioned values
Distribution Functions - When you need:
- Median calculations
- Percentile thresholds
- Statistical distribution analysis
- Outlier detection
Common Patterns Summary
Pattern 1: Running Total
SUM(amount) OVER (PARTITION BY customer_id ORDER BY date)
Pattern 2: Moving Average
AVG(sales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
Pattern 3: Rank Within Category
RANK() OVER (PARTITION BY category ORDER BY sales DESC)
Pattern 4: Top N per Group
WHERE ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) <= 5
Pattern 5: Period-over-Period Change
value - LAG(value) OVER (ORDER BY date) AS change
Pattern 6: Percentage of Total
value / SUM(value) OVER (PARTITION BY category) * 100 AS pct_of_total
Pattern 7: First and Last Comparison
SELECT
FIRST_VALUE(price) OVER (ORDER BY date) AS opening_price,
LAST_VALUE(price) OVER (ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS closing_price
Key Takeaways
- Aggregate functions compute across rows while preserving detail
- Ranking functions assign positions with different tie-handling
- Value functions access other rows relative to current position
- All functions use OVER() clause with optional PARTITION BY and ORDER BY
- Window frames control which rows are included in calculations
- Multiple functions can be combined in a single query
- Performance varies; use appropriate indexes on PARTITION BY and ORDER BY columns