Quick reference notes and key concepts for MySQL window functions implementation.
SQL Window Functions Study Guide - Sakila Database
What Are Window Functions?
Window functions perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions with GROUP BY (which collapse rows), window functions retain all individual rows while adding calculated values.
Key Components of Window Function Syntax:
<window_function>() OVER (
PARTITION BY column_name
ORDER BY column_name
frame_clause
)
Basic Aggregates
- SUM()
- Calculates sum of values
- Example: Running totals, cumulative sums
SUM(amount) OVER (PARTITION BY customer_id ORDER BY date)- AVG()
- Calculates average of values
- Example: Moving averages, mean calculations
AVG(salary) OVER (PARTITION BY department ORDER BY hire_date)
- MIN()
- Finds minimum value
- Example: Lowest price in category, earliest date
MIN(price) OVER (PARTITION BY category)- MAX()
- Finds maximum value
- Example: Highest score, latest date, peak values
MAX(revenue) OVER (PARTITION BY region ORDER BY month)
- COUNT()
- Counts rows (or non-NULL values)
- Example: Running count, rows in partition
COUNT(*) OVER (PARTITION BY status ORDER BY created_date)
1. SUM() Window Function
2. AVG() Window Function
3. MIN() Window Function
4. MAX() Window Function
5. COUNT() Window Function
Key Concepts Summary
Window Frame Types
- ROWS: Physical rows
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW= last 3 rows- RANGE: Logical values
RANGE BETWEEN INTERVAL 1 DAY PRECEDING AND CURRENT ROW= last 2 days of data
Default Window Frames
- With ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - Without ORDER BY:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING(entire partition)
Common Patterns
- Running Total:
SUM(...) OVER (PARTITION BY ... ORDER BY ...) - Percentage of Total:
value / SUM(value) OVER (PARTITION BY ...) * 100 - Moving Average:
AVG(...) OVER (ORDER BY ... ROWS BETWEEN n PRECEDING AND CURRENT ROW) - Rank Within Group: Compare individual values to MIN/MAX/AVG of partition
- Sequential Numbering:
COUNT(*) OVER (PARTITION BY ... ORDER BY ...)
Window Functions vs GROUP BY
Aspect | GROUP BY | Window Functions |
Row output | Collapses to aggregates | Retains all detail rows |
Calculation scope | Entire group | Flexible window |
Multiple aggregates | Limited options | Unlimited combinations |
Access to details | Lost | Preserved |
Practice Tips
- Start Simple: Begin with no ORDER BY to see partition totals
- Add ORDER BY: Observe how running calculations work
- Experiment with Frames: Try different ROWS/RANGE specifications
- Combine Functions: Use multiple window functions in same query
- Compare Results: Run with and without window functions to understand differences
Common Use Cases
- Financial: Running totals, moving averages, period-over-period comparisons
- Marketing: Customer segmentation, cohort analysis, funnel metrics
- Operations: Inventory levels, capacity utilization, queue lengths
- Analytics: Trend analysis, anomaly detection, forecasting inputs
To prepare a time series data fetch for Apple and Google using Yahoo Finance data, and then structure it into a portfolio using SQL with LEAD and LAG functions, follow these steps. I'll assume you have a table containing stock prices for Apple and Google, and you want to calculate daily returns, moving averages, or other metrics using LEAD and LAG.
Step 1: Table Structure
Assume you have a table named stock_prices with the following structure:
CREATE TABLE stock_prices (
date DATE,
ticker VARCHAR(10),
open_price FLOAT,
high_price FLOAT,
low_price FLOAT,
close_price FLOAT,
volume BIGINT
);Step 2: Fetch Data for Apple and Google
You can fetch the data for Apple (AAPL) and Google (GOOGL) from Yahoo Finance and insert it into the stock_prices table. For simplicity, I'll assume the data is already populated.
Step 3: Use LEAD and LAG to Calculate Daily Returns
The LEAD and LAG functions can be used to calculate daily returns, moving averages, or other metrics. Here's an example query to calculate daily returns for both stocks:
sql
Copy
WITH stock_data AS (
SELECT
date,
ticker,
close_price,
LAG(close_price) OVER (PARTITION BY ticker ORDER BY date) AS previous_close_price,
LEAD(close_price) OVER (PARTITION BY ticker ORDER BY date) AS next_close_price
FROM
stock_prices
WHERE
ticker IN ('AAPL', 'GOOGL')
)
SELECT
date,
ticker,
close_price,
previous_close_price,
next_close_price,
(close_price - previous_close_price) / previous_close_price * 100 AS daily_return
FROM
stock_data
ORDER BY
ticker, date;Step 4: Create a Portfolio
To create a portfolio, you can combine the data for Apple and Google and calculate portfolio metrics. For example, you can calculate the weighted average return if you have a specific allocation for each stock.
Assume a portfolio with 60% Apple and 40% Google:
sql
Copy
WITH stock_data AS (
SELECT
date,
ticker,
close_price,
LAG(close_price) OVER (PARTITION BY ticker ORDER BY date) AS previous_close_price
FROM
stock_prices
WHERE
ticker IN ('AAPL', 'GOOGL')
),
daily_returns AS (
SELECT
date,
ticker,
(close_price - previous_close_price) / previous_close_price * 100 AS daily_return
FROM
stock_data
)
SELECT
date,
SUM(CASE WHEN ticker = 'AAPL' THEN daily_return * 0.6 ELSE 0 END) AS aapl_weighted_return,
SUM(CASE WHEN ticker = 'GOOGL' THEN daily_return * 0.4 ELSE 0 END) AS googl_weighted_return,
SUM(CASE WHEN ticker = 'AAPL' THEN daily_return * 0.6 ELSE 0 END) +
SUM(CASE WHEN ticker = 'GOOGL' THEN daily_return * 0.4 ELSE 0 END) AS portfolio_return
FROM
daily_returns
GROUP BY
date
ORDER BY
date;Step 5: Export or Visualize
Once you have the portfolio data, you can export it to a CSV or visualize it using tools like Python, Excel, or Tableau.
Notes:
- Replace
stock_priceswith the actual table name if it's different. - Adjust the portfolio weights (60% and 40%) as needed.
- Ensure the data is clean and free of missing values before performing calculations.
Let me know if you need further assistance!
there is first_value, and last value... give me all the same category of query of this type in sql in list for my
In SQL, there are several window functions that are commonly used for time series analysis, including LEAD, LAG, FIRST_VALUE, and LAST_VALUE. Below is a comprehensive list of such functions, along with their descriptions and example use cases:
1. LEAD
- Description: Accesses the value of a column in the next row within the same partition.
- Use Case: Calculate the next day's price or forward-looking metrics.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
LEAD(close_price) OVER (PARTITION BY ticker ORDER BY date) AS next_day_close
FROM
stock_prices;2. LAG
- Description: Accesses the value of a column in the previous row within the same partition.
- Use Case: Calculate the previous day's price or backward-looking metrics.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
LAG(close_price) OVER (PARTITION BY ticker ORDER BY date) AS previous_day_close
FROM
stock_prices;3. FIRST_VALUE
- Description: Accesses the first value in an ordered partition.
- Use Case: Track the starting value of a time series.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
FIRST_VALUE(close_price) OVER (PARTITION BY ticker ORDER BY date) AS first_close_price
FROM
stock_prices;4. LAST_VALUE
- Description: Accesses the last value in an ordered partition.
- Use Case: Track the most recent value of a time series.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
LAST_VALUE(close_price) OVER (PARTITION BY ticker ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_close_price
FROM
stock_prices;5. ROW_NUMBER
- Description: Assigns a unique sequential integer to each row within a partition.
- Use Case: Rank rows or identify the order of events.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY date) AS row_num
FROM
stock_prices;6. RANK
- Description: Assigns a rank to each row within a partition, with gaps for ties.
- Use Case: Rank rows based on a metric (e.g., highest prices).
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
RANK() OVER (PARTITION BY ticker ORDER BY close_price DESC) AS price_rank
FROM
stock_prices;7. DENSE_RANK
- Description: Assigns a rank to each row within a partition, without gaps for ties.
- Use Case: Rank rows without skipping ranks for ties.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
DENSE_RANK() OVER (PARTITION BY ticker ORDER BY close_price DESC) AS dense_price_rank
FROM
stock_prices;8. NTILE
- Description: Divides rows into a specified number of buckets.
- Use Case: Segment data into quantiles (e.g., quartiles, deciles).
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
NTILE(4) OVER (PARTITION BY ticker ORDER BY close_price) AS quartile
FROM
stock_prices;9. SUM (Window Function)
- Description: Calculates a cumulative sum over a partition.
- Use Case: Track running totals (e.g., cumulative returns).
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
SUM(close_price) OVER (PARTITION BY ticker ORDER BY date) AS cumulative_sum
FROM
stock_prices;10. AVG (Window Function)
- Description: Calculates a moving average over a partition.
- Use Case: Track rolling averages (e.g., 7-day moving average).
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
AVG(close_price) OVER (PARTITION BY ticker ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM
stock_prices;11. MIN (Window Function)
- Description: Finds the minimum value in a partition.
- Use Case: Track the lowest value in a rolling window.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
MIN(close_price) OVER (PARTITION BY ticker ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS min_7d
FROM
stock_prices;12. MAX (Window Function)
- Description: Finds the maximum value in a partition.
- Use Case: Track the highest value in a rolling window.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
MAX(close_price) OVER (PARTITION BY ticker ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_7d
FROM
stock_prices;13. PERCENT_RANK
- Description: Calculates the relative rank of a row within a partition as a percentage.
- Use Case: Compare values as a percentage of the partition.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
PERCENT_RANK() OVER (PARTITION BY ticker ORDER BY close_price) AS pct_rank
FROM
stock_prices;14. CUME_DIST
- Description: Calculates the cumulative distribution of a value within a partition.
- Use Case: Understand the distribution of values in a partition.
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
CUME_DIST() OVER (PARTITION BY ticker ORDER BY close_price) AS cumulative_dist
FROM
stock_prices;15. NTH_VALUE
- Description: Accesses the value of a column at a specific row offset within a partition.
- Use Case: Retrieve the value at a specific position (e.g., 5th row).
- Example:
sql
Copy
SELECT
date,
ticker,
close_price,
NTH_VALUE(close_price, 5) OVER (PARTITION BY ticker ORDER BY date) AS fifth_close_price
FROM
stock_prices;Summary
These functions are powerful tools for time series analysis in SQL. You can use them to calculate metrics like:
- Daily returns (
LEAD,LAG) - Rolling averages (
AVG) - Cumulative sums (
SUM) - Rankings (
RANK,DENSE_RANK) - Distributions (
PERCENT_RANK,CUME_DIST)
Let me know if you need further clarification or examples!
list all the function in window for me
Here is a comprehensive list of all window functions in SQL, categorized for clarity. These functions are used with the OVER() clause to perform calculations across a set of rows related to the current row.
Aggregate Window Functions
These functions perform calculations across a set of rows and return a single value for each row.
- SUM()
- Calculates the sum of values in a window.
- Example:
SUM(column) OVER (PARTITION BY group ORDER BY order_column) - AVG()
- Calculates the average of values in a window.
- Example:
AVG(column) OVER (PARTITION BY group ORDER BY order_column) - MIN()
- Finds the minimum value in a window.
- Example:
MIN(column) OVER (PARTITION BY group ORDER BY order_column) - MAX()
- Finds the maximum value in a window.
- Example:
MAX(column) OVER (PARTITION BY group ORDER BY order_column) - COUNT()
- Counts the number of rows in a window.
- Example:
COUNT(column) OVER (PARTITION BY group ORDER BY order_column)
Ranking Window Functions
These functions assign ranks or row numbers to rows within a partition.
- ROW_NUMBER()
- Assigns a unique sequential integer to each row within a partition.
- Example:
ROW_NUMBER() OVER (PARTITION BY group ORDER BY order_column) - RANK()
- Assigns a rank to each row within a partition, with gaps for ties.
- Example:
RANK() OVER (PARTITION BY group ORDER BY order_column) - DENSE_RANK()
- Assigns a rank to each row within a partition, without gaps for ties.
- Example:
DENSE_RANK() OVER (PARTITION BY group ORDER BY order_column) - NTILE()
- Divides rows into a specified number of buckets (e.g., quartiles, deciles).
- Example:
NTILE(4) OVER (PARTITION BY group ORDER BY order_column) - PERCENT_RANK()
- Calculates the relative rank of a row as a percentage (0 to 1).
- Example:
PERCENT_RANK() OVER (PARTITION BY group ORDER BY order_column) - CUME_DIST()
- Calculates the cumulative distribution of a value within a partition (0 to 1).
- Example:
CUME_DIST() OVER (PARTITION BY group ORDER BY order_column)
Value Window Functions
These functions access values from other rows within the partition.
- LEAD()
- Accesses the value of a column in the next row within the partition.
- Example:
LEAD(column, 1) OVER (PARTITION BY group ORDER BY order_column) - LAG()
- Accesses the value of a column in the previous row within the partition.
- Example:
LAG(column, 1) OVER (PARTITION BY group ORDER BY order_column) - FIRST_VALUE()
- Accesses the first value in the window.
- Example:
FIRST_VALUE(column) OVER (PARTITION BY group ORDER BY order_column) - LAST_VALUE()
- Accesses the last value in the window.
- Example:
LAST_VALUE(column) OVER (PARTITION BY group ORDER BY order_column ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - NTH_VALUE()
- Accesses the value of a column at a specific row offset within the partition.
- Example:
NTH_VALUE(column, 3) OVER (PARTITION BY group ORDER BY order_column)
Analytic Window Functions
These functions perform advanced calculations over a window.
- SUM() with Window Frames
- Calculates a cumulative or rolling sum.
- Example:
SUM(column) OVER (PARTITION BY group ORDER BY order_column ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) - AVG() with Window Frames
- Calculates a cumulative or rolling average.
- Example:
AVG(column) OVER (PARTITION BY group ORDER BY order_column ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) - MIN() with Window Frames
- Finds the minimum value in a rolling window.
- Example:
MIN(column) OVER (PARTITION BY group ORDER BY order_column ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) - MAX() with Window Frames
- Finds the maximum value in a rolling window.
- Example:
MAX(column) OVER (PARTITION BY group ORDER BY order_column ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
Statistical Window Functions
These functions are used for statistical analysis.
- STDDEV()
- Calculates the standard deviation of values in a window.
- Example:
STDDEV(column) OVER (PARTITION BY group ORDER BY order_column) - VARIANCE()
- Calculates the variance of values in a window.
- Example:
VARIANCE(column) OVER (PARTITION BY group ORDER BY order_column)
Other Useful Window Functions
- COALESCE()
- Returns the first non-null value in a list of arguments.
- Example:
COALESCE(column1, column2) OVER (PARTITION BY group ORDER BY order_column) - NULLIF()
- Returns
NULLif two arguments are equal; otherwise, returns the first argument. - Example:
NULLIF(column1, column2) OVER (PARTITION BY group ORDER BY order_column)
Window Frame Clauses
These clauses define the subset of rows used in calculations for window functions.
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- Includes all rows from the start of the partition to the current row.
- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- Includes all rows from the current row to the end of the partition.
- ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
- Includes rows within a fixed range around the current row.
- RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
- Includes rows within a specific range (e.g., time-based).