Introduction
Window functions are SQL's superpower for analytics. They let you calculate rankings, running totals, and comparisons across rows without collapsing your data like GROUP BY does.
Real-world use cases:
- Flipkart: Rank products by sales within each category
- Swiggy: Calculate delivery time compared to previous order
- Zomato: Running total of daily revenue to track monthly progress
If you've ever needed to say "show me the top 3 products per category" or "calculate month-to-date sales," window functions are your answer.
What Are Window Functions?
Window functions perform calculations across a "window" of rows related to the current row. Unlike GROUP BY (which squashes rows into summaries), window functions keep all rows while adding calculated columns.
Think of window functions like looking through a sliding window at your data. GROUP BY is like taking a photo of groups (you lose individual row details). Window functions are like a video camera that pans across rows, calculating as it goes while keeping everything visible.
Key difference from GROUP BY:
-- GROUP BY: Collapses to 1 row per customer
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id;
-- Window Function: Keeps all rows, adds count to each
SELECT
customer_id,
order_id,
order_date,
COUNT(*) OVER (PARTITION BY customer_id) as order_count
FROM orders;โ ๏ธ CheckpointQuiz error: Missing or invalid options array
ROW_NUMBER(): Sequential Numbering
ROW_NUMBER() assigns a unique number to each row within a partition.
Syntax:
ROW_NUMBER() OVER (
PARTITION BY column -- Optional: Create separate sequences per group
ORDER BY column -- Required: Determines numbering order
)Example - Number Each Customer's Orders:
SELECT
customer_id,
order_id,
order_date,
order_value,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) as order_sequence
FROM orders;Output: | customer_id | order_id | order_date | order_value | order_sequence | |-------------|----------|------------|-------------|----------------| | 101 | 1001 | 2024-01-05 | 500 | 1 | | 101 | 1002 | 2024-01-12 | 750 | 2 | | 101 | 1003 | 2024-02-03 | 600 | 3 | | 102 | 2001 | 2024-01-08 | 300 | 1 | | 102 | 2002 | 2024-01-20 | 450 | 2 |
Notice order_sequence restarts at 1 for each customer because of PARTITION BY customer_id.
RANK() and DENSE_RANK(): Handling Ties
When rows have the same value (ties), ROW_NUMBER() still gives unique numbers. RANK() and DENSE_RANK() handle ties differently.
Difference:
- ROW_NUMBER(): Always unique (1, 2, 3, 4, 5...)
- RANK(): Allows ties, skips next number (1, 2, 2, 4, 5...)
- DENSE_RANK(): Allows ties, no gaps (1, 2, 2, 3, 4...)
Example - Rank Products by Sales:
SELECT
product_name,
sales,
ROW_NUMBER() OVER (ORDER BY sales DESC) as row_num,
RANK() OVER (ORDER BY sales DESC) as rank,
DENSE_RANK() OVER (ORDER BY sales DESC) as dense_rank
FROM products;Output: | product_name | sales | row_num | rank | dense_rank | |--------------|-------|---------|------|------------| | iPhone | 1000 | 1 | 1 | 1 | | MacBook | 800 | 2 | 2 | 2 | | iPad | 800 | 3 | 2 | 2 | | AirPods | 600 | 4 | 4 | 3 |
Notice: iPad and MacBook tied at 800. RANK skips to 4, DENSE_RANK continues at 3.
When to use which:
- ROW_NUMBER() when you need exactly N items (e.g., "top 5 orders")
- RANK() for leaderboards where ties share rank but affect next position
- DENSE_RANK() when you want top N ranks (e.g., "gold, silver, bronze" should be ranks 1,2,3 with no gaps)
LAG() and LEAD(): Compare with Previous/Next Rows
LAG() and LEAD() access values from previous or next rows. Perfect for calculating differences between rows.
Syntax:
LAG(column, offset, default_value) OVER (ORDER BY column)
LEAD(column, offset, default_value) OVER (ORDER BY column)Example - Swiggy Delivery Time Comparison:
Compare each delivery time with the previous order:
SELECT
order_id,
delivery_time_minutes,
LAG(delivery_time_minutes, 1) OVER (ORDER BY order_date) as prev_delivery_time,
delivery_time_minutes - LAG(delivery_time_minutes, 1) OVER (ORDER BY order_date) as time_diff
FROM deliveries
WHERE delivery_partner_id = 'DP-101';Output: | order_id | delivery_time | prev_delivery_time | time_diff | |----------|---------------|-------------------|-----------| | 1001 | 25 | NULL | NULL | | 1002 | 30 | 25 | +5 | | 1003 | 22 | 30 | -8 | | 1004 | 28 | 22 | +6 |
Default Value: The first row has no previous row, so LAG() returns NULL unless you specify a default: LAG(column, 1, 0) returns 0 instead of NULL.
Running Totals with SUM() OVER
Window functions can turn aggregate functions like SUM(), AVG(), COUNT() into running calculations.
Example - Flipkart Daily Revenue Running Total:
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) as running_total,
AVG(daily_revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7days
FROM daily_sales
ORDER BY order_date;Output: | order_date | daily_revenue | running_total | moving_avg_7days | |------------|---------------|---------------|------------------| | 2024-01-01 | 50000 | 50000 | 50000 | | 2024-01-02 | 60000 | 110000 | 55000 | | 2024-01-03 | 55000 | 165000 | 55000 | | 2024-01-04 | 70000 | 235000 | 58750 |
Frame Specification:
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW= Last 7 days (6 before + current)ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW= From start to current (running total)ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING= Previous, current, and next row
Real-World Example: Zomato Top 3 Restaurants per City
Scenario: Find the top 3 highest-rated restaurants in each city.
WITH ranked_restaurants AS (
SELECT
city,
restaurant_name,
rating,
total_orders,
ROW_NUMBER() OVER (
PARTITION BY city
ORDER BY rating DESC, total_orders DESC
) as city_rank
FROM restaurants
)
SELECT
city,
restaurant_name,
rating,
total_orders,
city_rank
FROM ranked_restaurants
WHERE city_rank <= 3
ORDER BY city, city_rank;Output: | city | restaurant_name | rating | total_orders | city_rank | |------|----------------|--------|--------------|-----------| | Bangalore | Truffles | 4.5 | 12000 | 1 | | Bangalore | Empire | 4.4 | 15000 | 2 | | Bangalore | MTR | 4.3 | 18000 | 3 | | Delhi | Karim's | 4.6 | 20000 | 1 | | Delhi | Moti Mahal | 4.5 | 16000 | 2 | | Delhi | Bukhara | 4.4 | 10000 | 3 |
How it works:
PARTITION BY citycreates separate rankings per cityORDER BY rating DESC, total_orders DESCranks by rating (ties broken by order count)WHERE city_rank <= 3keeps only top 3 per city
Common Mistake: You can't filter window functions in the WHERE clause directly. Use a CTE or subquery first, then filter:
-- โ This doesn't work:
SELECT *, ROW_NUMBER() OVER (...) as rn
FROM table
WHERE rn = 1 -- Error! Window function in WHERE
-- โ
This works:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (...) as rn
FROM table
) WHERE rn = 1Performance Tips
-
Add indexes on PARTITION BY and ORDER BY columns:
query.sqlSQLCREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date); -
Limit window frame size for large datasets:
query.sqlSQL-- Faster: 7-day window SUM(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) -- Slower: Unlimited window on millions of rows SUM(sales) OVER (ORDER BY date) -
Use CTEs to avoid recalculating window functions:
query.sqlSQLWITH ranked AS ( SELECT *, ROW_NUMBER() OVER (...) as rn FROM orders ) SELECT * FROM ranked WHERE rn <= 10;
Common Use Cases
1. Top N per Group:
-- Top 5 customers by spend per city
ROW_NUMBER() OVER (PARTITION BY city ORDER BY total_spend DESC) <= 52. Running Totals:
-- Month-to-date sales
SUM(daily_sales) OVER (PARTITION BY YEAR(date), MONTH(date) ORDER BY date)3. Previous/Next Comparisons:
-- Growth vs previous month
(current_month_sales - LAG(current_month_sales) OVER (ORDER BY month)) / LAG(current_month_sales) * 1004. Moving Averages:
-- 30-day moving average
AVG(value) OVER (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)โ ๏ธ SummarySection error: Missing or invalid items array
Received: {"hasItems":false,"isArray":false}
โ ๏ธ FinalQuiz error: Missing or invalid questions array