Topic 14 of

SQL Window Functions Explained

Master SQL window functions with real analytics examples from Flipkart and Swiggy

๐Ÿ“šAdvanced
โฑ๏ธ25
โœ…2 quizzes

Introduction

Window functions are SQL's superpower for analytics. They let you calculate rankings, running totals, and comparisons across rows without collapsing your data like GROUP BY does.

Real-world use cases:

  • Flipkart: Rank products by sales within each category
  • Swiggy: Calculate delivery time compared to previous order
  • Zomato: Running total of daily revenue to track monthly progress

If you've ever needed to say "show me the top 3 products per category" or "calculate month-to-date sales," window functions are your answer.

What Are Window Functions?

Window functions perform calculations across a "window" of rows related to the current row. Unlike GROUP BY (which squashes rows into summaries), window functions keep all rows while adding calculated columns.

Think of it this way...

Think of window functions like looking through a sliding window at your data. GROUP BY is like taking a photo of groups (you lose individual row details). Window functions are like a video camera that pans across rows, calculating as it goes while keeping everything visible.

Key difference from GROUP BY:

query.sqlSQL
-- GROUP BY: Collapses to 1 row per customer
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id;

-- Window Function: Keeps all rows, adds count to each
SELECT
  customer_id,
  order_id,
  order_date,
  COUNT(*) OVER (PARTITION BY customer_id) as order_count
FROM orders;

โš ๏ธ CheckpointQuiz error: Missing or invalid options array

ROW_NUMBER(): Sequential Numbering

ROW_NUMBER() assigns a unique number to each row within a partition.

Syntax:

query.sqlSQL
ROW_NUMBER() OVER (
  PARTITION BY column  -- Optional: Create separate sequences per group
  ORDER BY column      -- Required: Determines numbering order
)

Example - Number Each Customer's Orders:

query.sqlSQL
SELECT
  customer_id,
  order_id,
  order_date,
  order_value,
  ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_date
  ) as order_sequence
FROM orders;

Output: | customer_id | order_id | order_date | order_value | order_sequence | |-------------|----------|------------|-------------|----------------| | 101 | 1001 | 2024-01-05 | 500 | 1 | | 101 | 1002 | 2024-01-12 | 750 | 2 | | 101 | 1003 | 2024-02-03 | 600 | 3 | | 102 | 2001 | 2024-01-08 | 300 | 1 | | 102 | 2002 | 2024-01-20 | 450 | 2 |

Notice order_sequence restarts at 1 for each customer because of PARTITION BY customer_id.

RANK() and DENSE_RANK(): Handling Ties

When rows have the same value (ties), ROW_NUMBER() still gives unique numbers. RANK() and DENSE_RANK() handle ties differently.

Difference:

  • ROW_NUMBER(): Always unique (1, 2, 3, 4, 5...)
  • RANK(): Allows ties, skips next number (1, 2, 2, 4, 5...)
  • DENSE_RANK(): Allows ties, no gaps (1, 2, 2, 3, 4...)

Example - Rank Products by Sales:

query.sqlSQL
SELECT
  product_name,
  sales,
  ROW_NUMBER() OVER (ORDER BY sales DESC) as row_num,
  RANK() OVER (ORDER BY sales DESC) as rank,
  DENSE_RANK() OVER (ORDER BY sales DESC) as dense_rank
FROM products;

Output: | product_name | sales | row_num | rank | dense_rank | |--------------|-------|---------|------|------------| | iPhone | 1000 | 1 | 1 | 1 | | MacBook | 800 | 2 | 2 | 2 | | iPad | 800 | 3 | 2 | 2 | | AirPods | 600 | 4 | 4 | 3 |

Notice: iPad and MacBook tied at 800. RANK skips to 4, DENSE_RANK continues at 3.

Tip

When to use which:

  • ROW_NUMBER() when you need exactly N items (e.g., "top 5 orders")
  • RANK() for leaderboards where ties share rank but affect next position
  • DENSE_RANK() when you want top N ranks (e.g., "gold, silver, bronze" should be ranks 1,2,3 with no gaps)

LAG() and LEAD(): Compare with Previous/Next Rows

LAG() and LEAD() access values from previous or next rows. Perfect for calculating differences between rows.

Syntax:

query.sqlSQL
LAG(column, offset, default_value) OVER (ORDER BY column)
LEAD(column, offset, default_value) OVER (ORDER BY column)

Example - Swiggy Delivery Time Comparison:

Compare each delivery time with the previous order:

query.sqlSQL
SELECT
  order_id,
  delivery_time_minutes,
  LAG(delivery_time_minutes, 1) OVER (ORDER BY order_date) as prev_delivery_time,
  delivery_time_minutes - LAG(delivery_time_minutes, 1) OVER (ORDER BY order_date) as time_diff
FROM deliveries
WHERE delivery_partner_id = 'DP-101';

Output: | order_id | delivery_time | prev_delivery_time | time_diff | |----------|---------------|-------------------|-----------| | 1001 | 25 | NULL | NULL | | 1002 | 30 | 25 | +5 | | 1003 | 22 | 30 | -8 | | 1004 | 28 | 22 | +6 |

Info

Default Value: The first row has no previous row, so LAG() returns NULL unless you specify a default: LAG(column, 1, 0) returns 0 instead of NULL.

Running Totals with SUM() OVER

Window functions can turn aggregate functions like SUM(), AVG(), COUNT() into running calculations.

Example - Flipkart Daily Revenue Running Total:

query.sqlSQL
SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY order_date) as running_total,
  AVG(daily_revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7days
FROM daily_sales
ORDER BY order_date;

Output: | order_date | daily_revenue | running_total | moving_avg_7days | |------------|---------------|---------------|------------------| | 2024-01-01 | 50000 | 50000 | 50000 | | 2024-01-02 | 60000 | 110000 | 55000 | | 2024-01-03 | 55000 | 165000 | 55000 | | 2024-01-04 | 70000 | 235000 | 58750 |

Frame Specification:

  • ROWS BETWEEN 6 PRECEDING AND CURRENT ROW = Last 7 days (6 before + current)
  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW = From start to current (running total)
  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING = Previous, current, and next row

Real-World Example: Zomato Top 3 Restaurants per City

Scenario: Find the top 3 highest-rated restaurants in each city.

query.sqlSQL
WITH ranked_restaurants AS (
  SELECT
    city,
    restaurant_name,
    rating,
    total_orders,
    ROW_NUMBER() OVER (
      PARTITION BY city
      ORDER BY rating DESC, total_orders DESC
    ) as city_rank
  FROM restaurants
)
SELECT
  city,
  restaurant_name,
  rating,
  total_orders,
  city_rank
FROM ranked_restaurants
WHERE city_rank <= 3
ORDER BY city, city_rank;

Output: | city | restaurant_name | rating | total_orders | city_rank | |------|----------------|--------|--------------|-----------| | Bangalore | Truffles | 4.5 | 12000 | 1 | | Bangalore | Empire | 4.4 | 15000 | 2 | | Bangalore | MTR | 4.3 | 18000 | 3 | | Delhi | Karim's | 4.6 | 20000 | 1 | | Delhi | Moti Mahal | 4.5 | 16000 | 2 | | Delhi | Bukhara | 4.4 | 10000 | 3 |

How it works:

  1. PARTITION BY city creates separate rankings per city
  2. ORDER BY rating DESC, total_orders DESC ranks by rating (ties broken by order count)
  3. WHERE city_rank <= 3 keeps only top 3 per city
Warning

Common Mistake: You can't filter window functions in the WHERE clause directly. Use a CTE or subquery first, then filter:

query.sqlSQL
-- โŒ This doesn't work:
SELECT *, ROW_NUMBER() OVER (...) as rn
FROM table
WHERE rn = 1  -- Error! Window function in WHERE

-- โœ… This works:
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (...) as rn
  FROM table
) WHERE rn = 1

Performance Tips

  1. Add indexes on PARTITION BY and ORDER BY columns:

    query.sqlSQL
    CREATE INDEX idx_orders_customer_date
    ON orders(customer_id, order_date);
  2. Limit window frame size for large datasets:

    query.sqlSQL
    -- Faster: 7-day window
    SUM(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    
    -- Slower: Unlimited window on millions of rows
    SUM(sales) OVER (ORDER BY date)
  3. Use CTEs to avoid recalculating window functions:

    query.sqlSQL
    WITH ranked AS (
      SELECT *, ROW_NUMBER() OVER (...) as rn
      FROM orders
    )
    SELECT * FROM ranked WHERE rn <= 10;

Common Use Cases

1. Top N per Group:

query.sqlSQL
-- Top 5 customers by spend per city
ROW_NUMBER() OVER (PARTITION BY city ORDER BY total_spend DESC) <= 5

2. Running Totals:

query.sqlSQL
-- Month-to-date sales
SUM(daily_sales) OVER (PARTITION BY YEAR(date), MONTH(date) ORDER BY date)

3. Previous/Next Comparisons:

query.sqlSQL
-- Growth vs previous month
(current_month_sales - LAG(current_month_sales) OVER (ORDER BY month)) / LAG(current_month_sales) * 100

4. Moving Averages:

query.sqlSQL
-- 30-day moving average
AVG(value) OVER (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)

โš ๏ธ SummarySection error: Missing or invalid items array

Received: {"hasItems":false,"isArray":false}

โš ๏ธ FinalQuiz error: Missing or invalid questions array