Topic 19 of

SQL GROUP BY and Aggregation Functions

GROUP BY is the command that turns a million rows into a handful of meaningful insights. It's how analysts answer 'which city had the most orders?' and 'what'''s the average delivery time by restaurant?'

📚Beginner
⏱️11 min
7 quizzes
📊

Aggregation Functions — Summarizing Data

Aggregation functions collapse many rows into a single value. They're always used with GROUP BY (or across the entire table if no GROUP BY is specified).

| Function | What it does | Example | |----------|-------------|---------| | COUNT(*) | Counts all rows | Total number of orders | | COUNT(column) | Counts non-NULL values in a column | Orders with a delivery time recorded | | SUM(column) | Adds up all values | Total revenue | | AVG(column) | Average of all values | Average order value | | MIN(column) | Smallest value | Cheapest order | | MAX(column) | Largest value | Most expensive order |

Aggregating the Entire Table

Without GROUP BY, the function applies to all rows at once:

query.sqlSQL
-- How many orders total?
SELECT COUNT(*) AS total_orders FROM orders;

-- What's the total revenue?
SELECT SUM(amount) AS total_revenue FROM orders;

-- What's the average order value?
SELECT AVG(amount) AS avg_order_value FROM orders;

-- What's the range of order values?
SELECT MIN(amount) AS cheapest, MAX(amount) AS most_expensive FROM orders;
Think of it this way...

Imagine you have a jar of 10,000 marbles (rows). COUNT(*) tells you there are 10,000 marbles. AVG(weight) tells you the average weight of a marble. GROUP BY color is like sorting the marbles by color first, then computing the count or average for each color separately.

🗂️

GROUP BY — Summarizing by Category

GROUP BY splits your data into groups based on unique values in one or more columns, then applies the aggregation function to each group separately.

query.sqlSQL
-- How many orders per city?
SELECT
  city,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
ORDER BY total_revenue DESC;

Result:

| city | order_count | total_revenue | avg_order_value | | --------- | ----------- | ------------- | --------------- | | Mumbai | 12,450 | 62,25,000 | 500 | | Bangalore | 10,230 | 48,08,100 | 470 | | Delhi | 9,880 | 41,49,600 | 420 |

Grouping by Multiple Columns

query.sqlSQL
-- Orders and revenue per city per restaurant
SELECT
  city,
  restaurant,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY city, restaurant
ORDER BY city, total_revenue DESC;
Info

Key Rule: Every column in your SELECT that isn't inside an aggregation function MUST appear in your GROUP BY clause. If you SELECT city and restaurant, both must be in GROUP BY.

⚠️ CheckpointQuiz error: Missing or invalid options array

🔬

HAVING — Filtering Grouped Results

WHERE filters rows before grouping. HAVING filters after grouping — it filters the groups themselves.

query.sqlSQL
-- Cities with more than 10,000 orders
SELECT
  city,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY city
HAVING COUNT(*) > 10000
ORDER BY order_count DESC;

-- Cities where average order value exceeds ₹600
SELECT
  city,
  AVG(amount) AS avg_order_value
FROM orders
GROUP BY city
HAVING AVG(amount) > 600;

WHERE vs HAVING — The Critical Difference

query.sqlSQL
-- WRONG: Can't use WHERE to filter on an aggregate
SELECT city, COUNT(*) FROM orders
WHERE COUNT(*) > 1000  -- ❌ Error: can't use aggregate in WHERE
GROUP BY city;

-- RIGHT: Use HAVING for aggregate conditions
SELECT city, COUNT(*) FROM orders
GROUP BY city
HAVING COUNT(*) > 1000;  -- ✅

-- Combining WHERE and HAVING:
-- Filter rows first (WHERE), then group, then filter groups (HAVING)
SELECT city, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01'  -- filter rows first
GROUP BY city
HAVING COUNT(*) > 500;            -- then filter groups
Think of it this way...

A Zomato analyst wants to find restaurants with more than 100 orders AND an average rating above 4.2. WHERE can't do this — it runs before grouping. HAVING is the tool: GROUP BY restaurant_name HAVING COUNT(*) > 100 AND AVG(rating) > 4.2.

⚠️

COUNT(*) vs COUNT(column) — The Difference Matters

query.sqlSQL
SELECT
  COUNT(*) AS all_rows,
  COUNT(delivery_time) AS rows_with_delivery_time,
  COUNT(*) - COUNT(delivery_time) AS rows_missing_delivery_time
FROM orders;
  • COUNT(*) counts every row, including those with NULLs
  • COUNT(column) counts only rows where the column is NOT NULL
  • This difference is crucial for data quality checks

COUNT DISTINCT

query.sqlSQL
-- How many unique customers placed orders this month?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2026-03-01';

-- Average order count per customer
SELECT AVG(orders_per_customer) FROM (
  SELECT customer_id, COUNT(*) AS orders_per_customer
  FROM orders
  GROUP BY customer_id
) sub;

⚠️ FinalQuiz error: Missing or invalid questions array

⚠️ SummarySection error: Missing or invalid items array

Received: {"hasItems":false,"isArray":false}