Mean, Median, Mode — The Big Three
These three measures all answer the same question: "What's the typical value?" But they define "typical" differently.
Mean (Average)
Definition: Sum of all values divided by count.
Formula: Mean = (x₁ + x₂ + ... + xₙ) / n
Example — Swiggy Delivery Times (5 orders):
Data: 22, 25, 28, 30, 95 minutes
Mean = (22 + 25 + 28 + 30 + 95) / 5
= 200 / 5
= 40 minutes
Interpretation: "Average delivery time is 40 minutes"
Problem: One outlier (95 min) inflates the mean. Most deliveries (4 out of 5) were 22-30 minutes, but mean says 40.
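The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the numbers are the five delivery times from the example.

```python
from statistics import fmean

# Delivery times in minutes, taken from the example above
delivery_times = [22, 25, 28, 30, 95]

# Mean = sum of all values divided by the count
manual_mean = sum(delivery_times) / len(delivery_times)

# Same data with the 95-minute outlier removed, to show how far it pulls the mean
without_outlier = [t for t in delivery_times if t != 95]
mean_without_outlier = fmean(without_outlier)

print(manual_mean)            # 40.0
print(mean_without_outlier)   # 26.25
```

Dropping a single value moves the mean from 40 to 26.25 minutes, which is the sensitivity the "Problem" note describes.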
Median (Middle Value)
Definition: Middle value when data is sorted. Half of values are above, half below.
How to Calculate:
- Sort data (ascending or descending)
- Odd count: Pick middle value
- Even count: Average the two middle values
Example — Same Swiggy Data:
Data: 22, 25, 28, 30, 95 minutes
Sorted: [22, 25, 28, 30, 95]
↑ 28 sits in the middle: two values below it, two values above it
Median = 28 minutes (middle value)
Interpretation: "Half of deliveries took ≤28 minutes, half took ≥28 minutes"
Advantage: The outlier doesn't affect the median. Whether the slowest delivery took 95 minutes or 950, the median stays at 28 minutes.
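The three calculation steps (sort, then pick the middle value or average the two middle values) can be sketched as a small function. The function name `median_of` is hypothetical; in real work, `statistics.median` from the standard library does the same job.

```python
def median_of(values):
    """Return the middle value of a list of numbers."""
    ordered = sorted(values)              # Step 1: sort the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                        # Odd count: take the single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_of([22, 25, 28, 30, 95])        # 28
even_result = median_of([22, 25, 28, 30])           # 26.5
extreme_result = median_of([22, 25, 28, 30, 950])   # still 28

print(odd_result, even_result, extreme_result)
```

The last call makes the outlier ten times larger and the result does not change, because only the position of the middle value matters.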
Mode (Most Frequent)
Definition: Value that appears most often in dataset.
Example — Flipkart Product Ratings:
Data: 5★, 5★, 5★, 4★, 4★, 3★, 1★, 1★
Mode = 5★ (appears 3 times, more than any other rating)
Interpretation: "Most common rating is 5 stars"
Special Cases:
- No mode: All values appear equally (e.g., 1, 2, 3, 4, 5 — each appears once)
- Bimodal: Two values tie for most frequent (e.g., 1, 1, 2, 3, 3 — mode is 1 AND 3)
- Multimodal: Three or more values tie
When it matters: Categorical data (sizes: S, M, L), discrete data (ratings: 1-5), identifying most popular product/category.
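The special cases above can be handled by counting how often each value occurs. The sketch below is one possible approach, not a standard API: the function name `find_modes` is made up, and it follows this lesson's convention that data has no mode when every value appears equally often.

```python
from collections import Counter

def find_modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    # Several distinct values that all appear equally often: no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

ratings_modes = find_modes([5, 5, 5, 4, 4, 3, 1, 1])   # [5]
no_mode = find_modes([1, 2, 3, 4, 5])                  # []
bimodal = find_modes([1, 1, 2, 3, 3])                  # [1, 3]

print(ratings_modes, no_mode, bimodal)
```

Note that library functions may use a different convention. For example, `statistics.multimode` returns every value when all of them tie.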
Quick Comparison
| Measure | Affected by Outliers? | Best For | Example Use |
|---------|----------------------|----------|-------------|
| Mean | Yes (highly sensitive) | Symmetric data, no outliers | Test scores, heights, sensor data |
| Median | No (resistant) | Skewed data, outliers present | Income, real estate prices, order values |
| Mode | No | Categorical data, finding most common | Shoe sizes, product colors, customer segments |
Imagine five people's salaries: ₹5L, ₹5L, ₹6L, ₹7L, ₹1Cr. Mean salary = ₹24.6L (misleading: four of the five earn far less, and only the CEO earns more). Median = ₹6L (typical employee). Mode = ₹5L (most common). Each tells a different story, so choose based on what you want to communicate.
When to Use Each Measure
Choosing the right measure depends on: (1) Data distribution, (2) Presence of outliers, (3) What story you want to tell.
Use Mean When...
✅ Data is symmetric (bell curve, no skew)
- Heights of adults: Most near average, few very tall/short (symmetric)
- Test scores: Most students near average, few very high/low
- Manufacturing measurements: Part dimensions cluster around target
✅ No significant outliers
- Daily website traffic: Consistent range (no viral spikes)
- Sensor readings: Small natural variation
✅ You need mathematical properties
- Mean has algebraic properties (useful in formulas, regression)
- Sum of deviations from mean = 0 (useful property)
Example — Zomato Restaurant Ratings:
Ratings: 4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2 (out of 5)
Mean = 4.24 (good summary — data is tightly clustered)
When mean works: Data is consistent, no extreme values, symmetric distribution.
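The property "sum of deviations from mean = 0" can be checked on the Zomato ratings. This is a small illustrative sketch; the tolerance in `math.isclose` is an assumption that absorbs floating-point rounding.

```python
import math
from statistics import fmean

# Ratings from the Zomato example above
ratings = [4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2]

avg = fmean(ratings)

# Deviation = how far each rating sits above (+) or below (-) the mean
deviations = [r - avg for r in ratings]
total_deviation = sum(deviations)

print(round(avg, 2))                                      # 4.24
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))   # True
```

The positive and negative deviations cancel out, which is why the mean acts as the balance point of the data.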
Use Median When...
✅ Data is skewed (long tail on one side)
- Income: Most people earn ₹5-10L, few earn crores (right-skewed)
- Real estate prices: Most homes ₹50L-₹1Cr, few luxury ₹10Cr+ (right-skewed)
- Website load time: Most pages 2s, few very slow (right-skewed)
✅ Outliers are present
- E-commerce order values: Most ₹500-₹2,000, occasional ₹50K laptop orders
- Delivery times: Most 20-30 min, occasional 2-hour delays (traffic, weather)
✅ You want 'typical' experience
- Median is the midpoint (50th percentile): half the values fall at or below it, so extremes barely move it
- Better for stakeholder communication: "Half our customers wait ≤25 minutes"
Example — Flipkart Order Values:
Orders: ₹350, ₹480, ₹920, ₹1,200, ₹1,500, ₹50,000 (laptop)
Mean = ₹9,075 (misleading — laptop inflates average)
Median = ₹1,060 (typical order for most customers)
Rule of thumb: If mean >> median (much larger), data is right-skewed → Use median.
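One way to apply this rule of thumb is to compare the mean with the median directly. The sketch below uses a hypothetical helper, `mean_to_median_ratio`; the lesson does not define how much larger counts as "much larger", so any cut-off you pick is your own assumption.

```python
from statistics import fmean, median

def mean_to_median_ratio(values):
    """A ratio well above 1 hints at a right skew; near 1 hints at symmetry."""
    return fmean(values) / median(values)

# Flipkart order values and Zomato ratings from the examples above
order_values = [350, 480, 920, 1200, 1500, 50000]
tight_ratings = [4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2]

skewed_ratio = mean_to_median_ratio(order_values)   # 9075 / 1060, about 8.56
tight_ratio = mean_to_median_ratio(tight_ratings)   # about 1.01

print(round(skewed_ratio, 2), round(tight_ratio, 2))
```

The ratio only works as a quick screen for positive data. A histogram is still the more reliable way to judge skew.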
Use Mode When...
✅ Categorical data (non-numeric)
- Most popular product category: Electronics, Fashion, Home
- Most common traffic source: Organic, Paid, Direct, Social
- Preferred payment method: UPI, Card, COD
✅ Discrete data with clear peaks
- Shoe sizes: Most common size 8 (mode = 8)
- Star ratings: Most customers give 5★ (mode = 5)
- Number of items per order: Most orders have 1 item (mode = 1)
✅ You want 'most common' value
- Mode answers: "What do most people do/choose?"
- Inventory planning: Stock more of mode size (best-selling size)
Example — T-Shirt Size Sales:
Sales: S (15), M (40), L (55), XL (30), XXL (10)
Mode = L (sold 55 units — most popular size)
Mean = Not applicable (sizes aren't numeric)
Median = L (sizes have a natural order S < M < L < XL < XXL; of the 150 units sold, the 75th and 76th both fall in L)
When mode is essential: Unordered categories such as "Red, Blue, Green" have no mean or median, so mode is the only measure available.
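The T-shirt example can be worked through from the sales counts alone. This is a sketch under two assumptions: the dictionary lists sizes in their natural order, and when the halfway point falls exactly between two sizes, the hypothetical `ordinal_median` helper returns the lower one.

```python
# Units sold per size, listed in natural order (S < M < L < XL < XXL)
sales = {"S": 15, "M": 40, "L": 55, "XL": 30, "XXL": 10}

# Mode: the size with the highest count
mode_size = max(sales, key=sales.get)

def ordinal_median(counts):
    """Walk the ordered categories until at least half of all units are covered."""
    total = sum(counts.values())
    running = 0
    for category, count in counts.items():
        running += count
        if running * 2 >= total:
            return category
    raise ValueError("no data")

median_size = ordinal_median(sales)

print(mode_size, median_size)   # L L
```

The running total reaches 55 after M and 110 after L, so the halfway mark of 75 units lands inside L.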
Decision Tree: Which Measure?
START
│
├─ Is data numeric?
│ ├─ NO → Use MODE (categorical data)
│ └─ YES → Continue
│
├─ Are there outliers?
│ ├─ YES → Use MEDIAN (robust to outliers)
│ └─ NO → Continue
│
├─ Is data skewed (long tail)?
│ ├─ YES → Use MEDIAN (better represents typical)
│ └─ NO → Use MEAN (symmetric distribution)
In practice, report ALL THREE when appropriate. Example dashboard: "Average order value: ₹1,250 (mean), Typical order: ₹950 (median), Most common order: ₹800 (mode)." This gives complete picture of data distribution.
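The decision tree above translates directly into a short function. This is only a sketch: the function name and the three boolean flags are hypothetical, and deciding whether data "has outliers" or "is skewed" is left to the analyst, since the tree does not say how to test for either.

```python
def choose_measure(is_numeric, has_outliers, is_skewed):
    """Follow the decision tree: returns 'mode', 'median', or 'mean'."""
    if not is_numeric:
        return "mode"      # categorical data
    if has_outliers:
        return "median"    # robust to outliers
    if is_skewed:
        return "median"    # better represents the typical value
    return "mean"          # symmetric, no outliers

tshirt_choice = choose_measure(is_numeric=False, has_outliers=False, is_skewed=False)
order_choice = choose_measure(is_numeric=True, has_outliers=True, is_skewed=True)
rating_choice = choose_measure(is_numeric=True, has_outliers=False, is_skewed=False)

print(tshirt_choice, order_choice, rating_choice)   # mode median mean
```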
Visualizing Mean, Median, Mode
Seeing how mean, median, and mode behave with different distributions clarifies when to use each.
Symmetric Distribution (Normal/Bell Curve)
Shape: Data centered around middle, tapers evenly on both sides.
Frequency
│ ╱╲
│ ╱ ╲
│ ╱ ╲
│ ╱ ╲
│_____╱________╲_____
│
Mean = Median = Mode
(all equal)
Example — Heights of Adult Men:
Data: 165, 168, 170, 172, 170, 173, 175, 172, 170, 168 cm
Mean = 170.3 cm
Median = 170 cm
Mode = 170 cm
All three are nearly equal (symmetric distribution)
When this happens: Natural phenomena (heights, IQ scores, measurement errors), consistent processes (manufacturing).
Takeaway: For symmetric data, mean ≈ median ≈ mode, so all three are valid. Use the mean (the most common choice in statistics).
Right-Skewed Distribution (Long Tail Right)
Shape: Most data on left (low values), long tail on right (high values).
Frequency
│ ╱╲
│╱ ╲___
│ ╲___
│ ╲___
│______________╲___
Mode < Median < Mean
(Mode: most common value. Median: middle value. Mean: inflated by outliers.)
Example — Income Distribution:
Data: ₹4L, ₹5L, ₹5L, ₹6L, ₹7L, ₹8L, ₹10L, ₹15L, ₹50L, ₹1Cr
Mode = ₹5L (most common)
Median = ₹7.5L (middle — half earn less, half more)
Mean = ₹21L (inflated by ₹50L and ₹1Cr outliers)
Rule: Mode < Median < Mean (in right-skewed data)
When this happens: Income, wealth, real estate prices, order values, website load times.
Takeaway: Use median for right-skewed data (represents typical value). Mean overstates reality.
Left-Skewed Distribution (Long Tail Left)
Shape: Most data on right (high values), long tail on left (low values).
Frequency
│ ╱╲
│ ___╱ ╲
│ ___╱ ╲
│___╱
│
Mean < Median < Mode
Example — Student Test Scores (Easy Exam):
Data: 35, 45, 50, 85, 88, 90, 92, 95, 95, 98
Mode = 95 (most common score)
Median = 89 (average of the two middle values, 88 and 90)
Mean = 77.3 (pulled down by 35, 45, 50 outliers)
Rule: Mean < Median < Mode (in left-skewed data)
When this happens: Test scores (when most students do well, few fail), age at retirement, product ratings (most 5★, few 1★).
Takeaway: Use median (less affected by low outliers). Mean understates typical performance.
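Both ordering rules can be verified on the income and test-score data from the skewed examples. This is a minimal sketch with the standard library; incomes are written in lakhs (₹1Cr = 100 lakh), and the helper name `summarize` is illustrative.

```python
from statistics import fmean, median, mode

incomes_lakh = [4, 5, 5, 6, 7, 8, 10, 15, 50, 100]    # right-skewed example
test_scores = [35, 45, 50, 85, 88, 90, 92, 95, 95, 98]  # left-skewed example

def summarize(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), fmean(values)

inc_mode, inc_median, inc_mean = summarize(incomes_lakh)
sc_mode, sc_median, sc_mean = summarize(test_scores)

print(inc_mode, inc_median, inc_mean)   # 5 7.5 21.0
print(sc_mode, sc_median, sc_mean)      # 95 89.0 77.3

# Right skew: mode < median < mean. Left skew: mean < median < mode.
print(inc_mode < inc_median < inc_mean)   # True
print(sc_mean < sc_median < sc_mode)      # True
```

These orderings hold for typical single-peaked data like the two examples here. They are a useful guide rather than a guarantee for every dataset.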
Bimodal Distribution (Two Peaks)
Shape: Two distinct clusters (two modes).
Frequency
│ ╱╲ ╱╲
│ ╱ ╲ ╱ ╲
│╱ ╲___╱ ╲
│
│
Mode 1 Mean/Median Mode 2
Example — Website Traffic (Weekday vs Weekend):
Weekday traffic: 5,000-6,000 sessions/day (peak 1)
Weekend traffic: 1,500-2,000 sessions/day (peak 2)
Mode 1 = 5,500 (weekdays — most common high traffic)
Mode 2 = 1,800 (weekends — most common low traffic)
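Mean = 4,200 (falls in the gap between the two peaks, a traffic level that almost never occurs, so it is misleading)
Median = somewhere in the weekday range (5 of every 7 days are weekdays, so the middle value lands in that cluster and hides the weekend dip)
Takeaway: Neither summary describes both groups. The mean falls BETWEEN the peaks, and the median sits in whichever cluster holds more days. Report both modes or segment data ("Weekday avg: 5,500, Weekend avg: 1,800").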
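Segmenting before averaging is straightforward in code. The daily numbers below are hypothetical, invented only to fit the weekday and weekend ranges in the example; they are not real traffic data.

```python
from statistics import fmean

# Hypothetical two weeks of daily sessions (made up to match the ranges above)
daily_sessions = [
    ("Mon", 5400), ("Tue", 5600), ("Wed", 5500), ("Thu", 5700), ("Fri", 5300),
    ("Sat", 1900), ("Sun", 1700),
    ("Mon", 5500), ("Tue", 5800), ("Wed", 5500), ("Thu", 5600), ("Fri", 5100),
    ("Sat", 1800), ("Sun", 1800),
]

WEEKEND = {"Sat", "Sun"}

# Split the data into the two groups that create the two peaks
weekday = [n for day, n in daily_sessions if day not in WEEKEND]
weekend = [n for day, n in daily_sessions if day in WEEKEND]
overall = [n for _, n in daily_sessions]

overall_mean = fmean(overall)
weekday_mean = fmean(weekday)
weekend_mean = fmean(weekend)

print(round(overall_mean))   # 4443, a level no single day actually reached
print(weekday_mean)          # 5500.0
print(weekend_mean)          # 1800.0
```

The overall mean describes neither kind of day, while the two segment means each describe their own group well.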
Real-World Examples: Mean vs Median Decisions
Let's see how companies use mean vs median for honest communication and decision-making.
Example 1: Swiggy Delivery Time Promise
Context: Swiggy wants to set customer expectations for delivery time on app.
Data: 100,000 deliveries last month
Mean delivery time: 38 minutes
Median delivery time: 32 minutes
90th percentile: 55 minutes
Analysis:
- Mean (38 min) is inflated by long-tail delays (traffic, weather, far locations)
- Median (32 min) represents typical delivery (half faster, half slower)
- 90th percentile (55 min) = worst 10% took ≥55 minutes
Decision: Show median + percentile on app:
- "Typical delivery: 30-35 minutes" (median)
- "90% of orders delivered within 55 minutes" (90th percentile for cautious estimate)
Why: Median sets realistic expectation for MOST customers. The mean (38 min) would overstate the typical wait, because long-tail delays pull it above the median (32 min).
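To make the three figures concrete, here is a sketch that computes a mean, a median, and a 90th percentile on a tiny sample. The ten delivery times are hypothetical (not Swiggy data), and `percentile_nearest_rank` is an illustrative helper that uses the simple nearest-rank definition; tools such as pandas interpolate between values by default and can give slightly different answers.

```python
from statistics import fmean, median

def percentile_nearest_rank(values, p):
    """Smallest value with at least p percent of the data at or below it.

    p must be a whole number from 1 to 100.
    """
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, so no floating-point rounding is involved
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical sample of ten delivery times in minutes
sample = [21, 24, 26, 28, 30, 32, 34, 38, 45, 90]

sample_mean = fmean(sample)
sample_median = median(sample)
sample_p90 = percentile_nearest_rank(sample, 90)

print(sample_mean)     # 36.8
print(sample_median)   # 31.0
print(sample_p90)      # 45
```

As in the example, one slow delivery pulls the mean above the median, while the 90th percentile gives a cautious upper estimate.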
Example 2: Flipkart Seller Dashboard (Revenue Reporting)
Context: Flipkart shows sellers their "average order value" to help plan inventory/pricing.
Seller's Data (last 1,000 orders):
Mean order value: ₹1,850
Median order value: ₹950
Mode: ₹800 (most common — single-item orders)
Analysis:
- Mean (₹1,850) is inflated by occasional high-value orders (multi-item, electronics)
- Median (₹950) represents typical single order
- Mode (₹800) shows most common order size
Decision: Show ALL THREE on dashboard:
┌─────────────────────────────────────┐
│ Order Value Summary │
├─────────────────────────────────────┤
│ Average order: ₹1,850 (mean) │
│ Typical order: ₹950 (median) │
│ Most common order: ₹800 (mode) │
└─────────────────────────────────────┘
Why: Sellers need full picture. Mean for revenue forecasting, median for pricing strategy, mode for inventory planning (stock more of ₹800 items).
Example 3: Real Estate Listing (Property Prices)
Context: Real estate website shows "Average home price in Bangalore" on city page.
Data: 5,000 home sales last quarter
Mean: ₹85 lakhs
Median: ₹62 lakhs
Distribution: 70% of homes sold for ₹40L-₹80L, 30% for ₹1Cr-₹5Cr (luxury)
Analysis:
- Mean (₹85L) is inflated by luxury properties (₹1Cr-₹5Cr segment)
- Median (₹62L) represents typical buyer's budget
- Right-skewed distribution (high-end outliers)
Decision: Use median for city-wide summary:
- "Median home price: ₹62 lakhs" (more honest for buyers)
- Include note: "30% of homes sold above ₹1 crore (luxury segment)"
Why: Median protects buyers from false expectations. Saying "average ₹85L" misleads budget-conscious buyers (most homes are ₹40L-₹80L). Median is industry standard in real estate.
Example 4: Salary Negotiation (Company Offer vs Market Data)
Context: Data analyst receives offer of ₹12 LPA. HR says "Our average analyst salary is ₹14 LPA, so this offer is below average."
You investigate Glassdoor data for company:
Mean salary: ₹14 LPA
Median salary: ₹10 LPA
Distribution: 80% earn ₹8-12 LPA, 20% earn ₹25-40 LPA (senior analysts/managers)
Analysis:
- Mean (₹14L) is inflated by senior roles (₹25-40L)
- Median (₹10L) represents typical analyst
- Your offer (₹12L) is ABOVE median (better than 50% of analysts)
Counter-argument: "Median analyst salary is ₹10 LPA — my ₹12L offer is actually 20% above typical. The ₹14L average includes senior analysts and managers. For entry-level, ₹12L is competitive."
Why: Median prevents misleading comparisons. HR's "below average" claim is technically true but contextually misleading. Median gives fair comparison.
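The comparison can be made concrete with a small made-up team. The ten salaries below are hypothetical, chosen only so that the mean is ₹14 LPA, the median is ₹10 LPA, and 80% of people earn ₹8-12 LPA, matching the figures in the example.

```python
from statistics import fmean, median

# Hypothetical salaries in LPA (invented to reproduce mean = 14, median = 10)
salaries = [8, 9, 9, 10, 10, 10, 11, 12, 25, 36]
offer = 12

team_mean = fmean(salaries)
team_median = median(salaries)

# Share of the team earning the offer amount or less
share_at_or_below = sum(s <= offer for s in salaries) / len(salaries)

print(team_mean)           # 14.0
print(team_median)         # 10.0
print(share_at_or_below)   # 0.8
```

The offer is below the mean yet at or above what 80% of this team earns, which shows how two senior salaries can distort an "average" comparison.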
Calculating Mean, Median, Mode in Python/SQL
Here's how to calculate these measures in real analysis workflows.
Python (Pandas)
import pandas as pd
import numpy as np
# Sample data: E-commerce order values
orders = pd.Series([450, 520, 680, 920, 1100, 1250, 1500, 55000])
# Mean
mean_val = orders.mean()
print(f"Mean: ₹{mean_val:,.0f}")  # Mean: ₹7,678 (exact value 7677.5)
# Median
median_val = orders.median()
print(f"Median: ₹{median_val:,.0f}")  # Median: ₹1,010
# Mode
mode_val = orders.mode()
print(f"Mode: {mode_val.values}")  # all 8 values are returned: each appears once, so there is no meaningful mode
# For dataset with mode
ratings = pd.Series([5, 5, 5, 4, 4, 3, 1, 1])
mode_rating = ratings.mode()[0] # 5 (most frequent)
print(f"Mode rating: {mode_rating}★")
# Percentiles (bonus: 25th, 50th=median, 75th)
print(orders.quantile([0.25, 0.5, 0.75]))
# 0.25 640.0
# 0.50 1010.0 ← Median
# 0.75 1312.5
# (pandas interpolates linearly between neighbouring values by default)
# Skewness (positive = right-skewed)
skew = orders.skew()
print(f"Skewness: {skew:.2f}")  # about 2.83 (highly right-skewed)
When mean >> median (like here: ₹7,677.5 vs ₹1,010), data is right-skewed → Use median.
SQL (Most Databases)
-- Mean (AVG built-in)
SELECT AVG(order_value) AS mean_order_value
FROM orders;
-- Result: 7677.5
-- Median (PostgreSQL; BigQuery has no WITHIN GROUP form, use PERCENTILE_CONT(order_value, 0.5) OVER () there)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_value) AS median_order_value
FROM orders;
-- Result: 1010
-- Median (MySQL 8.0+: no built-in median, so number the rows and average the middle one or two)
SELECT AVG(order_value) AS median_order_value
FROM (
SELECT order_value,
ROW_NUMBER() OVER (ORDER BY order_value) AS rn,
COUNT(*) OVER () AS cnt
FROM orders
) sub
WHERE rn IN (FLOOR((cnt+1)/2), CEIL((cnt+1)/2));
-- Result: 1010
-- Mode (most frequent value)
SELECT order_value AS mode_order_value, COUNT(*) AS frequency
FROM orders
GROUP BY order_value
ORDER BY COUNT(*) DESC
LIMIT 1;
-- Returns most common order value (if several values tie, LIMIT 1 returns only one of them)
-- Percentiles (25th, 50th, 75th)
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_value) AS p25,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_value) AS p50,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_value) AS p75
FROM orders;
Excel (Quick Analysis)
Data in column A (A1:A8)
Mean: =AVERAGE(A1:A8)
Median: =MEDIAN(A1:A8)
Mode: =MODE.SNGL(A1:A8) [single mode]
=MODE.MULT(A1:A8) [multiple modes, returns array]
Percentiles:
25th: =QUARTILE(A1:A8, 1)
50th: =QUARTILE(A1:A8, 2) [same as MEDIAN]
75th: =QUARTILE(A1:A8, 3)
90th: =PERCENTILE(A1:A8, 0.9)
For large datasets (1M+ rows), use SQL or Python (Pandas): Excel slows down with large data, and a worksheet holds at most 1,048,576 rows. For quick exploration (up to about 100K rows), Excel's AVERAGE/MEDIAN functions work fine.