What is Mean vs Median vs Mode — Which Measure to Use When?

Master mean, median, and mode with real examples. Learn when to use each measure of central tendency, how outliers affect them, and avoid common mistakes in data analysis.

Mean vs Median vs Mode — When to Use Each | DataPath

📊

Mean, Median, Mode — The Big Three

These three measures all answer the same question: "What's the typical value?" But they define "typical" differently.

Mean (Average)

Definition: Sum of all values divided by count.

Formula: Mean = (x₁ + x₂ + ... + xₙ) / n

Example — Swiggy Delivery Times (5 orders):

Data: 22, 25, 28, 30, 95 minutes

Mean = (22 + 25 + 28 + 30 + 95) / 5
     = 200 / 5
     = 40 minutes

Interpretation: "Average delivery time is 40 minutes"

Problem: One outlier (95 min) inflates the mean. Most deliveries (4 out of 5) were 22-30 minutes, but mean says 40.
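To see how far the single 95-minute order moves the mean, the calculation can be repeated with and without it. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean

delivery_times = [22, 25, 28, 30, 95]  # minutes, from the example above

mean_all = mean(delivery_times)
# Drop the 95-minute order to see what the other four look like
mean_without_outlier = mean([t for t in delivery_times if t != 95])

print(mean_all)              # 40
print(mean_without_outlier)  # 26.25
```

One value out of five shifts the mean by almost 14 minutes.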

Median (Middle Value)

Definition: Middle value when data is sorted. Half of values are above, half below.

How to Calculate:

Sort data (ascending or descending)
Odd count: Pick middle value
Even count: Average the two middle values

Example — Same Swiggy Data:

Data: 22, 25, 28, 30, 95 minutes
Sorted: [22, 25, 28, 30, 95]
         ↑        ↑
      50% below  50% above

Median = 28 minutes (middle value)

Interpretation: "Half of deliveries took ≤28 minutes, half took ≥28 minutes"

Advantage: Outlier (95 min) doesn't affect median. It's 28 min either way.
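The calculation steps above can be sketched as a small function. This is an illustrative from-scratch version (Python's built-in `statistics.median` does the same job); the function name and the extra test values are assumptions made for the demo.

```python
def median(values):
    """Middle value of sorted data; averages the two middle values for even counts."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: pick the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median([22, 25, 28, 30, 95]))   # 28
print(median([22, 25, 28, 30, 500]))  # 28 (a more extreme outlier changes nothing)
print(median([22, 25, 28, 30]))       # 26.5
```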

Mode (Most Frequent)

Definition: Value that appears most often in dataset.

Example — Flipkart Product Ratings:

Data: 5★, 5★, 5★, 4★, 4★, 3★, 1★, 1★

Mode = 5★ (appears 3 times, more than any other rating)

Interpretation: "Most common rating is 5 stars"

Special Cases:

No mode: All values appear equally (e.g., 1, 2, 3, 4, 5 — each appears once)
Bimodal: Two values tie for most frequent (e.g., 1, 1, 2, 3, 3 — mode is 1 AND 3)
Multimodal: Three or more values tie

When it matters: Categorical data (sizes: S, M, L), discrete data (ratings: 1-5), identifying most popular product/category.
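The special cases can be checked in code. The sketch below leans on `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count; the helper `describe_mode` and its labels are hypothetical names chosen for this example.

```python
from statistics import multimode

def describe_mode(values):
    """Return a label and the list of modes for a non-empty dataset."""
    modes = multimode(values)          # all values tied for the highest count
    distinct = len(set(values))
    if distinct > 1 and len(modes) == distinct:
        return "no mode", []           # every value appears equally often
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

print(describe_mode([5, 5, 5, 4, 4, 3, 1, 1]))  # ('unimodal', [5])
print(describe_mode([1, 2, 3, 4, 5]))           # ('no mode', [])
print(describe_mode([1, 1, 2, 3, 3]))           # ('bimodal', [1, 3])
```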

Quick Comparison

| Measure | Affected by Outliers? | Best For | Example Use |
|---------|----------------------|----------|-------------|
| Mean | Yes (highly sensitive) | Symmetric data, no outliers | Test scores, heights, sensor data |
| Median | No (resistant) | Skewed data, outliers present | Income, real estate prices, order values |
| Mode | No | Categorical data, finding most common | Shoe sizes, product colors, customer segments |

Think of it this way...

Imagine 5 people's salaries: ₹5L, ₹5L, ₹6L, ₹7L, ₹1Cr. Mean salary = ₹24.6L (misleading: nobody earns close to this, because the CEO's ₹1Cr pulls the average up). Median = ₹6L (typical employee). Mode = ₹5L (most common). Each tells a different story, so choose based on what you want to communicate.

🎯

When to Use Each Measure

Choosing the right measure depends on: (1) Data distribution, (2) Presence of outliers, (3) What story you want to tell.

Use Mean When...

✅ Data is symmetric (bell curve, no skew)

Heights of adults: Most near average, few very tall/short (symmetric)
Test scores: Most students near average, few very high/low
Manufacturing measurements: Part dimensions cluster around target

✅ No significant outliers

Daily website traffic: Consistent range (no viral spikes)
Sensor readings: Small natural variation

✅ You need mathematical properties

Mean has algebraic properties (useful in formulas, regression)
Sum of deviations from mean = 0 (useful property)
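The zero-sum property follows directly from the definition: the deviations add up to (sum of values) minus n × mean, and n × mean is the sum of values. A quick check on the Swiggy delivery times, using `fractions.Fraction` so float rounding cannot blur the result:

```python
from fractions import Fraction

data = [22, 25, 28, 30, 95]
mean = Fraction(sum(data), len(data))   # exact arithmetic avoids float rounding
deviations = [x - mean for x in data]

print([int(d) for d in deviations])  # [-18, -15, -12, -10, 55]
print(sum(deviations))               # 0
```

The one large positive deviation (+55) is balanced by four smaller negative ones, which is another way of seeing how a single outlier drags the mean toward itself.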

Example — Zomato Restaurant Ratings:

Ratings: 4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2 (out of 5)

Mean = 4.24 (good summary — data is tightly clustered)

When mean works: Data is consistent, no extreme values, symmetric distribution.

Use Median When...

✅ Data is skewed (long tail on one side)

Income: Most people earn ₹5-10L, few earn crores (right-skewed)
Real estate prices: Most homes ₹50L-₹1Cr, few luxury ₹10Cr+ (right-skewed)
Website load time: Most pages 2s, few very slow (right-skewed)

✅ Outliers are present

E-commerce order values: Most ₹500-₹2,000, occasional ₹50K laptop orders
Delivery times: Most 20-30 min, occasional 2-hour delays (traffic, weather)

✅ You want 'typical' experience

Median marks the midpoint of the data: half the values fall below it, half above (less influenced by extremes)
Better for stakeholder communication: "Half our customers wait ≤25 minutes"

Example — Flipkart Order Values:

Orders: ₹350, ₹480, ₹920, ₹1,200, ₹1,500, ₹50,000 (laptop)

Mean = ₹9,075 (misleading — laptop inflates average)
Median = ₹1,060 (typical order for most customers)

Rule of thumb: If mean >> median (much larger), data is right-skewed → Use median.
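The rule of thumb can be turned into a quick check. In this sketch the function name `suggest_measure` and the 1.5 ratio cutoff are assumptions for illustration, not a standard threshold, and it only looks for right skew (mean well above median).

```python
from statistics import mean, median

def suggest_measure(values, ratio_threshold=1.5):
    """Suggest 'median' when the mean exceeds the median by the given ratio.

    ratio_threshold is an illustrative assumption, not a standard cutoff.
    """
    m, med = mean(values), median(values)
    if med > 0 and m / med >= ratio_threshold:
        return "median"
    return "mean"

orders = [350, 480, 920, 1200, 1500, 50000]
print(mean(orders), median(orders))  # 9075 1060.0
print(suggest_measure(orders))       # median

ratings = [4.1, 4.2, 4.3, 4.2, 4.4, 4.3, 4.2]
print(suggest_measure(ratings))      # mean
```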

Use Mode When...

✅ Categorical data (non-numeric)

Most popular product category: Electronics, Fashion, Home
Most common traffic source: Organic, Paid, Direct, Social
Preferred payment method: UPI, Card, COD

✅ Discrete data with clear peaks

Shoe sizes: Most common size 8 (mode = 8)
Star ratings: Most customers give 5★ (mode = 5)
Number of items per order: Most orders have 1 item (mode = 1)

✅ You want 'most common' value

Mode answers: "What do most people do/choose?"
Inventory planning: Stock more of mode size (best-selling size)

Example — T-Shirt Size Sales:

Sales: S (15), M (40), L (55), XL (30), XXL (10)

Mode = L (sold 55 units — most popular size)

Mean = Not applicable (sizes aren't numeric)
Median = L (with sizes ordered S < M < L < XL < XXL, the 75th and 76th of the 150 units sold both fall in L)

When mode is essential: Non-numeric data (can't calculate mean/median for categories like "Red, Blue, Green").
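For the T-shirt example, both the mode and the median can be read straight off the frequency counts. This is a minimal sketch, assuming the sizes are stored in their natural order; the variable names are illustrative.

```python
sales = {"S": 15, "M": 40, "L": 55, "XL": 30, "XXL": 10}  # units sold, in size order

# Mode: the category with the highest count
mode_size = max(sales, key=sales.get)

# Median: walk the ordered categories until the cumulative count passes the halfway point
# (for an even total this picks the category of the upper-middle item)
total = sum(sales.values())
cumulative = 0
for size, count in sales.items():      # dicts preserve insertion order (Python 3.7+)
    cumulative += count
    if cumulative >= (total + 1) / 2:
        median_size = size
        break

print(mode_size, median_size)  # L L
```

The median step only works because sizes have an order. For unordered categories such as colors, only the mode line applies.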

Decision Tree: Which Measure?

START
 │
 ├─ Is data numeric?
 │   ├─ NO → Use MODE (categorical data)
 │   └─ YES → Continue
 │
 ├─ Are there outliers?
 │   ├─ YES → Use MEDIAN (robust to outliers)
 │   └─ NO → Continue
 │
 ├─ Is data skewed (long tail)?
 │   ├─ YES → Use MEDIAN (better represents typical)
 │   └─ NO → Use MEAN (symmetric distribution)
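The decision tree can be sketched as a function. The tree itself does not say how to detect outliers or skew, so the tests below are assumptions: the 1.5 × IQR convention for outliers, and a 10% relative gap between mean and median for skew. All function names are hypothetical.

```python
from statistics import mean, median, multimode, quantiles

def has_outliers(values):
    """Flag points beyond 1.5 x IQR from the quartiles (a common convention, assumed here)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return any(v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in values)

def is_skewed(values, tolerance=0.1):
    """Treat data as skewed when mean and median differ by more than `tolerance` (relative)."""
    med = median(values)
    return med != 0 and abs(mean(values) - med) / abs(med) > tolerance

def central_value(values):
    """Follow the tree: mode for non-numeric data, median for outliers or skew, else mean."""
    if not all(isinstance(v, (int, float)) for v in values):
        return "mode", multimode(values)
    if has_outliers(values) or is_skewed(values):
        return "median", median(values)
    return "mean", mean(values)

print(central_value(["UPI", "Card", "UPI", "COD"]))  # ('mode', ['UPI'])
print(central_value([22, 25, 28, 30, 95]))           # ('median', 28)
heights = [165, 168, 170, 172, 170, 173, 175, 172, 170, 168]
print(central_value(heights))                        # ('mean', 170.3)
```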

Info

In practice, report ALL THREE when appropriate. Example dashboard: "Average order value: ₹1,250 (mean), Typical order: ₹950 (median), Most common order: ₹800 (mode)." This gives complete picture of data distribution.


📈

Visualizing Mean, Median, Mode

Seeing how mean, median, and mode behave with different distributions clarifies when to use each.

Symmetric Distribution (Normal/Bell Curve)

Shape: Data centered around middle, tapers evenly on both sides.

    Frequency
       │         ╱╲
       │        ╱  ╲
       │       ╱    ╲
       │      ╱      ╲
       │_____╱________╲_____
               │
          Mean = Median = Mode
            (all equal)

Example — Heights of Adult Men:

Data: 165, 168, 170, 172, 170, 173, 175, 172, 170, 168 cm

Mean = 170.3 cm
Median = 170 cm
Mode = 170 cm

All three are nearly equal (symmetric distribution)

When this happens: Natural phenomena (heights, IQ scores, measurement errors), consistent processes (manufacturing).

Takeaway: For symmetric data, mean ≈ median ≈ mode (all valid). Use mean (most common in statistics).

Right-Skewed Distribution (Long Tail Right)

Shape: Most data on left (low values), long tail on right (high values).

    Frequency
       │ ╱╲
       │╱  ╲___
       │      ╲___
       │          ╲___
       │______________╲___
        Mode < Median < Mean
         ↑      ↑       ↑
       Most  Middle   Inflated
      common  value  by outliers

Example — Income Distribution:

Data: ₹4L, ₹5L, ₹5L, ₹6L, ₹7L, ₹8L, ₹10L, ₹15L, ₹50L, ₹1Cr

Mode = ₹5L (most common)
Median = ₹7.5L (middle — half earn less, half more)
Mean = ₹21L (inflated by ₹50L and ₹1Cr outliers)

Rule: Mode < Median < Mean (in right-skewed data)
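The ordering can be verified on the income sample with Python's `statistics` module. Amounts are written in lakhs (₹1Cr = 100 lakh); the variable names are illustrative.

```python
from statistics import mean, median, mode

incomes_lakh = [4, 5, 5, 6, 7, 8, 10, 15, 50, 100]  # ₹1Cr written as 100 lakh

mo, med, avg = mode(incomes_lakh), median(incomes_lakh), mean(incomes_lakh)
print(mo, med, avg)    # 5 7.5 21
print(mo < med < avg)  # True for this right-skewed sample
```

Treat the ordering as a strong tendency rather than a guarantee: small or unusual samples can break it, so it is worth computing all three instead of assuming.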

When this happens: Income, wealth, real estate prices, order values, website load times.

Takeaway: Use median for right-skewed data (represents typical value). Mean overstates reality.

Left-Skewed Distribution (Long Tail Left)

Shape: Most data on right (high values), long tail on left (low values).

    Frequency
       │         ╱╲
       │      ___╱ ╲
       │   ___╱     ╲
       │___╱
       │
        Mean < Median < Mode

Example — Student Test Scores (Easy Exam):

Data: 35, 45, 50, 85, 88, 90, 92, 95, 95, 98

Mode = 95 (most common score)
Median = 89 (average of the two middle values, 88 and 90)
Mean = 77.3 (pulled down by 35, 45, 50 outliers)

Rule: Mean < Median < Mode (in left-skewed data)

When this happens: Test scores (when most students do well, few fail), age at retirement, product ratings (most 5★, few 1★).

Takeaway: Use median (less affected by low outliers). Mean understates typical performance.

Bimodal Distribution (Two Peaks)

Shape: Two distinct clusters (two modes).

    Frequency
       │  ╱╲       ╱╲
       │ ╱  ╲     ╱  ╲
       │╱    ╲___╱    ╲
       │
       │
        Mode 1  Mean/Median  Mode 2

Example — Website Traffic (Weekday vs Weekend):

Weekday traffic: 5,000-6,000 sessions/day (peak 1)
Weekend traffic: 1,500-2,000 sessions/day (peak 2)

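Mode 1 = 5,500 (weekdays: most common high traffic)
Mode 2 = 1,800 (weekends: most common low traffic)
Mean ≈ 4,400 (5 weekdays near 5,500 and 2 weekend days near 1,800; falls between the two peaks, so it is misleading)
Median: with 5 weekdays for every 2 weekend days it lands inside the weekday range, hiding the weekend dip (with equal-sized groups it would fall between the peaks)

Takeaway: Neither mean nor median represents both groups. The mean falls BETWEEN the peaks, and the median either does the same or sits inside the larger group. Report both modes or segment data ("Weekday avg: 5,500, Weekend avg: 1,800").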
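Segmenting is a short group-by. The sketch below uses one hypothetical week of daily sessions (made-up numbers chosen to sit inside the two ranges above) and compares the overall mean with the per-segment means.

```python
from statistics import mean

# Hypothetical week of daily sessions: (day_type, sessions)
daily_sessions = [
    ("weekday", 5400), ("weekday", 5600), ("weekday", 5500),
    ("weekday", 5300), ("weekday", 5700),
    ("weekend", 1700), ("weekend", 1900),
]

# Group sessions by day type
by_type = {}
for day_type, sessions in daily_sessions:
    by_type.setdefault(day_type, []).append(sessions)

overall = mean([s for _, s in daily_sessions])
segment_means = {day_type: mean(values) for day_type, values in by_type.items()}

print(round(overall))  # 4443, a level no actual day came close to
print(segment_means)   # {'weekday': 5500, 'weekend': 1800}
```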

🏢

Real-World Examples: Mean vs Median Decisions

Let's see how companies use mean vs median for honest communication and decision-making.

Example 1: Swiggy Delivery Time Promise

Context: Swiggy wants to set customer expectations for delivery time on app.

Data: 100,000 deliveries last month

Mean delivery time: 38 minutes
Median delivery time: 32 minutes
90th percentile: 55 minutes

Analysis:

Mean (38 min) is inflated by long-tail delays (traffic, weather, far locations)
Median (32 min) represents typical delivery (half faster, half slower)
90th percentile (55 min) = worst 10% took ≥55 minutes

Decision: Show median + percentile on app:

"Typical delivery: 30-35 minutes" (median)
"90% of orders delivered within 55 minutes" (90th percentile for cautious estimate)

Why: Median sets a realistic expectation for the typical customer. Mean would overstate the typical wait (38 min vs 32 min).
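Here is a rough sketch of how those three numbers could be computed with Python's `statistics` module. The delivery times are hypothetical (not Swiggy data), and `method="inclusive"` is chosen so that quantiles interpolate between observed values.

```python
from statistics import mean, median, quantiles

# Hypothetical delivery times in minutes (not Swiggy data)
delivery_minutes = [24, 26, 27, 29, 30, 31, 32, 33, 35, 36,
                    38, 40, 42, 45, 48, 52, 58, 65, 80, 110]

average = mean(delivery_minutes)
typical = median(delivery_minutes)
# quantiles(n=10) returns the 9 decile cut points; the last one is the 90th percentile
p90 = quantiles(delivery_minutes, n=10, method="inclusive")[-1]

print(f"Mean: {average} min")                             # Mean: 44.05 min
print(f"Typical delivery: {typical} min")                 # Typical delivery: 37.0 min
print(f"90% of orders delivered within {p90} min")        # 90% of orders delivered within 66.5 min
```

As in the Swiggy numbers, the mean sits above the median because a few slow orders stretch the right tail.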

Example 2: Flipkart Seller Dashboard (Revenue Reporting)

Context: Flipkart shows sellers their "average order value" to help plan inventory/pricing.

Seller's Data (last 1,000 orders):

Mean order value: ₹1,850
Median order value: ₹950
Mode: ₹800 (most common — single-item orders)

Analysis:

Mean (₹1,850) is inflated by occasional high-value orders (multi-item, electronics)
Median (₹950) represents typical single order
Mode (₹800) shows most common order size

Decision: Show ALL THREE on dashboard:

┌─────────────────────────────────────┐
│ Order Value Summary                 │
├─────────────────────────────────────┤
│ Average order:     ₹1,850 (mean)    │
│ Typical order:     ₹950 (median)    │
│ Most common order: ₹800 (mode)      │
└─────────────────────────────────────┘

Why: Sellers need full picture. Mean for revenue forecasting, median for pricing strategy, mode for inventory planning (stock more of ₹800 items).

Example 3: Real Estate Listing (Property Prices)

Context: Real estate website shows "Average home price in Bangalore" on city page.

Data: 5,000 home sales last quarter

Mean: ₹85 lakhs
Median: ₹62 lakhs
Distribution: 70% of homes sold for ₹40L-₹80L, 30% for ₹1Cr-₹5Cr (luxury)

Analysis:

Mean (₹85L) is inflated by luxury properties (₹1Cr-₹5Cr segment)
Median (₹62L) represents typical buyer's budget
Right-skewed distribution (high-end outliers)

Decision: Use median for city-wide summary:

"Median home price: ₹62 lakhs" (more honest for buyers)
Include note: "30% of homes sold above ₹1 crore (luxury segment)"

Why: Median protects buyers from false expectations. Saying "average ₹85L" misleads budget-conscious buyers (most homes are ₹40L-₹80L). Median is industry standard in real estate.

Example 4: Salary Negotiation (Company Offer vs Market Data)

Context: Data analyst receives offer of ₹12 LPA. HR says "Our average analyst salary is ₹14 LPA — this is below average."

You investigate Glassdoor data for company:

Mean salary: ₹14 LPA
Median salary: ₹10 LPA
Distribution: 80% earn ₹8-12 LPA, 20% earn ₹25-40 LPA (senior analysts/managers)

Analysis:

Mean (₹14L) is inflated by senior roles (₹25-40L)
Median (₹10L) represents typical analyst
Your offer (₹12L) is ABOVE median (higher than what at least half of analysts earn)

Counter-argument: "Median analyst salary is ₹10 LPA — my ₹12L offer is actually 20% above typical. The ₹14L average includes senior analysts and managers. For entry-level, ₹12L is competitive."

Why: Median prevents misleading comparisons. HR's "below average" claim is technically true but contextually misleading. Median gives fair comparison.

💻

Calculating Mean, Median, Mode in Python/SQL

Here's how to calculate these measures in real analysis workflows.

Python (Pandas)

code.py (Python)

import pandas as pd

# Sample data: E-commerce order values
orders = pd.Series([450, 520, 680, 920, 1100, 1250, 1500, 55000])

# Mean
mean_val = orders.mean()
print(f"Mean: ₹{mean_val:,.1f}")  # Mean: ₹7,677.5

# Median
median_val = orders.median()
print(f"Median: ₹{median_val:,.0f}")  # Median: ₹1,010

# Mode: when every value is unique, pandas returns all of them (no meaningful mode)
mode_val = orders.mode()
print(f"Mode: {mode_val.values}")  # prints all 8 values

# For dataset with mode
ratings = pd.Series([5, 5, 5, 4, 4, 3, 1, 1])
mode_rating = ratings.mode()[0]  # 5 (most frequent)
print(f"Mode rating: {mode_rating}★")

# Percentiles (bonus: 25th, 50th=median, 75th), linear interpolation by default
print(orders.quantile([0.25, 0.5, 0.75]))
# 0.25     640.0
# 0.50    1010.0  ← Median
# 0.75    1312.5

# Skewness (positive = right-skewed)
skew = orders.skew()
print(f"Skewness: {skew:.2f}")  # about 2.8 (highly right-skewed)

When mean >> median (like here: ₹7,677 vs ₹1,010), data is right-skewed → Use median.

SQL (Most Databases)

query.sql (SQL)

-- Mean (AVG built-in)
SELECT AVG(order_value) AS mean_order_value
FROM orders;
-- Result: 7677.5

-- Median (PostgreSQL; BigQuery instead uses PERCENTILE_CONT(order_value, 0.5) OVER ())
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_value) AS median_order_value
FROM orders;
-- Result: 1010

-- Median (MySQL 8.0+: no built-in median, so use window functions in a subquery)
SELECT AVG(order_value) AS median_order_value
FROM (
  SELECT order_value,
         ROW_NUMBER() OVER (ORDER BY order_value) AS rn,
         COUNT(*) OVER () AS cnt
  FROM orders
) sub
WHERE rn IN (FLOOR((cnt+1)/2), CEIL((cnt+1)/2));
-- Result: 1010

-- Mode (most frequent value)
SELECT order_value AS mode_order_value, COUNT(*) AS frequency
FROM orders
GROUP BY order_value
ORDER BY COUNT(*) DESC
LIMIT 1;
-- Returns one most common order value (if several values tie, LIMIT 1 keeps only one of them)

-- Percentiles (25th, 50th, 75th)
SELECT
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_value) AS p25,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_value) AS p50,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_value) AS p75
FROM orders;
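To try the row-number median approach without a database server, the same idea runs in SQLite through Python's built-in `sqlite3` module. This is a sketch under two assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), and integer division replaces FLOOR/CEIL, which are not available in every SQLite build. The table and column names mirror the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_value REAL)")
values = [450, 520, 680, 920, 1100, 1250, 1500, 55000]
conn.executemany("INSERT INTO orders VALUES (?)", [(v,) for v in values])

median_sql = """
SELECT AVG(order_value)
FROM (
  SELECT order_value,
         ROW_NUMBER() OVER (ORDER BY order_value) AS rn,
         COUNT(*) OVER () AS cnt
  FROM orders
) sub
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)  -- integer division picks the middle row(s)
"""

mean_value = conn.execute("SELECT AVG(order_value) FROM orders").fetchone()[0]
median_value = conn.execute(median_sql).fetchone()[0]
conn.close()

print(mean_value, median_value)  # 7677.5 1010.0
```

With 8 rows the filter selects rows 4 and 5 (920 and 1,100) and averages them; with an odd count both expressions point at the same middle row.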

Excel (Quick Analysis)

Data in column A (A1:A8)

Mean:   =AVERAGE(A1:A8)
Median: =MEDIAN(A1:A8)
Mode:   =MODE.SNGL(A1:A8)  [single mode]
        =MODE.MULT(A1:A8)  [multiple modes, returns array]

Percentiles:
25th: =QUARTILE(A1:A8, 1)
50th: =QUARTILE(A1:A8, 2)  [same as MEDIAN]
75th: =QUARTILE(A1:A8, 3)
90th: =PERCENTILE(A1:A8, 0.9)

Info

For large datasets (1M+ rows), use SQL or Python (Pandas): an Excel worksheet holds at most 1,048,576 rows and slows down well before that. For quick exploration (up to about 100K rows), Excel's AVERAGE/MEDIAN functions work fine.
