What is Cohort Analysis?
Cohort analysis groups users who share a common characteristic (like signup month) and tracks their behavior over time to measure retention, engagement, or revenue patterns.
How It Works
1. Define Cohort (Group users by shared trait):
Cohort = "Users who signed up in January 2026"
→ 10,000 users joined Jan 1-31
2. Track Behavior Over Time:
Month 0 (Jan): 10,000 users signed up (100% present)
Month 1 (Feb): 4,500 users returned (45% retention)
Month 2 (Mar): 3,200 users returned (32% retention)
Month 3 (Apr): 2,700 users returned (27% retention)
3. Repeat for Each Cohort:
Jan cohort: 45% → 32% → 27% retention
Feb cohort: 48% → 35% → 30% retention (better!)
Mar cohort: 52% → 38% → 33% retention (even better!)
Insight: Retention improving over time (product getting stickier)
Why Cohort Analysis Matters
Without Cohorts (Overall metrics):
Overall active users: 50,000 in Jan → 55,000 in Feb → 60,000 in Mar
Conclusion: "Growth is good!" ✓
Reality check with cohorts:
- Jan cohort: 10K → 4.5K → 3.2K (losing 68% of users)
- New signups masking churn (acquiring 15K/month, churning 8K/month)
→ Leaky bucket problem (can't sustain growth)
With Cohorts (Time-based groups):
Cohort retention analysis:
Jan cohort: 45% Month 1 retention
Feb cohort: 48% Month 1 retention (after product fix)
Mar cohort: 52% Month 1 retention (fix worked!)
Insight: Product improvements increased retention 7% (45% → 52%)
→ Cohorts prove causality (improvement worked)
Benefits:
- Measure product improvements — Did feature X increase retention? (compare cohorts before/after launch)
- Predict LTV — If Month 1 retention is 50%, Month 6 is ~25% (extrapolate)
- Identify churn patterns — When do users drop off? (Week 1? Month 3?)
- Segment analysis — Do paid users retain better than free users?
- Budget allocation — High-retention cohorts → Invest more in similar channels
Real Example: Flipkart Cohort Analysis
Goal: Measure retention of users acquired from different marketing channels.
Cohorts: Group users by acquisition month + channel:
Jan 2026 — Google Ads: 5,000 users
Jan 2026 — Facebook Ads: 3,000 users
Jan 2026 — Organic: 2,000 users
Retention Tracking (% who made ≥1 purchase in each subsequent month):
Channel | Month 0 | Month 1 | Month 2 | Month 3 | LTV (3 months)
Google Ads | 100% | 35% | 22% | 18% | ₹1,250
Facebook Ads | 100% | 42% | 28% | 23% | ₹1,680
Organic | 100% | 58% | 45% | 38% | ₹2,450
Insights:
- Organic users retain 2× better than paid ads (58% vs 35-42% Month 1)
- Facebook > Google for retention (42% vs 35% Month 1)
- Organic LTV is 2× paid (₹2,450 vs ₹1,250-1,680)
Action:
- Reallocate budget to Facebook (better ROI than Google)
- Invest in organic channels (SEO, referrals) — higher LTV
- Study organic users: Why do they stick around? (replicate behavior for paid users)
Cohort analysis is like tracking graduating classes in school. You group students by year (Class of 2020, 2021, 2022), then track: How many got jobs within 6 months? 1 year? 5 years? This reveals if career services improved over time (2022 cohort better outcomes than 2020 = improvement).
Types of Cohorts
1. Time-Based Cohorts (Most Common)
Group users by when they first interacted with your product.
Examples:
- Signup month cohort: Users who signed up in Jan 2026 vs Feb 2026
- First purchase cohort: Users who made first purchase in Q1 vs Q2
- App install cohort: Users who installed app in Week 1 vs Week 2
Use Case: Measure retention, track product improvements over time.
SQL Example (Signup month cohorts):
SELECT
DATE_TRUNC('month', signup_date) AS cohort_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM users
WHERE signup_date >= '2026-01-01'
GROUP BY cohort_month
ORDER BY cohort_month;2. Behavior-Based Cohorts
Group users by what they did (action or event).
Examples:
- Feature adoption cohort: Users who used feature X vs users who didn't
- Purchase frequency cohort: Users who bought 1× vs 5+ times in first month
- Engagement level cohort: Users with high engagement (10+ sessions/week) vs low (1-2 sessions)
Use Case: Segment users by behavior to identify power users vs casual users.
SQL Example (Purchase frequency cohorts):
WITH first_month_purchases AS (
SELECT
user_id,
COUNT(*) AS purchase_count_month_0
FROM orders
WHERE order_date BETWEEN signup_date AND signup_date + INTERVAL '30 days'
GROUP BY user_id
)
SELECT
CASE
WHEN purchase_count_month_0 = 1 THEN '1 purchase'
WHEN purchase_count_month_0 BETWEEN 2 AND 4 THEN '2-4 purchases'
WHEN purchase_count_month_0 >= 5 THEN '5+ purchases'
END AS cohort,
COUNT(*) AS users
FROM first_month_purchases
GROUP BY cohort;3. Acquisition Channel Cohorts
Group users by how they found your product.
Examples:
- Google Ads vs Facebook Ads vs Organic
- Referral vs Direct vs Email campaign
- iOS app vs Android app vs Web
Use Case: Compare retention/LTV across channels to optimize marketing spend.
SQL Example:
SELECT
acquisition_channel,
COUNT(DISTINCT user_id) AS cohort_size,
AVG(lifetime_value) AS avg_ltv
FROM users
WHERE signup_date >= '2026-01-01'
GROUP BY acquisition_channel
ORDER BY avg_ltv DESC;4. Demographic Cohorts
Group users by who they are (attributes).
Examples:
- Age groups (18-24, 25-34, 35-44)
- Geography (Tier 1 cities vs Tier 2 vs Tier 3)
- Device type (iOS vs Android)
Use Case: Personalize product for different user segments.
Example — Zomato Geography Cohorts:
Tier 1 cities (Mumbai, Delhi, Bangalore):
- 30-day retention: 55%
- Avg order frequency: 6 orders/month
Tier 2 cities (Pune, Jaipur, Chandigarh):
- 30-day retention: 48%
- Avg order frequency: 4 orders/month
Insight: Tier 1 users order 50% more often (6 vs 4) → Focus retention efforts on Tier 2
⚠️ CheckpointQuiz error: Missing or invalid options array
Building Retention Cohort Tables
Retention cohort table shows % of each cohort that returned in subsequent time periods.
Structure
| Cohort | Month 0 | Month 1 | Month 2 | Month 3 | Month 6 | Month 12 | |--------|---------|---------|---------|---------|---------|----------| | Jan 2026 | 100% | 45% | 32% | 27% | 18% | 12% | | Feb 2026 | 100% | 48% | 35% | 30% | 21% | - | | Mar 2026 | 100% | 52% | 38% | 33% | - | - | | Apr 2026 | 100% | 55% | 42% | - | - | - |
Reading the table:
- Jan 2026 cohort: 45% returned in Month 1, 32% in Month 2, etc.
- Diagonal pattern: Newer cohorts have less time elapsed (can't measure Month 12 for Apr cohort yet)
- Improving trend: Month 1 retention increased from 45% → 55% (product getting stickier)
SQL Query — Retention Cohort Table
Schema:
users (user_id, signup_date)
events (user_id, event_date, event_type)Query (Monthly retention cohorts):
WITH cohorts AS (
-- Define cohort (signup month)
SELECT
user_id,
DATE_TRUNC('month', signup_date) AS cohort_month
FROM users
),
user_activity AS (
-- Get user activity by month
SELECT
user_id,
DATE_TRUNC('month', event_date) AS activity_month
FROM events
WHERE event_type = 'purchase' -- or 'login', 'session_start', etc.
GROUP BY user_id, activity_month
),
cohort_activity AS (
-- Join cohorts with activity, calculate months since signup
SELECT
c.cohort_month,
c.user_id,
a.activity_month,
DATEDIFF('month', c.cohort_month, a.activity_month) AS months_since_signup
FROM cohorts c
LEFT JOIN user_activity a ON c.user_id = a.user_id
),
cohort_sizes AS (
-- Count cohort size (Month 0)
SELECT
cohort_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM cohorts
GROUP BY cohort_month
),
retention_by_month AS (
-- Count active users per cohort per month
SELECT
cohort_month,
months_since_signup,
COUNT(DISTINCT user_id) AS active_users
FROM cohort_activity
WHERE activity_month IS NOT NULL
GROUP BY cohort_month, months_since_signup
)
-- Calculate retention rate
SELECT
r.cohort_month,
r.months_since_signup,
r.active_users,
cs.cohort_size,
ROUND(100.0 * r.active_users / cs.cohort_size, 1) AS retention_pct
FROM retention_by_month r
JOIN cohort_sizes cs ON r.cohort_month = cs.cohort_month
ORDER BY r.cohort_month, r.months_since_signup;Output:
cohort_month | months_since_signup | active_users | cohort_size | retention_pct
2026-01-01 | 0 | 10000 | 10000 | 100.0
2026-01-01 | 1 | 4500 | 10000 | 45.0
2026-01-01 | 2 | 3200 | 10000 | 32.0
2026-02-01 | 0 | 12000 | 12000 | 100.0
2026-02-01 | 1 | 5760 | 12000 | 48.0
...
Python Visualization — Retention Heatmap
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load SQL results
df = pd.read_sql(query, connection)
# Pivot to cohort table format
cohort_table = df.pivot_table(
index='cohort_month',
columns='months_since_signup',
values='retention_pct'
)
# Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(
cohort_table,
annot=True, # Show percentages in cells
fmt='.0f', # Format as integers
cmap='RdYlGn', # Red (low) to Green (high)
vmin=0,
vmax=100,
cbar_kws={'label': 'Retention %'}
)
plt.title('Monthly Retention Cohort Analysis')
plt.xlabel('Months Since Signup')
plt.ylabel('Cohort (Signup Month)')
plt.tight_layout()
plt.show()Output: Heatmap with green cells (high retention) and red cells (low retention).
Interpreting Retention Curves
Healthy Retention Curve (Logarithmic decay):
Month 0: 100%
Month 1: 40-60% (sharp drop — expected)
Month 2: 30-45% (slower drop)
Month 3: 25-40% (flattening)
Month 6: 20-35% (stable)
Month 12: 15-30% (plateau)
Pattern: Sharp initial drop, then flatten (users who stay past Month 3 are sticky)
Unhealthy Retention Curve (Linear decay):
Month 0: 100%
Month 1: 30%
Month 2: 20%
Month 3: 10%
Month 6: 5%
Month 12: 2%
Pattern: Continuous linear decay (never plateaus) — leaky bucket, no product-market fit
Key Metric: If Month 6 retention ≥ 20%, you have product-market fit. If < 10%, users aren't finding long-term value (high churn).
Advanced Cohort Analysis Techniques
1. Cumulative Cohort Revenue (LTV Projection)
Track cumulative revenue per cohort to project lifetime value.
SQL Query:
WITH cohort_revenue AS (
SELECT
DATE_TRUNC('month', u.signup_date) AS cohort_month,
DATEDIFF('month', DATE_TRUNC('month', u.signup_date),
DATE_TRUNC('month', o.order_date)) AS months_since_signup,
SUM(o.order_value) AS revenue
FROM users u
JOIN orders o ON u.user_id = o.user_id
GROUP BY cohort_month, months_since_signup
),
cumulative AS (
SELECT
cohort_month,
months_since_signup,
SUM(revenue) OVER (
PARTITION BY cohort_month
ORDER BY months_since_signup
) AS cumulative_revenue
FROM cohort_revenue
)
SELECT
cohort_month,
months_since_signup,
cumulative_revenue /
(SELECT COUNT(*) FROM users WHERE DATE_TRUNC('month', signup_date) = cohort_month)
AS revenue_per_user
FROM cumulative;Output:
Jan 2026 cohort:
Month 0: ₹450 per user
Month 1: ₹780 per user (cumulative)
Month 2: ₹980 per user
Month 6: ₹1,650 per user
→ Projected 12-month LTV: ~₹2,000
Use Case: If Month 6 LTV is ₹1,650 and CAC is ₹1,200, LTV/CAC = 1.4 (marginal). Need to improve retention or reduce CAC.
2. Cohort Churn Analysis
Measure when users churn (time to churn).
SQL Query (Days to last activity):
WITH last_activity AS (
SELECT
user_id,
MAX(event_date) AS last_active_date
FROM events
GROUP BY user_id
),
churn_time AS (
SELECT
u.user_id,
DATE_TRUNC('month', u.signup_date) AS cohort_month,
DATEDIFF('day', u.signup_date, la.last_active_date) AS days_to_churn
FROM users u
JOIN last_activity la ON u.user_id = la.user_id
WHERE la.last_active_date < CURRENT_DATE - INTERVAL '30 days' -- Inactive for 30+ days = churned
)
SELECT
cohort_month,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY days_to_churn) AS median_days_to_churn,
AVG(days_to_churn) AS avg_days_to_churn
FROM churn_time
GROUP BY cohort_month
ORDER BY cohort_month;Output:
Jan cohort: Median churn at 18 days, Avg 35 days
Feb cohort: Median churn at 24 days, Avg 42 days (improving!)
Insight: Product fixes delayed churn by 6 days (18 → 24 median)
3. Segmented Cohort Analysis
Compare cohorts across segments (channel, device, plan tier).
Example — Swiggy: Paid vs Free Delivery Pass Users:
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', signup_date) AS cohort_month,
CASE WHEN has_delivery_pass THEN 'Paid Pass' ELSE 'Free' END AS segment
FROM users
),
retention AS (
-- (Same retention logic as before, grouped by segment)
...
)
SELECT
cohort_month,
segment,
months_since_signup,
retention_pct
FROM retention
ORDER BY cohort_month, segment, months_since_signup;Output:
Paid Pass users: 75% Month 1 retention, 60% Month 3
Free users: 35% Month 1 retention, 20% Month 3
Insight: Paid Pass users retain 2× better (invest in converting free → paid)
4. Cohort A/B Testing
Compare cohorts exposed to product changes vs control.
Example — PhonePe: New Onboarding Flow:
Cohorts:
- Control (Jan 1-15): Old onboarding (10K users)
- Treatment (Jan 16-31): New onboarding (10K users)
Results:
Control: 40% Month 1 retention
Treatment: 48% Month 1 retention
Conclusion: New onboarding increased retention 20% (40% → 48%) — Deploy to all users
⚠️ FinalQuiz error: Missing or invalid questions array
⚠️ SummarySection error: Missing or invalid items array
Received: {"hasItems":false,"isArray":false}