
Zomato Data Analytics: How Food-Tech Uses Data to Drive Growth

Zomato serves 80 million monthly active users across 1,000+ cities, partnering with 250,000+ restaurants. Behind every restaurant recommendation, delivery ETA, and Gold membership offer is sophisticated analytics driving discovery, retention, and profitability.

📚Intermediate
⏱️11 min
10 quizzes
🏢

Zomato: Company Context

Zomato started in 2008 as a restaurant discovery platform (menu listings, reviews) and evolved into India's leading food delivery marketplace. After acquiring Blinkit (quick commerce) and expanding into dining-out services, Zomato now operates across the entire food ecosystem.

Key Metrics (2026)

  • 80+ million monthly active users (MAU)
  • 250,000+ restaurant partners across 1,000+ cities
  • 180,000+ delivery partners
  • 1 million+ orders/day (365M annually)
  • ₹9,000+ crore annual revenue (2025)
  • Zomato Gold: 5M+ paid members (dining + delivery subscriptions)

Data Infrastructure

Zomato's analytics runs on:

  • User behavior tracking: Clickstream data capturing every search, filter, restaurant view, order
  • Geospatial database: Real-time location tracking of delivery partners + restaurant proximity
  • ML platform: Recommendation engine, delivery time prediction, churn prediction models
  • A/B testing framework: 50+ experiments running on app UI, pricing, notifications
  • Operational dashboards: Real-time monitoring of order flow, delivery SLAs, restaurant performance

Analytics Team Structure

  • Growth Analytics: User acquisition, activation, retention, monetization (AARRR funnel)
  • Restaurant Analytics: Partner onboarding, menu optimization, demand forecasting for restaurants
  • Delivery Analytics: Route optimization, ETA prediction, delivery partner allocation
  • Product Analytics: Feature adoption, app engagement, search relevance
  • Finance Analytics: Unit economics, profitability by city/restaurant, CAC/LTV modeling
Think of it this way...

Zomato's analytics system is like a matchmaking platform for food — understanding what you crave (Mexican at 2 PM vs comfort food at 11 PM), which restaurants can fulfill it fastest, and how to keep you coming back (personalized offers, loyalty rewards) — all optimized through millions of data points collected daily.

🎯

The Business Problems

Zomato faces three core analytics challenges:

1. Restaurant Discovery: Paradox of Choice

Problem: With 250K+ restaurants, users get overwhelmed and abandon the app without ordering.

Challenge:

  • Search ambiguity: User searches "pizza" → 5,000 results in Bangalore (which to show first?)
  • Preference diversity: Same user orders sushi (healthy) and biryani (comfort food) on different days
  • Context matters: Lunch searches prioritize speed (fast delivery), dinner prioritizes quality (ratings)
  • Cold start: New users have no order history (what to recommend?)

Traditional approach: Rank restaurants by ratings + popularity → Result: 40% of users don't find what they want (high bounce rate)

Data-driven approach: Personalized ranking using collaborative filtering + contextual signals → Result: 25% bounce rate (38% improvement) + 1.5× order conversion (8% → 12%)


2. Delivery Time Accuracy: The 30-Minute Promise

Problem: Inaccurate ETAs lead to refunds, bad reviews, and customer churn.

Challenge:

  • Restaurant delays: Food prep time varies (15-45 min depending on kitchen load)
  • Traffic variability: Same route takes 8 min (midnight) vs 25 min (rush hour)
  • Weather impact: Rain increases delivery time by 35-40%
  • Last-mile complexity: Apartment complex deliveries take 5-10 min longer (parking, security, finding flat)

Traditional approach: Fixed 30-40 min ETA for all orders → Result: 20% late deliveries (refund costs ₹25 crore/year)

Data-driven approach: ML-powered ETA prediction with real-time adjustments → Result: 8% late deliveries (60% reduction) + ±3 min accuracy
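
A minimal sketch of how the delay factors listed above could combine into an order-level ETA. This is an illustrative additive baseline, not Zomato's model: the function name `estimate_eta_minutes`, its parameters, and the constants (a 1.35 rain multiplier applied to the travel leg, a 7-minute apartment buffer) are assumptions picked from the ranges quoted above. A production system would learn these effects from historical orders.

```python
def estimate_eta_minutes(prep_min, travel_min, raining=False, apartment=False):
    """Additive ETA baseline: kitchen prep + travel + last-mile buffer.

    All constants are illustrative assumptions taken from the ranges in the text.
    """
    RAIN_MULTIPLIER = 1.35     # rain adds roughly 35-40%; applied to travel only (assumption)
    APARTMENT_BUFFER_MIN = 7   # apartment deliveries take 5-10 min longer

    travel = travel_min * (RAIN_MULTIPLIER if raining else 1.0)
    last_mile = APARTMENT_BUFFER_MIN if apartment else 0
    return round(prep_min + travel + last_mile)


# Quiet kitchen at midnight vs busy kitchen in rainy rush hour
midnight_eta = estimate_eta_minutes(prep_min=15, travel_min=8)
rush_hour_rain_eta = estimate_eta_minutes(
    prep_min=30, travel_min=25, raining=True, apartment=True
)

print(midnight_eta)        # 23
print(rush_hour_rain_eta)  # 71
```

The spread between the two cases (23 vs 71 minutes) shows why a fixed 30-40 minute ETA is late for some orders and needlessly pessimistic for others.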


3. Customer Retention: High Churn in Food Delivery

Problem: Only 25% of first-time users place a second order within 30 days (75% churn).

Typical customer journey:

App Install → First Order → Second Order (30 days) → Active User (3+ orders/month)
100 users   → 100         → 25                     → 12
Drop-off:     0%            75%                      52%
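
The drop-off percentages in this journey are measured stage to stage, not against the original 100 installs. A short sketch of that calculation, using the counts above (the helper name `stage_dropoffs` is hypothetical):

```python
# Users remaining at each funnel stage, per 100 installs (figures from the journey above)
funnel = [
    ("App Install", 100),
    ("First Order", 100),
    ("Second Order (30 days)", 25),
    ("Active User (3+ orders/month)", 12),
]


def stage_dropoffs(stages):
    """Return (stage_name, drop_off_pct) relative to the previous stage."""
    results = []
    for (_, prev_count), (name, count) in zip(stages, stages[1:]):
        drop_pct = (prev_count - count) / prev_count * 100
        results.append((name, round(drop_pct)))
    return results


dropoffs = stage_dropoffs(funnel)
for name, pct in dropoffs:
    print(f"{name}: {pct}% drop-off")
```

Measured from the top of the funnel instead, only 12 of every 100 installs become active users.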

Key insights from analytics:

  1. Poor first experience: 40% of first orders have issues (late delivery, wrong order, quality complaints)
  2. Price sensitivity: 60% of churned users cite "too expensive" (vs cooking at home)
  3. Lack of variety: Users try 1-2 cuisines, then forget about the app
  4. No habit formation: Food delivery is episodic (weekend treat), not daily habit like groceries

Data-driven solutions:

  • Personalized win-back campaigns (discount offers to churned users)
  • Zomato Gold (subscription for frequent users)
  • Push notifications at meal times (lunch 12-1 PM, dinner 7-9 PM)
Info

Scale context: Reducing first-month churn from 75% → 60% retains 15 more of every 100 first-time users. For a cohort of 100,000 new users per month, that is 15,000 additional retained users/month. Over 12 months, 180,000 users × ₹500 avg LTV = ₹9 crore additional revenue.

🔬

Data They Used & Analytics Approach

1. Restaurant Recommendations: Collaborative Filtering + Contextual Ranking

Data sources:

code.pyPython
# User order history
{
  "user_id": "U12345",
  "order_history": [
    {"restaurant_id": "R001", "cuisine": "North Indian", "order_time": "13:15", "rating": 4},
    {"restaurant_id": "R045", "cuisine": "Chinese", "order_time": "20:30", "rating": 5},
    {"restaurant_id": "R122", "cuisine": "Italian", "order_time": "21:00", "rating": 4}
  ],
  "search_queries": ["pizza near me", "biryani", "healthy salad"],
  "filters_used": ["veg_only", "rating_4_plus", "delivery_time_30min"]
}

# Restaurant metadata
{
  "restaurant_id": "R001",
  "name": "Punjab Grill",
  "cuisine": ["North Indian", "Mughlai"],
  "avg_rating": 4.2,
  "avg_delivery_time": 35,
  "price_for_two": 800,
  "location_lat_lon": (12.9716, 77.5946),
  "popular_dishes": ["Butter Chicken", "Dal Makhani", "Naan"]
}

SQL: Find restaurants similar to user's past orders

query.sqlSQL
-- Collaborative filtering: Users who ordered from Restaurant A also ordered from Restaurant B
WITH recent_orders AS (
  -- One row per (user, restaurant) in the last 90 days,
  -- so the numerator and denominator below use the same window
  SELECT DISTINCT user_id, restaurant_id
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
),

restaurant_users AS (
  -- Denominator: distinct users per restaurant in the same window
  SELECT restaurant_id, COUNT(*) AS users_ordered
  FROM recent_orders
  GROUP BY restaurant_id
),

user_restaurant_pairs AS (
  SELECT
    o1.user_id,
    o1.restaurant_id AS restaurant_a,
    o2.restaurant_id AS restaurant_b
  FROM recent_orders o1
  JOIN recent_orders o2
    ON o1.user_id = o2.user_id
    AND o1.restaurant_id != o2.restaurant_id
),

restaurant_similarity AS (
  SELECT
    p.restaurant_a,
    p.restaurant_b,
    COUNT(DISTINCT p.user_id) AS users_ordered_both,
    -- Confidence: If user orders from A, probability they also order from B
    COUNT(DISTINCT p.user_id) * 100.0 / ru.users_ordered AS confidence_pct
  FROM user_restaurant_pairs p
  JOIN restaurant_users ru ON ru.restaurant_id = p.restaurant_a
  GROUP BY p.restaurant_a, p.restaurant_b, ru.users_ordered
  HAVING COUNT(DISTINCT p.user_id) >= 20  -- Minimum support
)

SELECT
  ra.name AS restaurant_a_name,
  rb.name AS restaurant_b_name,
  rs.confidence_pct,
  rb.avg_rating,
  rb.avg_delivery_time,
  rb.price_for_two
FROM restaurant_similarity rs
JOIN restaurants ra ON rs.restaurant_a = ra.restaurant_id
JOIN restaurants rb ON rs.restaurant_b = rb.restaurant_id
WHERE rs.restaurant_a = 'R001'  -- Punjab Grill (user's favorite)
ORDER BY rs.confidence_pct DESC
LIMIT 10;
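
To make the confidence metric concrete, here is a small pure-Python sketch of the same co-occurrence calculation. The order pairs are hypothetical toy data, the function name `restaurant_confidence` is an assumption, and the minimum-support threshold from the query is omitted because the sample is tiny.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical (user_id, restaurant_id) order pairs, for illustration only
orders = [
    ("U1", "R001"), ("U1", "R045"),
    ("U2", "R001"), ("U2", "R045"), ("U2", "R122"),
    ("U3", "R001"),
    ("U4", "R045"), ("U4", "R122"),
]


def restaurant_confidence(order_pairs):
    """confidence[(a, b)] = % of users who ordered from a that also ordered from b."""
    restaurants_by_user = defaultdict(set)
    for user_id, restaurant_id in order_pairs:
        restaurants_by_user[user_id].add(restaurant_id)

    users_per_restaurant = defaultdict(int)
    users_ordered_both = defaultdict(int)
    for restaurants in restaurants_by_user.values():
        for r in restaurants:
            users_per_restaurant[r] += 1
        for a, b in permutations(restaurants, 2):  # ordered pairs with a != b
            users_ordered_both[(a, b)] += 1

    return {
        pair: count * 100.0 / users_per_restaurant[pair[0]]
        for pair, count in users_ordered_both.items()
    }


confidence = restaurant_confidence(orders)
print(round(confidence[("R045", "R122")], 1))  # 66.7
print(round(confidence[("R122", "R045")], 1))  # 100.0
```

The metric is asymmetric: everyone who ordered from R122 also ordered from R045, but only two of the three R045 customers ordered from R122. That is why the query keeps `restaurant_a` and `restaurant_b` as an ordered pair.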

Python: Contextual ranking (time of day, cuisine preference)

code.pyPython
def rank_restaurants(user_id, user_context, restaurant_list):
    """
    Rank restaurants based on collaborative filtering + contextual factors

    Args:
        user_id: User identifier
        user_context: {'hour': 13, 'day_of_week': 'Friday', 'weather': 'rainy'}
        restaurant_list: List of candidate restaurants (from collaborative filtering)

    Returns:
        Ranked restaurant list with scores
    """

    results = []

    for restaurant in restaurant_list:
        score = 0

        # Base score: Collaborative filtering confidence (0-100)
        score += restaurant['cf_confidence']

        # Contextual adjustments

        # Time of day preference (lunch: quick delivery, dinner: quality)
        if 12 <= user_context['hour'] <= 14:  # Lunch
            if restaurant['avg_delivery_time'] <= 30:
                score += 20  # Prioritize fast delivery
        elif 19 <= user_context['hour'] <= 22:  # Dinner
            if restaurant['avg_rating'] >= 4.0:
                score += 20  # Prioritize high-rated

        # Weather context (rainy day: comfort food)
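        if user_context['weather'] == 'rainy':
            # 'cuisine' may be a single string or a list (as in the restaurant metadata)
            cuisines = restaurant['cuisine']
            if isinstance(cuisines, str):
                cuisines = [cuisines]
            if any(c in ('North Indian', 'Chinese', 'Italian') for c in cuisines):
                score += 15  # Comfort food cuisines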

        # Price sensitivity (Friday/weekend: less price-sensitive)
        if user_context['day_of_week'] in ['Friday', 'Saturday', 'Sunday']:
            score += 5  # All restaurants benefit (users order more expensive on weekends)
        else:
            if restaurant['price_for_two'] <= 400:
                score += 10  # Prioritize budget options on weekdays

        # Distance penalty (farther = longer delivery = lower score)
        distance_km = restaurant['distance_km']
        if distance_km <= 2:
            score += 10
        elif distance_km <= 4:
            score += 5
        else:
            score -= 5  # Penalize distant restaurants

        results.append({
            'restaurant_id': restaurant['restaurant_id'],
            'name': restaurant['name'],
            'final_score': score,
            'avg_rating': restaurant['avg_rating'],
            'delivery_time': restaurant['avg_delivery_time'],
            'price_for_two': restaurant['price_for_two']
        })

    # Sort by score descending
    results_sorted = sorted(results, key=lambda x: x['final_score'], reverse=True)

    return results_sorted

# Example usage
user_context = {
    'hour': 13,
    'day_of_week': 'Wednesday',
    'weather': 'clear'
}

candidate_restaurants = [
    {'restaurant_id': 'R001', 'name': 'Punjab Grill', 'cf_confidence': 85, 'cuisine': 'North Indian',
     'avg_rating': 4.2, 'avg_delivery_time': 35, 'price_for_two': 800, 'distance_km': 3.2},
    {'restaurant_id': 'R002', 'name': 'Chinese Wok', 'cf_confidence': 75, 'cuisine': 'Chinese',
     'avg_rating': 3.9, 'avg_delivery_time': 25, 'price_for_two': 350, 'distance_km': 1.8},
    {'restaurant_id': 'R003', 'name': 'Wow! Momo', 'cf_confidence': 70, 'cuisine': 'Tibetan',
     'avg_rating': 4.0, 'avg_delivery_time': 20, 'price_for_two': 300, 'distance_km': 2.5}
]

ranked = rank_restaurants('U12345', user_context, candidate_restaurants)

print("Ranked Restaurants for Lunch (Wednesday):")
for i, r in enumerate(ranked, 1):
    print(f"{i}. {r['name']} (Score: {r['final_score']}, Rating: {r['avg_rating']}, "
          f"Delivery: {r['delivery_time']}min, Price: ₹{r['price_for_two']})")

# Output:
# Ranked Restaurants for Lunch (Wednesday):
# 1. Chinese Wok (Score: 115, Rating: 3.9, Delivery: 25min, Price: ₹350)
# 2. Wow! Momo (Score: 105, Rating: 4.0, Delivery: 20min, Price: ₹300)
# 3. Punjab Grill (Score: 90, Rating: 4.2, Delivery: 35min, Price: ₹800)
#
# Reason: Lunchtime prioritizes fast delivery + budget-friendly (Chinese Wok, Wow! Momo win)
# Punjab Grill ranked lower despite higher CF confidence because slower delivery + expensive

Result: Personalized ranking increased order conversion from 8% → 12% (+50% lift)


2. Churn Prediction & Win-Back Campaigns

SQL: Identify at-risk users (no order in 30 days)

query.sqlSQL
-- Cohort analysis: Users who ordered in Jan 2026, retention over next 3 months
WITH jan_cohort AS (
  SELECT DISTINCT user_id
  FROM orders
  WHERE order_date >= '2026-01-01'
    AND order_date < '2026-02-01'  -- Half-open range also covers timestamps on Jan 31
),

monthly_activity AS (
  SELECT
    jc.user_id,
    DATE_TRUNC('month', o.order_date) AS order_month,
    COUNT(o.order_id) AS orders_count
  FROM jan_cohort jc
  LEFT JOIN orders o ON jc.user_id = o.user_id
    AND o.order_date >= '2026-01-01'
    AND o.order_date < '2026-05-01'
  GROUP BY jc.user_id, DATE_TRUNC('month', o.order_date)
)

SELECT
  order_month,
  COUNT(DISTINCT user_id) AS active_users,
  SUM(orders_count) AS total_orders,
  SUM(orders_count) * 1.0 / COUNT(DISTINCT user_id) AS avg_orders_per_user
FROM monthly_activity
GROUP BY order_month
ORDER BY order_month;

-- Churn prediction: Users likely to churn (ML feature engineering)
SELECT
  u.user_id,
  u.email,
  u.signup_date,
  CURRENT_DATE - MAX(o.order_date) AS days_since_last_order,
  COUNT(o.order_id) AS total_orders,
  AVG(o.order_value) AS avg_order_value,
  AVG(o.delivery_rating) AS avg_delivery_rating,
  -- Churn risk flag
  CASE
    WHEN CURRENT_DATE - MAX(o.order_date) > 30 AND COUNT(o.order_id) >= 3 THEN 'HIGH_RISK'
    WHEN CURRENT_DATE - MAX(o.order_date) > 45 THEN 'MEDIUM_RISK'
    ELSE 'ACTIVE'
  END AS churn_risk_segment
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email, u.signup_date
HAVING COUNT(o.order_id) > 0  -- Exclude never-ordered users
ORDER BY days_since_last_order DESC;
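
The same segmentation rule can be unit-tested outside the database. This is a sketch that mirrors the CASE expression above; the function name `churn_risk_segment` and the fixed reference date are assumptions for illustration.

```python
from datetime import date


def churn_risk_segment(last_order_date, total_orders, today):
    """Mirror of the SQL CASE expression; branch order matters."""
    days_since_last_order = (today - last_order_date).days
    if days_since_last_order > 30 and total_orders >= 3:
        return "HIGH_RISK"    # established customer who has gone quiet
    if days_since_last_order > 45:
        return "MEDIUM_RISK"  # occasional customer with a long gap
    return "ACTIVE"


today = date(2026, 4, 1)  # fixed reference date so the example is reproducible
print(churn_risk_segment(date(2026, 2, 20), 5, today))  # HIGH_RISK (40 days, 5 orders)
print(churn_risk_segment(date(2026, 2, 1), 1, today))   # MEDIUM_RISK (59 days, 1 order)
print(churn_risk_segment(date(2026, 2, 20), 2, today))  # ACTIVE (40 days, 2 orders)
```

The third case exposes an edge of the rule: a user with fewer than three orders who has been inactive for 31-45 days is still labelled ACTIVE. That may be intended, but it is worth reviewing when adapting the thresholds.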

Python: Personalized win-back offer

code.pyPython
# Win-back campaign: Personalized discount based on user value
def generate_winback_offer(user_segment, user_ltv):
    """
    Generate personalized discount offer to win back churned users

    Args:
        user_segment: 'HIGH_RISK', 'MEDIUM_RISK', 'ACTIVE'
        user_ltv: Lifetime value (total spent) ₹

    Returns:
        Discount offer dictionary
    """

    offers = {
        'HIGH_RISK': {
            'ltv_0_1000': {'discount': 100, 'min_order': 199, 'message': 'We miss you! ₹100 OFF on your next order'},
            'ltv_1000_5000': {'discount': 150, 'min_order': 299, 'message': 'Come back! ₹150 OFF on ₹299+'},
            'ltv_5000_plus': {'discount': 250, 'min_order': 499, 'message': 'Special offer! ₹250 OFF on ₹499+'}
        },
        'MEDIUM_RISK': {
            'ltv_0_1000': {'discount': 50, 'min_order': 199, 'message': '₹50 OFF your next order'},
            'ltv_1000_5000': {'discount': 75, 'min_order': 249, 'message': '₹75 OFF on ₹249+'},
            'ltv_5000_plus': {'discount': 100, 'min_order': 299, 'message': '₹100 OFF on ₹299+'}
        }
    }

    # Determine LTV bucket
    if user_ltv < 1000:
        ltv_bucket = 'ltv_0_1000'
    elif user_ltv < 5000:
        ltv_bucket = 'ltv_1000_5000'
    else:
        ltv_bucket = 'ltv_5000_plus'

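    # Return personalized offer (None for segments with no offer, e.g. 'ACTIVE')
    segment_offers = offers.get(user_segment)
    return segment_offers[ltv_bucket] if segment_offers else None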

# Example
offer = generate_winback_offer('HIGH_RISK', 3500)
print(offer)
# Output: {'discount': 150, 'min_order': 299, 'message': 'Come back! ₹150 OFF on ₹299+'}

Result: Win-back campaigns with personalized offers recovered 18% of at-risk users (vs 5% with generic offers)


📈

Key Results & Impact

1. Restaurant Discovery Improvements

Before personalization (generic ranking):

  • Order conversion rate: 8% (92% of users browsed without ordering)
  • Avg time to order: 12 minutes (high friction)
  • New user activation: 35% (placed first order within 7 days)

After personalization (collaborative filtering + contextual ranking):

  • Order conversion rate: 12% (+50% lift)
  • Avg time to order: 8 minutes (33% faster)
  • New user activation: 48% (+37% improvement)

Revenue impact: ₹800+ crore additional GMV from improved discovery


2. Delivery Time Prediction Accuracy

Metric improvements:

  • ETA accuracy: ±3 minutes (vs ±8 minutes with fixed ETAs)
  • Late delivery rate: 8% (down from 20%)
  • Refund costs: ₹10 crore/year (down from ₹25 crore)
  • Customer satisfaction: 4.2/5 (up from 3.6/5)

3. Customer Retention & LTV

Cohort analysis results (Jan 2026 cohort):

| Month | Active Users | Retention % | Avg Orders/User | Revenue/User |
|-------|--------------|-------------|-----------------|--------------|
| Jan (M0) | 100,000 | 100% | 1.0 | ₹350 |
| Feb (M1) | 32,000 | 32% | 1.5 | ₹525 |
| Mar (M2) | 22,000 | 22% | 2.1 | ₹735 |
| Apr (M3) | 18,000 | 18% | 2.5 | ₹875 |
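
The derived columns in this table can be reproduced from the active-user counts and average orders. The sketch below assumes an average order value of ₹350, which is implied by the M0 row (1.0 orders, ₹350 revenue) rather than stated in the text.

```python
# Jan 2026 cohort figures from the table above
cohort = [
    # (month label, active users, average orders per active user)
    ("Jan (M0)", 100_000, 1.0),
    ("Feb (M1)", 32_000, 1.5),
    ("Mar (M2)", 22_000, 2.1),
    ("Apr (M3)", 18_000, 2.5),
]

AVG_ORDER_VALUE = 350  # assumption: ₹350 per order, implied by the M0 row

cohort_size = cohort[0][1]
summary = []
for month, active_users, avg_orders in cohort:
    retention_pct = round(active_users / cohort_size * 100)
    revenue_per_user = round(avg_orders * AVG_ORDER_VALUE)
    summary.append((month, retention_pct, revenue_per_user))
    print(f"{month}: retention {retention_pct}%, revenue/user ₹{revenue_per_user}")
```

Retention falls each month while orders per remaining user rise, so the users who stay are the ones forming a habit.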

Impact of win-back campaigns:

  • Without campaigns: Month 1 retention = 25%
  • With personalized offers: Month 1 retention = 32% (+28% lift)
  • Recovered users: 7,000 per 100K cohort × ₹500 avg LTV = ₹35 lakh per cohort
Info

Zomato Gold impact: 5M+ paid members (₹149/month subscription). Members order 3× as often as non-members (4.5 orders/month vs 1.5). Gold membership drives ₹2,000+ crore annual GMV (25% of total revenue).

💡

What You Can Learn from Zomato

1. Context Matters as Much as Patterns

Key insight: Collaborative filtering finds patterns (what users generally like), but context determines relevance (what users want RIGHT NOW).

How to apply this:

  • When building recommendation systems, always add contextual features:
    • Time of day (breakfast vs dinner preferences)
    • Day of week (weekday budget vs weekend splurge)
    • Weather (rainy day comfort food vs sunny day salads)
    • Location (home vs office vs traveling)
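
As a starting point, the time-based features in this list can be derived from a single timestamp. This is a sketch under stated assumptions: the function name `context_features`, the breakfast window, and the `is_weekend_mode` flag are illustrative, while the lunch and dinner windows follow the ranking example earlier in this lesson. Location is left out because it needs a separate data source.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def context_features(order_time, weather="clear"):
    """Derive contextual ranking features from a timestamp (illustrative buckets)."""
    hour = order_time.hour
    if 6 <= hour <= 11:
        meal_slot = "breakfast"   # assumed window
    elif 12 <= hour <= 14:
        meal_slot = "lunch"
    elif 19 <= hour <= 22:
        meal_slot = "dinner"
    else:
        meal_slot = "other"

    day_of_week = DAY_NAMES[order_time.weekday()]  # Monday is index 0
    return {
        "hour": hour,
        "day_of_week": day_of_week,
        "meal_slot": meal_slot,
        "is_weekend_mode": day_of_week in ("Friday", "Saturday", "Sunday"),
        "weather": weather,  # supplied by the caller, e.g. from a weather feed
    }


features = context_features(datetime(2026, 1, 14, 13, 15), weather="rainy")
print(features)
```

The returned dictionary uses the same `hour`, `day_of_week`, and `weather` keys as the `user_context` argument in the ranking example, so it could be passed to a scorer of that shape directly.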

Portfolio project idea: "Food delivery recommendation system with collaborative filtering + contextual ranking using Zomato/Swiggy public data"


2. Retention > Acquisition (Fix Churn Before Scaling Ads)

Key insight: Zomato's biggest problem isn't getting new users (app installs are cheap) — it's keeping them (75% churn after first order).

The math:

code.pyPython
# Scenario A: Focus on acquisition (no retention fix)
new_users_per_month = 100000
month_1_retention = 0.25  # 75% churn
month_3_retention = 0.15
cac = 150  # Cost to acquire one user (ads)
ltv = 500  # Lifetime value per user

total_cost = new_users_per_month * cac  # ₹1.5 crore
total_revenue = new_users_per_month * month_3_retention * ltv  # ₹75 lakh
roi = (total_revenue - total_cost) / total_cost * 100  # -50% (losing money!)

# Scenario B: Fix retention first, then scale acquisition
new_users_per_month = 100000
month_1_retention = 0.40  # Improved from 25% → 40% (win-back campaigns)
month_3_retention = 0.25  # Improved from 15% → 25%
cac = 150
ltv = 800  # Higher LTV (retained users order more)

total_cost = new_users_per_month * cac  # ₹1.5 crore
total_revenue = new_users_per_month * month_3_retention * ltv  # ₹2 crore
roi = (total_revenue - total_cost) / total_cost * 100  # +33% (profitable!)

Lesson: Fix the leaky bucket (churn) before pouring more water (acquisition). Use cohort analysis to measure retention.
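
Both scenarios follow the same simplified model (revenue = new users × month-3 retention × LTV), so they can be wrapped in one function. Under that model the break-even point is simply CAC divided by LTV. The function names below are hypothetical, and the model ignores margins and repeat acquisition costs.

```python
def acquisition_roi_pct(new_users, retention, cac, ltv):
    """ROI (%) under the simplified model: revenue = users x retention x LTV."""
    total_cost = new_users * cac
    total_revenue = new_users * retention * ltv
    return (total_revenue - total_cost) / total_cost * 100


def break_even_retention(cac, ltv):
    """Retention rate at which revenue equals acquisition cost (ROI = 0)."""
    return cac / ltv


roi_a = acquisition_roi_pct(100_000, 0.15, cac=150, ltv=500)
roi_b = acquisition_roi_pct(100_000, 0.25, cac=150, ltv=800)
print(round(roi_a), round(roi_b))      # -50 33

print(break_even_retention(150, 500))  # 0.3
print(break_even_retention(150, 800))  # 0.1875
```

With an LTV of ₹500, acquisition only pays back above 30% retention, which is double Scenario A's 15%. Raising LTV to ₹800 lowers the bar to about 19%, which Scenario B's 25% clears.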


3. Personalization Works at All Stages (Not Just Recommendations)

Key insight: Zomato personalizes everything — recommendations, offers, notifications, email subject lines.

Examples:

  • Recommendations: Contextual ranking (lunch vs dinner)
  • Offers: Win-back discounts based on LTV (high-value users get better offers)
  • Notifications: Sent at user's typical order time (12:30 PM lunch, 8 PM dinner)
  • Email subject lines: A/B tested ("Order your favorite biryani" vs "10% OFF today")

How to apply this to job search:

  • Generic cover letter: "I'm a data analyst with SQL and Python skills" → Recruiter thinks: "Like 100 other applicants"

  • Personalized cover letter: "I noticed Zomato is hiring for a Growth Analyst role focused on retention. I built a churn prediction model using cohort analysis and win-back campaigns (see portfolio project), which aligns with your need for reducing Month 1 churn from 30% → 20%." → Recruiter thinks: "This person understands our problem!"

The best analysts personalize everything — just like Zomato personalizes for each user.
