Project Overview
Scenario: You're a data analyst at "ShopKart", an e-commerce company. Leadership wants insights on Q1 2026 sales performance to guide Q2 strategy.
Your mission: Analyze sales data, identify trends, and make actionable recommendations.
The Analytics Workflow
Phase 1: Define the Problem
Phase 2: Collect Data
Phase 3: Clean Data
Phase 4: Explore & Analyze
Phase 5: Visualize Insights
Phase 6: Communicate Findings
Phase 1: Define the Problem
Business Questions
- What was our Q1 revenue and how does it compare to Q4 2025?
- Which products drove the most revenue?
- Which regions are performing best/worst?
- What's our customer retention rate?
- Are there seasonal patterns in sales?
Success Metrics
- Total revenue
- Month-over-month growth
- Top 10 products by revenue
- Revenue by region
- Repeat customer rate
Phase 2: Collect Data
Data Sources
Tables to pull:
- orders - order_id, customer_id, product_id, amount, date, status
- customers - customer_id, name, email, city, region, signup_date
- products - product_id, name, category, price
SQL Query:
SELECT
o.order_id,
o.customer_id,
o.product_id,
o.amount,
o.order_date,
o.status,
c.name AS customer_name,
c.city,
c.region,
p.product_name,
p.category
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date BETWEEN '2026-01-01' AND '2026-03-31'
AND o.status = 'Delivered';Export: Save as q1_sales.csv
Phase 3: Clean Data
import pandas as pd
import numpy as np
# Load data
df = pd.read_csv('q1_sales.csv')
# Check shape and info
print(f"Rows: {len(df)}, Columns: {len(df.columns)}")
print(df.info())
# 1. Handle missing values
print(df.isnull().sum())
df = df.dropna(subset=['amount', 'customer_id']) # Drop critical nulls
df['city'].fillna('Unknown', inplace=True) # Fill non-critical
# 2. Remove duplicates
print(f"Duplicates: {df.duplicated().sum()}")
df = df.drop_duplicates()
# 3. Fix data types
df['order_date'] = pd.to_datetime(df['order_date'])
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
# 4. Clean text
df['city'] = df['city'].str.strip().str.title()
df['region'] = df['region'].str.upper()
# 5. Remove outliers (orders > βΉ1M likely errors)
df = df[df['amount'] < 1000000]
# 6. Create new columns
df['month'] = df['order_date'].dt.month
df['week'] = df['order_date'].dt.isocalendar().week
print(f"After cleaning: {len(df)} rows")Phase 4: Explore & Analyze
1. Overall Revenue
total_revenue = df['amount'].sum()
print(f"Q1 Total Revenue: βΉ{total_revenue:,.0f}")
# Monthly breakdown
monthly_revenue = df.groupby('month')['amount'].sum()
print(monthly_revenue)Output:
Q1 Total Revenue: βΉ12,450,000
month
1 3,800,000
2 4,200,000
3 4,450,000
Insight: Revenue growing month-over-month (+10% JanβFeb, +6% FebβMar)
2. Top Products
top_products = df.groupby('product_name')['amount'].sum().sort_values(ascending=False).head(10)
print(top_products)Output:
Laptop Pro 2,500,000
Wireless Earbuds 1,800,000
Smart Watch 1,200,000
...
Insight: Electronics dominate top 10 (70% of revenue)
3. Regional Performance
regional_revenue = df.groupby('region').agg({
'amount': 'sum',
'order_id': 'count'
}).rename(columns={'amount': 'revenue', 'order_id': 'orders'})
regional_revenue['avg_order_value'] = regional_revenue['revenue'] / regional_revenue['orders']
print(regional_revenue.sort_values('revenue', ascending=False))Output: | Region | Revenue | Orders | AOV | |--------|---------|--------|-----| | WEST | βΉ4.5M | 2,100 | βΉ2,143 | | NORTH | βΉ3.8M | 1,900 | βΉ2,000 | | SOUTH | βΉ2.6M | 1,500 | βΉ1,733 | | EAST | βΉ1.5M | 1,000 | βΉ1,500 |
Insight: West leads in revenue, but East has lowest AOV (opportunity!)
4. Customer Retention
# Customers who ordered in Q1
q1_customers = df['customer_id'].unique()
# Load Q4 2025 data
q4_df = pd.read_csv('q4_sales.csv')
q4_customers = q4_df['customer_id'].unique()
# Repeat customers
repeat_customers = np.intersect1d(q1_customers, q4_customers)
retention_rate = len(repeat_customers) / len(q4_customers) * 100
print(f"Retention Rate: {retention_rate:.1f}%")Output: Retention Rate: 42%
Insight: Room for improvement (industry benchmark: 50-60%)
5. Trend Analysis
daily_sales = df.groupby(df['order_date'].dt.date)['amount'].sum()
# 7-day moving average
daily_sales_ma = daily_sales.rolling(window=7).mean()
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(daily_sales.index, daily_sales, label='Daily Sales', alpha=0.5)
plt.plot(daily_sales_ma.index, daily_sales_ma, label='7-day MA', linewidth=2)
plt.title('Q1 Daily Sales Trend')
plt.legend()
plt.savefig('sales_trend.png')Insight: Clear weekly pattern (weekends higher), gradual upward trend
Phase 5: Visualize Insights
Dashboard Layout
βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββ
β Total β Growth β Avg Order β
β Revenue β % MoM β Value β
β βΉ12.45M β +8% β βΉ1,950 β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ€
β Monthly Revenue Trend (Line Chart) β
β [Jan, Feb, Mar trending up] β
ββββββββββββββββ¬βββββββββββββββββββββββββββ€
β Top Products β Regional Distribution β
β (Bar Chart) β (Map or Bar) β
β β West > North > South > Eastβ
ββββββββββββββββ΄βββββββββββββββββββββββββββ€
β Product Category Mix (Pie/Donut) β
β Electronics 55% | Fashion 25% | Other 20%β
βββββββββββββββββββββββββββββββββββββββββββ
Tools: Power BI, Tableau, or Python (Plotly/Matplotlib)
Phase 6: Communicate Findings
Executive Summary
To: Leadership Team From: Data Analytics Team Re: Q1 2026 Sales Performance Analysis
Key Findings:
-
β Strong Q1 Performance
- Total revenue: βΉ12.45M
- 8% MoM growth trend
- Beat Q1 2025 by 15%
-
π Product Insights
- Electronics drive 55% of revenue
- Top 3 products account for 45% of sales
- Opportunity: Expand mid-range product offerings
-
πΊοΈ Regional Performance
- West region leads (36% of revenue)
- East region lags with lowest AOV (βΉ1,500 vs βΉ2,143 company avg)
- Opportunity: Targeted campaigns in East
-
π Customer Retention
- 42% retention rate (below 50% target)
- Opportunity: Loyalty program, email campaigns
-
π Trends
- Clear weekly seasonality (weekend peaks)
- Gradual upward trajectory through Q1
- No major drop-offs or anomalies
Recommendations:
-
Expand East region presence
- Launch regional marketing campaigns
- Offer region-specific discounts
- Target AOV increase to βΉ2,000+
-
Improve retention to 50%+
- Implement loyalty rewards program
- Personalized email sequences
- Win-back campaigns for lapsed customers
-
Optimize inventory for top products
- Ensure Laptop Pro, Earbuds, Smart Watch always in stock
- Negotiate better supplier rates (top 3 = high volume)
-
Leverage weekend traffic
- Schedule promotions for Fri-Sun
- Increase ad spend on weekends
- Flash sales during peak times
Next Steps:
- Implement recommendations in Q2
- Monitor weekly KPIs
- Run A/B tests on retention tactics
Building Your Portfolio
What to Include
-
README.md
- Project overview
- Business questions
- Data sources
- Key findings
-
Jupyter Notebook
- Complete analysis code
- Visualizations
- Commentary
-
Dashboard Screenshot/Link
- Power BI / Tableau dashboard
- Interactive if possible
-
Presentation Deck
- Executive summary (5-10 slides)
- Charts and insights
- Recommendations
Portfolio Projects Ideas
- Sales Analysis (like this capstone)
- Customer Segmentation (RFM analysis)
- Churn Prediction (identify at-risk customers)
- A/B Test Analysis (landing page experiment)
- Social Media Sentiment Analysis (Twitter/Reddit)
- Web Analytics (Google Analytics deep dive)
Summary: The Complete Analytics Workflow
β Define: Clear business questions and success metrics β Collect: SQL queries, APIs, web scraping β Clean: Handle nulls, duplicates, outliers, data types β Analyze: Aggregations, trends, comparisons, correlations β Visualize: Dashboards, charts, clear storytelling β Communicate: Insights, recommendations, action items
You've completed the DataPath! π
You now have the skills to:
- Understand data types and analytics fundamentals
- Master Excel, SQL, Python, and Pandas
- Create compelling visualizations and dashboards
- Apply statistics and run A/B tests
- Execute end-to-end analytics projects
Next Steps:
- Build 2-3 portfolio projects
- Share on LinkedIn/GitHub
- Apply for data analyst roles
- Keep learning and growing!
Congratulations on finishing the course! π