Data Analyst Portfolio Projects: 10 Ideas That Get Interviews

65% of data analyst job offers go to candidates with strong portfolios. Your portfolio is proof you can do the job. This guide gives you 10 project ideas, datasets, and templates to build a portfolio that gets interviews.

📚 Beginner · ⏱️ 13 min read

Why Portfolio Matters More Than Certifications

The Hard Truth About Certifications

Certifications prove you learned (watched videos, passed quizzes). A portfolio proves you can DO (solved real problems, made decisions, delivered insights).

Recruiter perspective:

  • 1,000 candidates have "Google Data Analytics Certificate"
  • Only 50 have "E-commerce sales dashboard analyzing 100K+ orders with actionable insights"

→ Portfolio candidates get the interviews (proof of skills vs. claim of skills)

What Makes a Strong Portfolio Project?

Weak project (tutorial-following):

Analyzed the Titanic dataset from Kaggle:

  • Loaded data in pandas
  • Created visualizations
  • Built prediction model

Why weak?

  • Everyone does Titanic (no differentiation)
  • No business context (who cares about predicting Titanic survival?)
  • No insights (what did you learn? what would you recommend?)

Strong project (business-focused):

Swiggy Restaurant Performance Analysis:

  • Scraped 50K+ orders from public Swiggy data
  • Analyzed delivery times, customer ratings, and order patterns using SQL + Python
  • Found: restaurants with <30 min avg delivery time have a 4.2★ rating vs 3.6★ for >45 min
  • Built a Tableau dashboard showing "optimal menu pricing" and "peak hour staffing"
  • Business impact: recommendations could reduce delivery time 15% and increase ratings 0.4★

Why strong?

  • Real-world business problem (restaurant optimization)
  • End-to-end pipeline (data collection → analysis → visualization → insights)
  • Actionable recommendations (reduce delivery time, optimize pricing)
  • Quantified impact (15% time reduction, 0.4★ rating increase)
Portfolio goal: Prove you can solve business problems using data. Recruiters ask: "Could this person walk into our company on Day 1 and deliver value?" A strong portfolio answers "Yes."


What to Include in Each Project

Every portfolio project should have:

  1. Business context (Why does this problem matter?)

    • "E-commerce companies lose 30% of customers who abandon cart. This project identifies abandonment reasons and recommends solutions."
  2. Data pipeline (How did you get and prepare data?)

    • "Scraped 100K+ orders using Python BeautifulSoup → Cleaned with pandas (removed duplicates, handled missing values) → Loaded into PostgreSQL for analysis"
  3. Analysis methodology (What techniques did you use?)

    • "Performed cohort analysis using SQL window functions to compare retention across user segments"
    • "Built ARIMA time series model to forecast next quarter's demand"
  4. Visualizations (How did you communicate findings?)

    • Tableau/Power BI dashboard with filters, drill-downs, KPIs
    • Python plots (matplotlib/seaborn) showing trends, distributions, correlations
  5. Business insights (What did you find? So what?)

    • "Found: 60% of cart abandonment happens at payment step (UPI timeout) → Recommend adding auto-retry feature"
  6. Quantified impact (What happens if company implements your recommendations?)

    • "Implementing auto-retry could reduce abandonment from 30% → 22%, adding ₹2 crore revenue annually"
  7. GitHub + Live Demo (Proof it's real)

    • GitHub repo with code, data, README
    • Live Tableau dashboard or deployed Streamlit app
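Item 3's cohort analysis, for instance, can be sketched in a few lines of pandas before porting it to SQL window functions. A minimal sketch on hypothetical order data (the real project would pull this from the database):

```python
import pandas as pd

# Hypothetical orders: one row per (customer, order month)
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3],
    "order_month": pd.to_datetime(
        ["2024-01", "2024-02", "2024-01", "2024-03", "2024-02", "2024-03", "2024-04"]
    ).to_period("M"),
})

# Cohort = month of a customer's first order
orders["cohort"] = orders.groupby("customer_id")["order_month"].transform("min")
# Months elapsed since the cohort month
orders["period"] = (orders["order_month"] - orders["cohort"]).apply(lambda d: d.n)

# Customers active in each (cohort, period) cell
cohort_counts = (
    orders.groupby(["cohort", "period"])["customer_id"].nunique().unstack(fill_value=0)
)
# Retention rate = active customers / cohort size (period 0)
retention = cohort_counts.div(cohort_counts[0], axis=0)
print(retention)
```

Each row of `retention` is one cohort; reading across a row shows what fraction of that cohort came back 1, 2, 3… months later — exactly the heatmap a recruiter expects in the dashboard.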

10 Portfolio Project Ideas (Beginner to Advanced)

Beginner Projects (1-2 weeks, single tool focus)


1. IPL Cricket Analytics Dashboard (Tableau/Power BI)

Dataset: Kaggle IPL Dataset — 15K+ matches, 200K+ balls

Business question: Which teams/players perform best under pressure (playoffs, high-stakes matches)?

Analysis approach:

  • Calculate win rates by team, venue, toss decision (bat first vs chase)
  • Analyze player performance (strike rate, economy rate) in powerplay vs death overs
  • Identify clutch players (best performance in playoffs/finals)
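The win-rate calculation behind the first bullet is a grouped aggregation. A minimal pandas sketch with assumed column names (the Kaggle dataset's actual schema may differ):

```python
import pandas as pd

# Hypothetical slice of the matches table (column names are assumptions)
matches = pd.DataFrame({
    "venue":           ["Chennai", "Chennai", "Chennai", "Mumbai"],
    "toss_decision":   ["bat", "field", "field", "bat"],
    "toss_winner_won": [False, True, True, True],
})

# Win rate of the toss winner, split by venue and toss decision
win_rate = (
    matches.groupby(["venue", "toss_decision"])["toss_winner_won"]
    .mean()
    .rename("win_rate")
    .reset_index()
)
print(win_rate)
```

The same table, computed over all seasons, feeds the "win toss → bowl first in Chennai" style of insight directly into Tableau.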

Deliverables:

  • Interactive Tableau dashboard with filters (Season, Team, Player, Venue)
  • KPI cards: Win Rate, Average Score, Strike Rate by Player
  • Visualizations: Line chart (runs scored over seasons), Heatmap (team performance by venue)

Key insight example: "Teams batting first win 52% vs 48% batting second overall, but in Chennai (spin-friendly pitch), chasing teams win 58% → Recommendation: Win toss in Chennai → Bowl first"

Skills demonstrated: Data cleaning, calculated fields (DAX/Tableau), dashboard design, business storytelling

Time: 1 week (40 hours)


2. Personal Finance Tracker (Excel + Power Query)

Dataset: Your own bank statements (last 12 months) — export as CSV from bank app

Business question: Where am I overspending? How can I save ₹10K/month?

Analysis approach:

  • Categorize transactions (Food, Transport, Entertainment, Bills) using Excel VLOOKUP
  • Create pivot tables showing spending by category, month, merchant
  • Build year-over-year comparison (2024 vs 2025)
  • Set budget targets and track variance

Deliverables:

  • Excel dashboard with slicers (Month, Category)
  • Charts: Spending trend (line chart), Category breakdown (pie chart), Budget vs Actual (bar chart)
  • Automated monthly report using Power Query (refresh with new data each month)

Key insight example: "Spending ₹8K/month on food delivery (30% of income) → Recommendation: Cook 3× per week → Save ₹4K/month (₹48K/year)"

Skills demonstrated: Excel proficiency (VLOOKUP, Pivot Tables, Power Query), data cleaning, personal financial analysis

Time: 3-5 days (20 hours)


3. COVID-19 India Trend Analysis (SQL + Tableau)

Dataset: Our World in Data COVID-19 Dataset — Daily cases, deaths, vaccinations by country

Business question: How did India's COVID response compare to similar countries (Brazil, USA, Indonesia)?

Analysis approach:

  • Import CSV into PostgreSQL database
  • Write SQL queries: Daily new cases, 7-day moving average, death rate, vaccination rate
  • Compare India vs peer countries (cases per million, deaths per million)
  • Analyze vaccination rollout speed (% population vaccinated by date)
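The deliverable is SQL, but the 7-day moving average is worth prototyping first. The same calculation sketched in pandas on made-up numbers:

```python
import pandas as pd

# Hypothetical daily new-case counts (the real dataset has one row per country-day)
daily = pd.DataFrame({
    "date": pd.date_range("2021-04-01", periods=10, freq="D"),
    "new_cases": [100, 120, 90, 150, 200, 180, 160, 210, 190, 220],
})

# 7-day moving average smooths out day-of-week reporting artifacts
daily["cases_7d_avg"] = daily["new_cases"].rolling(window=7, min_periods=1).mean()
print(daily.tail(3))
```

In PostgreSQL the equivalent is an `AVG(...) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)` window function, which is what belongs in the GitHub queries file.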

Deliverables:

  • SQL queries in GitHub repo (with comments explaining logic)
  • Tableau dashboard comparing India vs 5 countries
  • Time series chart: Cases over time (with lockdown annotations)

Key insight example: "India's second wave (Apr-May 2021) had 400K daily cases (peak) but death rate (1.2%) lower than USA (1.8%) due to younger population → Vaccination priority should focus on 60+ age group (70% of deaths)"

Skills demonstrated: SQL (joins, window functions, aggregations), data storytelling, comparative analysis

Time: 1 week (40 hours)


Intermediate Projects (2-4 weeks, multiple tools)


4. E-commerce Sales Analysis (Python + SQL + Tableau)

Dataset: Brazilian E-commerce Dataset (Olist) — 100K+ orders, customer reviews, seller ratings

Business question: What drives customer satisfaction and repeat purchases?

Analysis approach:

  • Data cleaning (Python pandas): Handle missing values, remove duplicates, standardize date formats
  • Database setup (SQL): Load into PostgreSQL, create star schema (fact_orders, dim_customers, dim_products)
  • RFM analysis (SQL): Segment customers by Recency, Frequency, Monetary value
  • Cohort analysis (SQL): Track retention by month of first purchase
  • Dashboard (Tableau): Revenue by category, top sellers, customer segments, retention cohort heatmap
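The RFM step can be prototyped in pandas before porting to SQL. A minimal sketch on hypothetical orders (the 1-3 scoring scheme and column names are assumptions):

```python
import pandas as pd

# Hypothetical order history (customer_id, order_date, order_value)
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(
        ["2018-06-01", "2018-08-15", "2018-01-10",
         "2018-07-01", "2018-07-20", "2018-08-30"]
    ),
    "order_value": [500, 300, 1200, 200, 250, 400],
})

snapshot = pd.Timestamp("2018-09-01")  # analysis date

rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("order_value", "sum"),
)

# Score each dimension 1-3 with quantile bins; low recency is good, so invert its labels
rfm["r_score"] = pd.qcut(rfm["recency"], 3, labels=[3, 2, 1]).astype(int)
rfm["f_score"] = rfm["frequency"].rank(method="first").pipe(pd.qcut, 3, labels=[1, 2, 3]).astype(int)
rfm["m_score"] = rfm["monetary"].rank(method="first").pipe(pd.qcut, 3, labels=[1, 2, 3]).astype(int)
rfm["rfm_score"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1)
print(rfm)
```

Customers with the highest combined score are the "top 20%" segment the dashboard highlights; the `rank(method="first")` trick keeps `qcut` from failing on tied frequencies.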

Deliverables:

  • Python notebook (Jupyter) with data cleaning steps + code comments
  • SQL scripts creating database schema + analytical queries
  • Tableau dashboard with 5+ charts (revenue trend, RFM segments, cohort retention)
  • GitHub repo with README explaining methodology

Key insight example: "Customers who receive order in <5 days have 35% repeat rate vs 12% for >10 days → Recommendation: Invest in logistics to reduce delivery time → Increase repeat revenue by ₹15 crore annually"

Skills demonstrated: Full data pipeline (Python → SQL → Tableau), RFM analysis, cohort analysis, business recommendations

Time: 2-3 weeks (80 hours)


5. Netflix Content Strategy Analysis (Python)

Dataset: Netflix Movies & TV Shows Dataset — 8K+ titles with release year, genre, country, duration

Business question: What content types (genre, country, duration) perform best? What should Netflix produce next?

Analysis approach:

  • EDA (Exploratory Data Analysis): Distribution of genres, countries, release years
  • Content trends: Growth of TV shows vs movies over time, genre popularity shifts
  • Country analysis: Which countries produce most content? (USA, India, UK)
  • Duration analysis: Optimal movie length (90-120 min) vs TV show seasons (1-3 seasons)

Deliverables:

  • Jupyter notebook with matplotlib/seaborn visualizations
  • Heatmap: Content production by country + year
  • Bar chart: Top 10 genres by count
  • Line chart: Movies vs TV shows over time

Key insight example: "Netflix added 1,200 Indian titles (2019-2024), 70% in Hindi/Tamil → India is fastest-growing market → Recommendation: Invest in regional content (Telugu, Bengali) to capture next 100M subscribers"

Skills demonstrated: Python (pandas, matplotlib, seaborn), exploratory data analysis, trend identification

Time: 1 week (40 hours)


6. Zomato Restaurant Recommendation System (Python + ML)

Dataset: Zomato Bangalore Restaurants Dataset — 50K+ restaurants with ratings, cuisine, location, price

Business question: Can we recommend restaurants to users based on their preferences?

Analysis approach:

  • Feature engineering: Extract features (cuisine type, price range, location, rating)
  • Content-based filtering: Recommend similar restaurants (e.g., user likes "Italian, ₹₹, 4.5★" → Recommend other high-rated Italian restaurants)
  • Collaborative filtering: Use user ratings to find similar users (users who liked Restaurant A also liked Restaurant B)
  • Model evaluation: Precision, recall, F1-score
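Content-based filtering boils down to a similarity measure over feature vectors. A self-contained cosine-similarity sketch with hypothetical restaurants and hand-picked features (a real pipeline would derive these from the dataset):

```python
import math

# Hypothetical feature vectors: (is_italian, price_level / 4, rating / 5)
restaurants = {
    "Pasta Street": (1, 0.50, 0.90),
    "Toscano":      (1, 0.75, 0.88),
    "Dosa Corner":  (0, 0.25, 0.84),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recommend(liked, k=2):
    """Rank the other restaurants by similarity to one the user liked."""
    target = restaurants[liked]
    scores = {
        name: cosine(target, vec)
        for name, vec in restaurants.items() if name != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("Pasta Street"))  # Toscano ranks above Dosa Corner
```

The production version would use scikit-learn over the full 50K restaurants, but this is the entire idea: similar vectors → similar restaurants.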

Deliverables:

  • Jupyter notebook with code + model training
  • Function: recommend_restaurants(user_preferences) returning top 10 matches
  • Evaluation metrics (accuracy, RMSE for rating prediction)

Key insight example: "Content-based filtering (70% accuracy) outperforms collaborative filtering (60%) for cold-start users (new users with no rating history) → Use hybrid approach: Content-based for new users → Switch to collaborative after 10+ ratings"

Skills demonstrated: Machine learning (recommendation systems), feature engineering, model evaluation, Python (scikit-learn)

Time: 2 weeks (60 hours)


Advanced Projects (4-6 weeks, production-level)


7. Flipkart Sales Forecasting Dashboard (Python + SQL + Streamlit)

Dataset: Generate synthetic Flipkart sales data (or use Online Retail Dataset)

Business question: Forecast next quarter's sales to optimize inventory and marketing spend.

Analysis approach:

  • Time series decomposition: Trend, seasonality, residuals (statsmodels)
  • ARIMA model: Forecast next 90 days of sales
  • Prophet model (Facebook): Handle seasonality (Diwali spike, Republic Day sale)
  • Feature engineering: Add external variables (holidays, promotions, competitor sales)
  • Dashboard: Live forecasting dashboard with confidence intervals
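Before fitting ARIMA or Prophet, it helps to have a baseline they must beat. A seasonal-naive sketch on synthetic daily sales (this is the benchmark, not the project's model):

```python
import pandas as pd

# Hypothetical daily sales: weekly seasonality plus a gentle upward trend
base = [100, 110, 105, 120, 180, 260, 240]            # Mon..Sun pattern
values = [base[d] + 3 * w for w in range(8) for d in range(7)]
sales = pd.Series(values, index=pd.date_range("2024-01-01", periods=56, freq="D"))

season = 7

# Seasonal-naive forecast: the next 14 days repeat the last observed week
last_week = sales.iloc[-season:].to_list()
forecast = pd.Series(
    last_week * 2,
    index=pd.date_range(sales.index[-1] + pd.Timedelta(days=1), periods=14, freq="D"),
)

# Backtest the baseline: predict the final week from the week before it
actual = sales.iloc[-season:].to_numpy()
predicted = sales.iloc[-2 * season:-season].to_numpy()
mape = (abs(actual - predicted) / actual).mean() * 100
print(f"seasonal-naive MAPE: {mape:.2f}%")
```

If the ARIMA/Prophet MAPE isn't clearly below this number, the fancy model isn't earning its complexity — a comparison worth showing in the README.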

Deliverables:

  • Python scripts for data pipeline (ETL: Extract from PostgreSQL → Transform → Load forecast to database)
  • ARIMA + Prophet models with evaluation metrics (MAPE, RMSE)
  • Streamlit dashboard: Upload new data → Get forecast → Download CSV
  • Deployed on Streamlit Cloud (public URL)

Key insight example: "Diwali week drives 25% of Q4 revenue → Forecast: ₹1,200 crore Diwali GMV (±8% margin of error) → Recommendation: Stock 40% more inventory in warehouses 2 weeks before Diwali"

Skills demonstrated: Time series forecasting, model deployment, dashboard building (Streamlit), production-ready code

Time: 4 weeks (120 hours)


8. Customer Churn Prediction Model (Python + ML + Flask API)

Dataset: Telco Customer Churn Dataset — 7K customers with churn label

Business question: Which customers are likely to churn in next 30 days? How to retain them?

Analysis approach:

  • EDA: Churn rate by tenure, contract type, payment method
  • Feature engineering: Create features (avg monthly charges, tenure in months, support tickets filed)
  • Model training: Logistic regression, Random Forest, XGBoost (compare performance)
  • Model interpretation: Feature importance (which factors drive churn?), SHAP values
  • API deployment: Flask API accepting customer data, returning churn probability
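The first EDA step (churn rate by contract type) is a single groupby. A sketch on a hypothetical slice of the data (column names loosely follow the Kaggle version):

```python
import pandas as pd

# Hypothetical slice of the Telco churn dataset
customers = pd.DataFrame({
    "contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year"],
    "tenure":   [2, 5, 24, 48, 1, 30],
    "churn":    [1, 1, 0, 0, 0, 0],
})

# Churn rate by contract type, highest first
churn_by_contract = (
    customers.groupby("contract")["churn"].mean().sort_values(ascending=False)
)
print(churn_by_contract)
# Month-to-month churns far more often, which is what motivates the
# "discount for a yearly upgrade" recommendation later in the project
```

The same two-line pattern repeats for tenure buckets and payment methods, and the resulting rates become features and talking points for the model.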

Deliverables:

  • Jupyter notebook with model training + evaluation
  • Flask API code (POST /predict endpoint)
  • Deployed on Heroku/Render (public API endpoint)
  • Postman collection for testing API

Key insight example: "Customers on month-to-month contracts churn at 42% vs 11% on yearly contracts → Recommendation: Offer 10% discount for yearly contract upgrade → Reduce churn from 42% → 30%, saving ₹5 crore annually in retention costs"

Skills demonstrated: Machine learning (classification), model deployment (Flask API), feature engineering, business impact quantification

Time: 3-4 weeks (100 hours)


9. A/B Test Analysis Framework (Python + Statistical Testing)

Dataset: Generate synthetic A/B test data (control vs treatment groups)

Business question: Did new feature (free shipping) increase conversion significantly?

Analysis approach:

  • Sample size calculation: How many users needed for 95% confidence, 80% power?
  • Hypothesis testing: Two-sample t-test, chi-square test
  • Conversion rate analysis: Control (2.1%) vs Treatment (2.5%) → +19% lift
  • Statistical significance: p-value < 0.05? Confidence interval?
  • Power analysis: Was sample size sufficient?
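The core of the framework is a two-proportion z-test. A self-contained sketch using only the standard library, with inputs mirroring the example rates above:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    lift = (p_b - p_a) / p_a
    return z, p_value, lift

# Control: 2.1% of 50,000 users converted; Treatment: 2.5% of 50,000
z, p, lift = two_proportion_z_test(1050, 50_000, 1250, 50_000)
print(f"z={z:.2f}, p={p:.5f}, lift={lift:+.0%}")
```

Wrapping this in the reusable `ab_test_analysis` deliverable is then mostly a matter of adding the confidence interval and sample-size check.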

Deliverables:

  • Jupyter notebook with statistical tests + code comments
  • Reusable function: ab_test_analysis(control_data, treatment_data) returning significance, lift, confidence interval
  • Visualization: Conversion rate comparison (bar chart), confidence intervals (error bars)

Key insight example: "Treatment (free shipping ≥ ₹399) has 2.5% conversion vs Control (≥ ₹499) 2.1% → +19% lift, p-value = 0.003 (statistically significant) → Recommendation: Deploy Treatment → Expected ₹3.5 crore incremental revenue"

Skills demonstrated: A/B testing, hypothesis testing, statistical significance, sample size calculation

Time: 1-2 weeks (50 hours)


10. Real-time Twitter Sentiment Analysis (Python + Streaming + NLP)

Dataset: Scrape live tweets using Twitter API (or use pre-collected Sentiment140 dataset)

Business question: What's public sentiment about brand/product (e.g., iPhone 15 launch)?

Analysis approach:

  • Data collection: Twitter API → Fetch tweets mentioning "iPhone 15"
  • Text preprocessing: Remove URLs, mentions, hashtags; lowercase; tokenize
  • Sentiment analysis: Use VADER (lexicon-based) or train classifier (Naive Bayes, LSTM)
  • Real-time dashboard: Streamlit app showing live sentiment (Positive: 60%, Negative: 25%, Neutral: 15%)
  • Trend analysis: Sentiment over time (did negative sentiment spike after review video?)
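The preprocessing step can be a few regexes. A minimal sketch (the tokenization rules are deliberately simplistic; a real pipeline might use NLTK or spaCy):

```python
import re

def preprocess(tweet: str) -> list[str]:
    """Strip URLs, @mentions, and #hashtags, then lowercase and tokenize."""
    tweet = re.sub(r"https?://\S+", "", tweet)   # URLs
    tweet = re.sub(r"[@#]\w+", "", tweet)        # mentions and hashtags
    tweet = tweet.lower()
    return re.findall(r"[a-z']+", tweet)         # keep word tokens only

raw = "Loving the camera on the new iPhone! @Apple #iPhone15 https://t.co/abc123"
print(preprocess(raw))
# ['loving', 'the', 'camera', 'on', 'the', 'new', 'iphone']
```

The cleaned tokens then feed VADER (which scores raw text) or the feature extractor for a trained classifier.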

Deliverables:

  • Python script for Twitter scraping + sentiment analysis
  • Streamlit dashboard (updates every 5 minutes with new tweets)
  • Time series chart: Sentiment over time
  • Word cloud: Most common words in positive/negative tweets

Key insight example: "iPhone 15 launch: 65% positive sentiment initially → Dropped to 45% after YouTuber review highlighting battery issues → Recommendation: Apple should address battery concerns in marketing (crisis management)"

Skills demonstrated: API integration, NLP (sentiment analysis), real-time data processing, dashboard building

Time: 2-3 weeks (80 hours)



How to Showcase Your Portfolio

Portfolio Structure

Your portfolio should live in 3 places:

  1. GitHub (code repository)
  2. Tableau Public / Power BI Web (live dashboards)
  3. Personal website / Medium blog (project write-ups with context)

1. GitHub Best Practices

Repository structure:

ecommerce-sales-analysis/
├── README.md                      # Project overview, methodology, insights
├── data/
│   ├── raw/                       # Original dataset (CSV)
│   └── processed/                 # Cleaned data (parquet/CSV)
├── notebooks/
│   ├── 01_data_cleaning.ipynb     # Jupyter notebooks with comments
│   ├── 02_eda.ipynb
│   └── 03_modeling.ipynb
├── sql/
│   ├── schema.sql                 # Database schema
│   └── queries.sql                # Analytical queries
├── dashboards/
│   └── sales_dashboard.twbx       # Tableau workbook
├── requirements.txt               # Python dependencies
└── LICENSE

README template:

README.md:
# E-commerce Sales Analysis

## Business Problem
E-commerce companies lose 30% of customers who abandon cart. This project identifies abandonment reasons and recommends solutions.

## Dataset
- **Source**: Olist Brazilian E-commerce (Kaggle)
- **Size**: 100K+ orders, 2016-2018
- **Features**: Customer demographics, product categories, delivery times, ratings

## Methodology
1. **Data Cleaning**: Removed duplicates, handled missing values (5% of orders missing delivery date → dropped)
2. **Database Setup**: Loaded into PostgreSQL, created star schema (fact_orders, dim_customers, dim_products)
3. **Analysis**: RFM segmentation, cohort retention, funnel analysis
4. **Visualization**: Tableau dashboard with revenue trends, customer segments, retention heatmap

## Key Insights
- **Insight 1**: Customers who receive order in <5 days have 35% repeat rate vs 12% for >10 days
- **Insight 2**: Top 20% customers (RFM score ≥ 9) generate 65% of revenue
- **Insight 3**: Cart abandonment peaks at payment step (40% drop-off) due to limited payment options

## Recommendations
1. Invest in logistics to reduce delivery time from avg 8 days → 5 days (increase repeat rate 12% → 25%)
2. Launch VIP program for top 20% customers (personalized discounts, free shipping)
3. Add UPI, wallets (Paytm, PhonePe) to payment options (reduce abandonment 40% → 28%)

## Business Impact
- Recommendation 1: ₹15 crore incremental revenue from improved retention
- Recommendation 2: ₹8 crore revenue from VIP program (20% of customers)
- Recommendation 3: ₹12 crore revenue from reduced cart abandonment

## Live Dashboard
[View Tableau Dashboard](https://public.tableau.com/...)

## Tech Stack
- **Languages**: Python, SQL
- **Libraries**: pandas, NumPy, matplotlib, seaborn, scikit-learn
- **Database**: PostgreSQL
- **Visualization**: Tableau

## How to Run
1. Clone repo: `git clone https://github.com/username/ecommerce-analysis.git`
2. Install dependencies: `pip install -r requirements.txt`
3. Run notebooks in order: `01_data_cleaning.ipynb` → `02_eda.ipynb` → `03_modeling.ipynb`
4. Open Tableau workbook: `dashboards/sales_dashboard.twbx`

Key elements:

  • Start with business problem (not "This is my project" — explain WHY it matters)
  • Summarize methodology (data → analysis → insights → recommendations)
  • List key insights (3-5 bullet points)
  • Quantify business impact (₹ saved, % improvement)
  • Link live dashboard (proof it's real)
  • Instructions to run code (makes it reproducible)

2. Tableau Public Best Practices

Dashboard design:

  • Title: Clear, descriptive ("E-commerce Sales Analysis 2016-2018" not "Dashboard 1")
  • KPIs at top: Revenue, Orders, Avg Order Value, Customer Count (big numbers, easy to scan)
  • Filters visible: Year, Category, Region (let viewer explore)
  • Color coding: Use semantic colors (green = good, red = bad; consistent across charts)
  • Tooltips: Show details on hover (don't clutter chart)
  • Annotations: Highlight key insights ("Revenue dipped in June 2017 due to delivery delays")

Profile setup:

  • Bio: "Data Analyst | SQL, Python, Tableau | Building projects in e-commerce analytics, customer segmentation, forecasting"
  • Avatar: Professional photo or initials (not blank)
  • Featured dashboard: Pin best project to top of profile

3. Personal Website / Medium Blog

Why needed? GitHub shows the code; Tableau shows the viz; the website tells the STORY (how you thought, why you made the decisions you did).

Blog post structure (1,000-1,500 words):

Title: How I Reduced E-commerce Cart Abandonment by 30% Using Data Analysis

Introduction (2-3 paragraphs)
  • Problem: E-commerce loses ₹500 crore annually to cart abandonment in India
  • My approach: Analyzed 100K orders to identify abandonment reasons
  • Result: 3 recommendations that could reduce abandonment 40% → 28% (₹12 crore impact)

Methodology
  1. Data Collection: Kaggle Olist dataset (100K orders, 2016-2018)
  2. Data Cleaning: [Include 1-2 code snippets showing pandas data cleaning]
  3. Funnel Analysis: [SQL query showing conversion funnel]

Key Findings [Include 3-4 visualizations from Tableau with explanations]
  • Finding 1: 40% drop-off at payment step [Bar chart]
  • Finding 2: Delivery time <5 days → 35% repeat rate [Line chart]
  • Finding 3: Top 20% customers → 65% revenue [Pareto chart]

Business Recommendations
  1. Add payment options (UPI, wallets)
  2. Optimize delivery (partner with local logistics)
  3. VIP program for top 20% customers

Conclusion
  • What I learned: RFM segmentation, cohort analysis, SQL window functions
  • Next steps: Build predictive model to identify at-risk customers
  • [Link to GitHub repo + Tableau dashboard]

Where to publish:

  • Medium: Largest audience, good SEO (your article ranks on Google)
  • Dev.to: Tech-focused community
  • Personal website: Full control (use GitHub Pages, free hosting)
Pro tip: Publish one blog post per project. Share it on LinkedIn with hashtags (#DataAnalytics #SQL #Python). Tag companies you're applying to ("Loved analyzing e-commerce data @Flipkart @Amazon — open to opportunities!"). This gets recruiter attention.


Common Portfolio Mistakes to Avoid

Mistake 1: Tutorial Projects Only (Titanic, Iris Dataset)

Why bad: Everyone does these → No differentiation.

Fix: Use unique datasets (scrape your own, use local/Indian datasets like Zomato Bangalore, Nifty 50 stocks).


Mistake 2: No Business Context

Bad: "Analyzed dataset, created visualizations, built model." → Recruiter asks: "So what? Why does this matter?"

Good: "E-commerce loses 30% customers to cart abandonment. I identified top 3 reasons and recommended solutions that could save ₹12 crore annually." → Shows business thinking, not just technical skills.


Mistake 3: Code Without Comments

Bad: df.groupby('category').agg({'revenue': 'sum'}).sort_values('revenue', ascending=False).head(10) → Recruiter doesn't know what this does or why.

Good:

# Calculate total revenue by product category
# Insight: Electronics generates 40% of revenue (top category)
revenue_by_category = df.groupby('category').agg({'revenue': 'sum'})
revenue_by_category = revenue_by_category.sort_values('revenue', ascending=False)
top_10_categories = revenue_by_category.head(10)

→ Comments explain WHAT and WHY (not just HOW).


Mistake 4: No Insights (Just Charts)

Bad: Dashboard with 10 charts, no annotations, no recommendations. → Recruiter sees: "You can make charts but can't think critically."

Good: Dashboard with 5 charts + annotations ("Revenue dipped in June 2017 — delivery delays during monsoon") + recommendations ("Partner with local logistics to reduce delays"). → Shows: You can analyze (find patterns) AND synthesize (make recommendations).


Mistake 5: Broken Links / Incomplete Projects

Bad: GitHub repo with empty README, missing data files, code that doesn't run. → Recruiter assumes: "Sloppy work ethic, can't deliver complete project."

Good: Every project has:

  • ✅ Complete README (problem, methodology, insights, how to run)
  • ✅ Working code (tested on fresh clone)
  • ✅ Live dashboard link (Tableau Public, Streamlit)
  • ✅ Data files (or clear instructions to download data)

Test: Clone your repo on a friend's laptop → Can they run your code without asking you questions? If not, fix the documentation.

Received: {"hasItems":false,"isArray":false}