Bivariate Analysis

Learn to analyze relationships between two columns

What is Bivariate Analysis?

Bivariate means "two variables": you look at how two columns relate to each other.

Questions like:

  • Do older people earn more?
  • Do men or women buy more?
  • Does education affect salary?

Number vs Number

Compare two numeric columns:

code.py
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50],
    'Salary': [40000, 50000, 55000, 65000, 70000, 80000]
})

# Correlation: do they move together?
print(df['Age'].corr(df['Salary']))

Output: about 0.996 (very strong positive relationship)

  • Close to 1: Both go up together
  • Close to -1: One goes up, the other goes down
  • Close to 0: No linear relationship (a curved relationship can still exist)
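To make the number less of a black box, here is a minimal sketch that recomputes the Pearson correlation by hand and compares it with corr(). It also includes a small made-up curved dataset (`curve`, not from the lesson data) to show that a value near 0 only rules out a straight-line relationship.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50],
    'Salary': [40000, 50000, 55000, 65000, 70000, 80000]
})

# How far each value sits from its column mean
age_dev = df['Age'] - df['Age'].mean()
salary_dev = df['Salary'] - df['Salary'].mean()

# Pearson r: sum of paired deviation products,
# divided by the square root of the product of the squared-deviation sums
manual_r = (age_dev * salary_dev).sum() / (
    (age_dev ** 2).sum() * (salary_dev ** 2).sum()
) ** 0.5

pandas_r = df['Age'].corr(df['Salary'])
print(round(manual_r, 3), round(pandas_r, 3))  # both round to 0.996

# Illustrative data (an assumption, not from the lesson): y depends on x exactly,
# but along a curve, so the linear correlation is essentially 0
curve = pd.DataFrame({'x': [-2, -1, 0, 1, 2]})
curve['y'] = curve['x'] ** 2
curve_r = curve['x'].corr(curve['y'])
print(round(curve_r, 3))
```

By default corr() uses the Pearson method, which is why the manual calculation matches it.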

Category vs Number

Compare a numeric column across the groups of a categorical column:

code.py
df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR', 'HR'],
    'Salary': [50000, 70000, 55000, 75000, 45000, 48000]
})

# Average salary by department
print(df.groupby('Department')['Salary'].mean())

Output:

Department
HR       46500.0
IT       72500.0
Sales    52500.0
Name: Salary, dtype: float64

IT has the highest average salary in this sample.

More Stats by Group

code.py
# Multiple stats per group
print(df.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count']))
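If the default column names are too terse, one option is named aggregation, sketched below. The labels `average`, `lowest`, `highest` and `employees` are made up for this illustration; the result is sorted so the highest-paying department comes first.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR', 'HR'],
    'Salary': [50000, 70000, 55000, 75000, 45000, 48000]
})

# Named aggregation: keyword = output column name, value = statistic to compute
summary = df.groupby('Department')['Salary'].agg(
    average='mean',
    lowest='min',
    highest='max',
    employees='count',
)

# Put the department with the highest average salary on top
summary = summary.sort_values('average', ascending=False)
print(summary)
```

The `employees` count is worth keeping next to the average: a mean computed from two rows is much less reliable than one computed from two hundred.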

Category vs Category

Compare two categorical columns:

code.py
df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Bought': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No']
})

# Cross tabulation
print(pd.crosstab(df['Gender'], df['Bought']))

Output:

Bought  No  Yes
Gender
F        1    2
M        2    1

Add Percentages

code.py
# Percentage by row
print(pd.crosstab(df['Gender'], df['Bought'], normalize='index') * 100)

Output:

Bought         No        Yes
Gender
F       33.333333  66.666667
M       66.666667  33.333333

About 67% of the women in this sample bought, compared with about 33% of the men.
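Percentages hide how many rows stand behind them, so it can help to look at the counts with totals as well. A minimal sketch using crosstab's `margins` option, which adds an `All` row and column:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Bought': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No']
})

# margins=True appends row and column totals labelled 'All'
counts = pd.crosstab(df['Gender'], df['Bought'], margins=True)
print(counts)

# Total number of rows behind the whole table
print(counts.loc['All', 'All'])
```

Here each gender has only 3 rows, so one extra purchase would shift a percentage by about 33 points.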

Quick Summary by Group

code.py
df = pd.DataFrame({
    'City': ['NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Age': [25, 30, 28, 35, 22],
    'Salary': [50000, 60000, 55000, 70000, 45000]
})

# Summary for each city
print(df.groupby('City').describe())
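The full describe() table gets wide quickly because it repeats eight statistics for every numeric column. One way to keep it readable, sketched here, is to describe a single column and keep only the statistics you care about:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NYC', 'LA', 'NYC', 'LA', 'NYC'],
    'Age': [25, 30, 28, 35, 22],
    'Salary': [50000, 60000, 55000, 70000, 45000]
})

# Describe only Salary, one row per city
salary_summary = df.groupby('City')['Salary'].describe()

# Keep a few columns of the summary
print(salary_summary[['count', 'mean', 'min', 'max']])
```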

Key Questions to Ask

  1. Numbers: What's the correlation?
  2. Category + Number: What's the average per group?
  3. Categories: How do combinations distribute?
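The three questions above can be sketched as a small helper that picks the analysis from the column types. `bivariate_summary` is a hypothetical function written for this illustration, not part of pandas, and the sample DataFrame simply combines columns from the earlier examples. It assumes every column is either numeric or categorical text.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def bivariate_summary(df, col_a, col_b):
    """Hypothetical helper: choose a bivariate summary based on column types."""
    a_num = is_numeric_dtype(df[col_a])
    b_num = is_numeric_dtype(df[col_b])

    if a_num and b_num:
        # Number vs number: correlation
        return df[col_a].corr(df[col_b])
    if a_num or b_num:
        # Category vs number: average of the numeric column per group
        num_col, cat_col = (col_a, col_b) if a_num else (col_b, col_a)
        return df.groupby(cat_col)[num_col].mean()
    # Category vs category: counts of each combination
    return pd.crosstab(df[col_a], df[col_b])


# Illustrative data assembled from the lesson's earlier examples
df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Age': [25, 30, 35, 40, 45, 50],
    'Salary': [40000, 50000, 55000, 65000, 70000, 80000],
    'Bought': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No']
})

print(bivariate_summary(df, 'Age', 'Salary'))     # correlation
print(bivariate_summary(df, 'Gender', 'Salary'))  # average salary per gender
print(bivariate_summary(df, 'Gender', 'Bought'))  # cross tabulation
```

Note that numeric codes standing in for categories (for example a department ID) would be treated as numbers here, so a real version would need an explicit way to mark such columns.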

Key Points

  • corr() measures the linear relationship between two numeric columns
  • groupby().mean() compares a numeric column across groups
  • crosstab() counts category combinations
  • Correlation close to 1 or -1 = strong linear relationship
  • Correlation close to 0 = no linear relationship

What's Next?

Deep dive into correlation analysis and what the numbers mean.