5 min read min read
Cross Tabulation
Learn to count combinations of categories with crosstab()
Cross Tabulation
What is Cross Tabulation?
Cross tabulation counts how many times combinations appear.
Example questions it answers:
- How many men vs women in each city?
- How many students got A, B, C in each subject?
- How many orders per day per product?
Basic Crosstab
code.py
import pandas as pd
data = pd.DataFrame({
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC']
})
print("Data:")
print(data)
# Count combinations
result = pd.crosstab(data['Gender'], data['City'])
print("\nCrosstab:")
print(result)Output:
Data:
Gender City
0 Male NYC
1 Female NYC
2 Male LA
3 Female LA
4 Male NYC
Crosstab:
City LA NYC
Gender
Female 1 1
Male 1 2
2 males in NYC, 1 male in LA, etc.
Add Row and Column Totals
code.py
result = pd.crosstab(
data['Gender'],
data['City'],
margins=True,
margins_name='Total'
)
print(result)Output:
City LA NYC Total
Gender
Female 1 1 2
Male 1 2 3
Total 2 3 5
Show Percentages
code.py
# Percentage of total
result = pd.crosstab(
data['Gender'],
data['City'],
normalize=True
) * 100
print(result.round(1))Output:
City LA NYC
Gender
Female 20.0 20.0
Male 20.0 40.0
Different Percentage Types
code.py
# Percentage by row
pd.crosstab(data['Gender'], data['City'], normalize='index')
# Percentage by column
pd.crosstab(data['Gender'], data['City'], normalize='columns')
# Percentage of total
pd.crosstab(data['Gender'], data['City'], normalize='all')Crosstab with Values
Count is default, but you can calculate other things:
code.py
data = pd.DataFrame({
'Store': ['NYC', 'NYC', 'LA', 'LA'],
'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
'Sales': [100, 150, 120, 180]
})
# Sum of sales
result = pd.crosstab(
data['Store'],
data['Product'],
values=data['Sales'],
aggfunc='sum'
)
print(result)Output:
Product Apple Banana
Store
LA 120 180
NYC 100 150
Multiple Row or Column Categories
code.py
data = pd.DataFrame({
'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
'Age': ['Young', 'Young', 'Old', 'Old', 'Young', 'Old'],
'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'NYC']
})
# Two categories in rows
result = pd.crosstab(
[data['Gender'], data['Age']],
data['City']
)
print(result)Output:
City LA NYC
Gender Age
F Old 1 1
Young 0 1
M Old 1 0
Young 0 2
Practice Example
code.py
import pandas as pd
# Survey responses
survey = pd.DataFrame({
'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR', 'HR'],
'Satisfaction': ['Happy', 'Happy', 'Neutral', 'Unhappy', 'Happy', 'Happy'],
'Years': [2, 5, 3, 1, 4, 2]
})
print("Survey Data:")
print(survey)
# Count by Department and Satisfaction
counts = pd.crosstab(
survey['Department'],
survey['Satisfaction'],
margins=True
)
print("\nSatisfaction by Department:")
print(counts)
# Percentage by department
pct = pd.crosstab(
survey['Department'],
survey['Satisfaction'],
normalize='index'
) * 100
print("\nPercentage by Department:")
print(pct.round(1))Crosstab vs Pivot Table
| Feature | Crosstab | Pivot Table |
|---|---|---|
| Default | Count | Mean |
| Input | Series | DataFrame |
| Best for | Counting categories | Numeric summaries |
code.py
# Crosstab - counts automatically
pd.crosstab(df['A'], df['B'])
# Pivot table - needs aggfunc='count'
pd.pivot_table(df, index='A', columns='B', aggfunc='count')Key Points
pd.crosstab(row, col)counts combinationsmargins=Trueadds totalsnormalize='index'shows row percentagesnormalize='columns'shows column percentagesnormalize='all'shows total percentages- Add
valuesandaggfuncfor calculations other than count
Common Mistakes
Mistake 1: Expecting percentages by default
code.py
# This gives counts, not percentages
pd.crosstab(df['A'], df['B'])
# For percentages
pd.crosstab(df['A'], df['B'], normalize=True)Mistake 2: Forgetting normalize options
code.py
# Row percentages (each row adds to 100%)
normalize='index'
# Column percentages (each column adds to 100%)
normalize='columns'
# Total percentages (whole table adds to 100%)
normalize='all'What's Next?
You learned crosstab. Next, you'll learn Melt and Stack - reshaping data between wide and long formats.