Cross Tabulation

What is Cross Tabulation?

Cross tabulation counts how many times combinations appear.

Example questions it answers:

How many men vs women in each city?
How many students got A, B, C in each subject?
How many orders per day per product?

Basic Crosstab

code.py

import pandas as pd

data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC']
})

print("Data:")
print(data)

# Count combinations
result = pd.crosstab(data['Gender'], data['City'])
print("\nCrosstab:")
print(result)

Output:

Data:
   Gender City
0    Male  NYC
1  Female  NYC
2    Male   LA
3  Female   LA
4    Male  NYC

Crosstab:
City    LA  NYC
Gender
Female   1    1
Male     1    2

2 males in NYC, 1 male in LA, etc.

Add Row and Column Totals

code.py

result = pd.crosstab(
    data['Gender'],
    data['City'],
    margins=True,
    margins_name='Total'
)
print(result)

Output:

City    LA  NYC  Total
Gender
Female   1    1      2
Male     1    2      3
Total    2    3      5

Show Percentages

code.py

# Percentage of total
result = pd.crosstab(
    data['Gender'],
    data['City'],
    normalize=True
) * 100
print(result.round(1))

Output:

City      LA   NYC
Gender
Female  20.0  20.0
Male    20.0  40.0

Different Percentage Types

code.py

# Percentage by row
pd.crosstab(data['Gender'], data['City'], normalize='index')

# Percentage by column
pd.crosstab(data['Gender'], data['City'], normalize='columns')

# Percentage of total
pd.crosstab(data['Gender'], data['City'], normalize='all')

Crosstab with Values

Count is default, but you can calculate other things:

code.py

data = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Sales': [100, 150, 120, 180]
})

# Sum of sales
result = pd.crosstab(
    data['Store'],
    data['Product'],
    values=data['Sales'],
    aggfunc='sum'
)
print(result)

Output:

Product  Apple  Banana
Store
LA         120     180
NYC        100     150

Multiple Row or Column Categories

code.py

data = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Age': ['Young', 'Young', 'Old', 'Old', 'Young', 'Old'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'NYC']
})

# Two categories in rows
result = pd.crosstab(
    [data['Gender'], data['Age']],
    data['City']
)
print(result)

Output:

City        LA  NYC
Gender Age
F      Old   1    1
       Young 0    1
M      Old   1    0
       Young 0    2

Practice Example

code.py

import pandas as pd

# Survey responses
survey = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR', 'HR'],
    'Satisfaction': ['Happy', 'Happy', 'Neutral', 'Unhappy', 'Happy', 'Happy'],
    'Years': [2, 5, 3, 1, 4, 2]
})

print("Survey Data:")
print(survey)

# Count by Department and Satisfaction
counts = pd.crosstab(
    survey['Department'],
    survey['Satisfaction'],
    margins=True
)
print("\nSatisfaction by Department:")
print(counts)

# Percentage by department
pct = pd.crosstab(
    survey['Department'],
    survey['Satisfaction'],
    normalize='index'
) * 100
print("\nPercentage by Department:")
print(pct.round(1))

Crosstab vs Pivot Table

Feature	Crosstab	Pivot Table
Default	Count	Mean
Input	Series	DataFrame
Best for	Counting categories	Numeric summaries

code.py

# Crosstab - counts automatically
pd.crosstab(df['A'], df['B'])

# Pivot table - needs aggfunc='count'
pd.pivot_table(df, index='A', columns='B', aggfunc='count')

Key Points

pd.crosstab(row, col) counts combinations
margins=True adds totals
normalize='index' shows row percentages
normalize='columns' shows column percentages
normalize='all' shows total percentages
Add values and aggfunc for calculations other than count

Common Mistakes

Mistake 1: Expecting percentages by default

code.py

# This gives counts, not percentages
pd.crosstab(df['A'], df['B'])

# For percentages
pd.crosstab(df['A'], df['B'], normalize=True)

Mistake 2: Forgetting normalize options

code.py

# Row percentages (each row adds to 100%)
normalize='index'

# Column percentages (each column adds to 100%)
normalize='columns'

# Total percentages (whole table adds to 100%)
normalize='all'

What's Next?

You learned crosstab. Next, you'll learn Melt and Stack - reshaping data between wide and long formats.