#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
5 min read min read

Cross Tabulation

Learn to count combinations of categories with crosstab()

Cross Tabulation

What is Cross Tabulation?

Cross tabulation counts how many times combinations appear.

Example questions it answers:

  • How many men vs women in each city?
  • How many students got A, B, C in each subject?
  • How many orders per day per product?

Basic Crosstab

code.py
import pandas as pd

data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC']
})

print("Data:")
print(data)

# Count combinations
result = pd.crosstab(data['Gender'], data['City'])
print("\nCrosstab:")
print(result)

Output:

Data: Gender City 0 Male NYC 1 Female NYC 2 Male LA 3 Female LA 4 Male NYC Crosstab: City LA NYC Gender Female 1 1 Male 1 2

2 males in NYC, 1 male in LA, etc.

Add Row and Column Totals

code.py
result = pd.crosstab(
    data['Gender'],
    data['City'],
    margins=True,
    margins_name='Total'
)
print(result)

Output:

City LA NYC Total Gender Female 1 1 2 Male 1 2 3 Total 2 3 5

Show Percentages

code.py
# Percentage of total
result = pd.crosstab(
    data['Gender'],
    data['City'],
    normalize=True
) * 100
print(result.round(1))

Output:

City LA NYC Gender Female 20.0 20.0 Male 20.0 40.0

Different Percentage Types

code.py
# Percentage by row
pd.crosstab(data['Gender'], data['City'], normalize='index')

# Percentage by column
pd.crosstab(data['Gender'], data['City'], normalize='columns')

# Percentage of total
pd.crosstab(data['Gender'], data['City'], normalize='all')

Crosstab with Values

Count is default, but you can calculate other things:

code.py
data = pd.DataFrame({
    'Store': ['NYC', 'NYC', 'LA', 'LA'],
    'Product': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Sales': [100, 150, 120, 180]
})

# Sum of sales
result = pd.crosstab(
    data['Store'],
    data['Product'],
    values=data['Sales'],
    aggfunc='sum'
)
print(result)

Output:

Product Apple Banana Store LA 120 180 NYC 100 150

Multiple Row or Column Categories

code.py
data = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Age': ['Young', 'Young', 'Old', 'Old', 'Young', 'Old'],
    'City': ['NYC', 'NYC', 'LA', 'LA', 'NYC', 'NYC']
})

# Two categories in rows
result = pd.crosstab(
    [data['Gender'], data['Age']],
    data['City']
)
print(result)

Output:

City LA NYC Gender Age F Old 1 1 Young 0 1 M Old 1 0 Young 0 2

Practice Example

code.py
import pandas as pd

# Survey responses
survey = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR', 'HR'],
    'Satisfaction': ['Happy', 'Happy', 'Neutral', 'Unhappy', 'Happy', 'Happy'],
    'Years': [2, 5, 3, 1, 4, 2]
})

print("Survey Data:")
print(survey)

# Count by Department and Satisfaction
counts = pd.crosstab(
    survey['Department'],
    survey['Satisfaction'],
    margins=True
)
print("\nSatisfaction by Department:")
print(counts)

# Percentage by department
pct = pd.crosstab(
    survey['Department'],
    survey['Satisfaction'],
    normalize='index'
) * 100
print("\nPercentage by Department:")
print(pct.round(1))

Crosstab vs Pivot Table

FeatureCrosstabPivot Table
DefaultCountMean
InputSeriesDataFrame
Best forCounting categoriesNumeric summaries
code.py
# Crosstab - counts automatically
pd.crosstab(df['A'], df['B'])

# Pivot table - needs aggfunc='count'
pd.pivot_table(df, index='A', columns='B', aggfunc='count')

Key Points

  • pd.crosstab(row, col) counts combinations
  • margins=True adds totals
  • normalize='index' shows row percentages
  • normalize='columns' shows column percentages
  • normalize='all' shows total percentages
  • Add values and aggfunc for calculations other than count

Common Mistakes

Mistake 1: Expecting percentages by default

code.py
# This gives counts, not percentages
pd.crosstab(df['A'], df['B'])

# For percentages
pd.crosstab(df['A'], df['B'], normalize=True)

Mistake 2: Forgetting normalize options

code.py
# Row percentages (each row adds to 100%)
normalize='index'

# Column percentages (each column adds to 100%)
normalize='columns'

# Total percentages (whole table adds to 100%)
normalize='all'

What's Next?

You learned crosstab. Next, you'll learn Melt and Stack - reshaping data between wide and long formats.