6 min read min read
Melt and Stack
Learn to reshape data between wide and long formats
Melt and Stack
Wide vs Long Data
Wide format: Each variable is a column
Name Math English
John 85 90
Sarah 92 88
Long format: Each row is one measurement
Name Subject Score
John Math 85
John English 90
Sarah Math 92
Sarah English 88
Why Reshape Data?
- Some charts need long format
- Some analysis tools need wide format
- Different databases store data differently
Melt - Wide to Long
code.py
import pandas as pd
# Wide format
wide = pd.DataFrame({
'Name': ['John', 'Sarah'],
'Math': [85, 92],
'English': [90, 88]
})
print("Wide format:")
print(wide)
# Convert to long
long = pd.melt(wide, id_vars=['Name'], var_name='Subject', value_name='Score')
print("\nLong format:")
print(long)Output:
Wide format:
Name Math English
0 John 85 90
1 Sarah 92 88
Long format:
Name Subject Score
0 John Math 85
1 Sarah Math 92
2 John English 90
3 Sarah English 88
Melt Parameters Explained
code.py
pd.melt(
wide,
id_vars=['Name'], # Columns to keep as is
var_name='Subject', # Name for the new "variable" column
value_name='Score' # Name for the new "value" column
)Melt Multiple ID Columns
code.py
data = pd.DataFrame({
'Name': ['John', 'Sarah'],
'Grade': ['A', 'B'],
'Math': [85, 92],
'English': [90, 88]
})
long = pd.melt(
data,
id_vars=['Name', 'Grade'],
var_name='Subject',
value_name='Score'
)
print(long)Output:
Name Grade Subject Score
0 John A Math 85
1 Sarah B Math 92
2 John A English 90
3 Sarah B English 88
Pivot - Long to Wide
The opposite of melt:
code.py
# Long format
long = pd.DataFrame({
'Name': ['John', 'John', 'Sarah', 'Sarah'],
'Subject': ['Math', 'English', 'Math', 'English'],
'Score': [85, 90, 92, 88]
})
print("Long format:")
print(long)
# Convert to wide
wide = long.pivot(index='Name', columns='Subject', values='Score')
print("\nWide format:")
print(wide)Output:
Long format:
Name Subject Score
0 John Math 85
1 John English 90
2 Sarah Math 92
3 Sarah English 88
Wide format:
Subject English Math
Name
John 90 85
Sarah 88 92
Stack - Multi-level to Long
Stack turns columns into rows:
code.py
df = pd.DataFrame({
'Math': [85, 92],
'English': [90, 88]
}, index=['John', 'Sarah'])
print("Original:")
print(df)
stacked = df.stack()
print("\nStacked:")
print(stacked)Output:
Original:
Math English
John 85 90
Sarah 92 88
Stacked:
John Math 85
English 90
Sarah Math 92
English 88
dtype: int64
Unstack - Long to Wide
The opposite of stack:
code.py
unstacked = stacked.unstack()
print(unstacked)Output:
Math English
John 85 90
Sarah 92 88
When to Use Which?
| Method | Use When |
|---|---|
| melt | Wide DataFrame to long |
| pivot | Long DataFrame to wide |
| stack | Turn columns to rows (Series) |
| unstack | Turn rows to columns |
Practice Example
code.py
import pandas as pd
# Monthly sales - wide format
sales_wide = pd.DataFrame({
'Product': ['Apple', 'Banana', 'Cherry'],
'Jan': [100, 150, 80],
'Feb': [120, 140, 90],
'Mar': [110, 160, 100]
})
print("Wide format (good for reading):")
print(sales_wide)
# Convert to long format (good for analysis)
sales_long = pd.melt(
sales_wide,
id_vars=['Product'],
var_name='Month',
value_name='Sales'
)
print("\nLong format (good for charts):")
print(sales_long)
# Total sales per product
print("\nTotal sales per product:")
print(sales_long.groupby('Product')['Sales'].sum())
# Total sales per month
print("\nTotal sales per month:")
print(sales_long.groupby('Month')['Sales'].sum())Converting Back
code.py
# Long to wide using pivot
back_to_wide = sales_long.pivot(
index='Product',
columns='Month',
values='Sales'
)
print("Back to wide:")
print(back_to_wide)Key Points
- melt() converts wide to long format
- pivot() converts long to wide format
- stack() turns columns into rows
- unstack() turns rows into columns
- id_vars = columns to keep unchanged
- var_name = name for the new variable column
- value_name = name for the new value column
Common Mistakes
Mistake 1: Confusing melt and pivot
code.py
# Wide to long = melt
pd.melt(wide_df, id_vars=['Name'])
# Long to wide = pivot
long_df.pivot(index='Name', columns='Subject', values='Score')Mistake 2: Duplicate values in pivot
code.py
# pivot() fails if there are duplicate index/column combinations
# Use pivot_table() instead with aggfuncMistake 3: Forgetting id_vars in melt
code.py
# Without id_vars, ALL columns become rows
pd.melt(df) # Everything melts
# Keep identifier columns
pd.melt(df, id_vars=['Name']) # Name stays as columnWhat's Next?
You learned to reshape data. Next, you'll learn Apply and Map - running custom functions on your data.