Melt and Stack

Wide vs Long Data

Wide format: Each variable is a column

Name    Math  English
John    85    90
Sarah   92    88

Long format: Each row is one measurement

Name    Subject  Score
John    Math     85
John    English  90
Sarah   Math     92
Sarah   English  88

Why Reshape Data?

Some charts need long format
Some analysis tools need wide format
Different databases store data differently

Melt - Wide to Long

code.py

import pandas as pd

# Wide format
wide = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Math': [85, 92],
    'English': [90, 88]
})
print("Wide format:")
print(wide)

# Convert to long
long = pd.melt(wide, id_vars=['Name'], var_name='Subject', value_name='Score')
print("\nLong format:")
print(long)

Output:

Wide format:
    Name  Math  English
0   John    85       90
1  Sarah    92       88

Long format:
    Name  Subject  Score
0   John     Math     85
1  Sarah     Math     92
2   John  English     90
3  Sarah  English     88

Melt Parameters Explained

code.py

pd.melt(
    wide,
    id_vars=['Name'],      # Columns to keep as is
    var_name='Subject',    # Name for the new "variable" column
    value_name='Score'     # Name for the new "value" column
)

Melt Multiple ID Columns

code.py

data = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Grade': ['A', 'B'],
    'Math': [85, 92],
    'English': [90, 88]
})

long = pd.melt(
    data,
    id_vars=['Name', 'Grade'],
    var_name='Subject',
    value_name='Score'
)
print(long)

Output:

    Name Grade  Subject  Score
0   John     A     Math     85
1  Sarah     B     Math     92
2   John     A  English     90
3  Sarah     B  English     88

Pivot - Long to Wide

The opposite of melt:

code.py

# Long format
long = pd.DataFrame({
    'Name': ['John', 'John', 'Sarah', 'Sarah'],
    'Subject': ['Math', 'English', 'Math', 'English'],
    'Score': [85, 90, 92, 88]
})
print("Long format:")
print(long)

# Convert to wide
wide = long.pivot(index='Name', columns='Subject', values='Score')
print("\nWide format:")
print(wide)

Output:

Long format:
    Name  Subject  Score
0   John     Math     85
1   John  English     90
2  Sarah     Math     92
3  Sarah  English     88

Wide format:
Subject  English  Math
Name
John          90    85
Sarah         88    92

Stack - Multi-level to Long

Stack turns columns into rows:

code.py

df = pd.DataFrame({
    'Math': [85, 92],
    'English': [90, 88]
}, index=['John', 'Sarah'])

print("Original:")
print(df)

stacked = df.stack()
print("\nStacked:")
print(stacked)

Output:

Original:
       Math  English
John     85       90
Sarah    92       88

Stacked:
John   Math       85
       English    90
Sarah  Math       92
       English    88
dtype: int64

Unstack - Long to Wide

The opposite of stack:

code.py

unstacked = stacked.unstack()
print(unstacked)

Output:

       Math  English
John     85       90
Sarah    92       88

When to Use Which?

Method	Use When
melt	Wide DataFrame to long
pivot	Long DataFrame to wide
stack	Turn columns to rows (Series)
unstack	Turn rows to columns

Practice Example

code.py

import pandas as pd

# Monthly sales - wide format
sales_wide = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry'],
    'Jan': [100, 150, 80],
    'Feb': [120, 140, 90],
    'Mar': [110, 160, 100]
})

print("Wide format (good for reading):")
print(sales_wide)

# Convert to long format (good for analysis)
sales_long = pd.melt(
    sales_wide,
    id_vars=['Product'],
    var_name='Month',
    value_name='Sales'
)
print("\nLong format (good for charts):")
print(sales_long)

# Total sales per product
print("\nTotal sales per product:")
print(sales_long.groupby('Product')['Sales'].sum())

# Total sales per month
print("\nTotal sales per month:")
print(sales_long.groupby('Month')['Sales'].sum())

Converting Back

code.py

# Long to wide using pivot
back_to_wide = sales_long.pivot(
    index='Product',
    columns='Month',
    values='Sales'
)
print("Back to wide:")
print(back_to_wide)

Key Points

melt() converts wide to long format
pivot() converts long to wide format
stack() turns columns into rows
unstack() turns rows into columns
id_vars = columns to keep unchanged
var_name = name for the new variable column
value_name = name for the new value column

Common Mistakes

Mistake 1: Confusing melt and pivot

code.py

# Wide to long = melt
pd.melt(wide_df, id_vars=['Name'])

# Long to wide = pivot
long_df.pivot(index='Name', columns='Subject', values='Score')

Mistake 2: Duplicate values in pivot

code.py

# pivot() fails if there are duplicate index/column combinations
# Use pivot_table() instead with aggfunc

Mistake 3: Forgetting id_vars in melt

code.py

# Without id_vars, ALL columns become rows
pd.melt(df)  # Everything melts

# Keep identifier columns
pd.melt(df, id_vars=['Name'])  # Name stays as column

What's Next?

You learned to reshape data. Next, you'll learn Apply and Map - running custom functions on your data.