Filling Missing Data

Learn to replace missing values with useful data

Why Fill Instead of Drop?

Dropping removes entire rows. Filling keeps your data and replaces empty cells with something useful.
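
To make the difference concrete, here is a small sketch that compares the two approaches on a tiny table. The sample data and the fill values are assumptions chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', None],
    'Age': [25, None, 30],
    'Score': [85, 90, None]
})

# dropna() removes every row that has at least one missing value
dropped = df.dropna()

# fillna() keeps every row and replaces only the empty cells
filled = df.fillna({'Name': 'Unknown', 'Age': 0, 'Score': 0})

# Row counts: original, after dropping, after filling
print(len(df), len(dropped), len(filled))  # prints: 3 1 3
```

Only one complete row survives the drop, while filling keeps all three.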

Fill with a Fixed Value

code.py
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', None],
    'Age': [25, None, 30],
    'Score': [85, 90, None]
})

# Fill all missing with 0
filled = df.fillna(0)
print(filled)

Output:

    Name   Age  Score
0   John  25.0   85.0
1  Sarah   0.0   90.0
2      0  30.0    0.0

Fill Different Values for Different Columns

code.py
filled = df.fillna({
    'Name': 'Unknown',
    'Age': 0,
    'Score': 50
})
print(filled)

Output:

      Name   Age  Score
0     John  25.0   85.0
1    Sarah   0.0   90.0
2  Unknown  30.0   50.0

Fill with Average (Mean)

Best for numeric columns. Filling with the mean leaves the column's average unchanged.

code.py
df = pd.DataFrame({
    'Score': [80, 90, None, 85, None]
})

# Fill with average
avg = df['Score'].mean()
df['Score'] = df['Score'].fillna(avg)
print(df)

Output:

   Score
0   80.0
1   90.0
2   85.0  <- was missing, now average
3   85.0
4   85.0  <- was missing, now average
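
A quick way to check that the average stays the same is to compare the column before and after filling. This is a minimal sketch using the same scores as above. It also prints the standard deviation, which gets smaller because every filled cell sits exactly on the average.

```python
import pandas as pd

scores = pd.Series([80, 90, None, 85, None])

# mean() skips missing values, so this is the average of 80, 90 and 85
filled = scores.fillna(scores.mean())

# The average is the same before and after filling
print(scores.mean(), filled.mean())

# The spread shrinks, because the filled cells add no variation
print(scores.std(), filled.std())
```

So mean filling keeps the average but makes the data look a little less varied than it really is.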

Fill with Middle Value (Median)

Better when the data has extreme values (outliers), because the median is not pulled toward them the way the mean is.

code.py
df['Score'] = df['Score'].fillna(df['Score'].median())
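
The sketch below shows why the median can be the safer choice. The 'Salary' column and its numbers are made up for illustration, with one extreme value of 500.

```python
import pandas as pd

df = pd.DataFrame({
    'Salary': [30, 32, 35, None, 500]
})

# The mean is pulled up by the single extreme value
mean_value = df['Salary'].mean()      # 149.25

# The median is the middle of the known values 30, 32, 35, 500
median_value = df['Salary'].median()  # 33.5

# Fill the missing cell with the median
df['Salary'] = df['Salary'].fillna(median_value)
print(df)
```

Here the mean (149.25) does not look like a typical salary in this column, while the median (33.5) does.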

Fill with Most Common Value (Mode)

Best for categories like "Male/Female" or "Yes/No".

code.py
df = pd.DataFrame({
    'City': ['NYC', 'LA', None, 'NYC', 'NYC']
})

# Most common city
most_common = df['City'].mode()[0]
df['City'] = df['City'].fillna(most_common)
print(df)

Output:

  City
0  NYC
1   LA
2  NYC  <- was missing, now most common
3  NYC
4  NYC
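
One detail worth knowing: mode() returns a Series, not a single value, because two values can be tied for most common. That is why the code above uses [0]. The sketch below uses made-up data with a tie to show what happens.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['NYC', 'LA', None, 'LA', 'NYC']
})

# 'NYC' and 'LA' both appear twice, so mode() returns both of them
modes = df['City'].mode()
print(list(modes))

# Taking the first entry picks one of the tied values
most_common = modes.iloc[0]
df['City'] = df['City'].fillna(most_common)
print(df)
```

When there is a tie, the choice between the tied values is arbitrary from the data's point of view, so check whether it makes sense for your column.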

Fill with Previous Value (Forward Fill)

Good for time-ordered data. Each empty cell takes the last known value above it.

code.py
df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Temp': [20, None, None, 25, 26]
})

df['Temp'] = df['Temp'].ffill()
print(df)

Output:

   Day  Temp
0    1  20.0
1    2  20.0  <- copied from row 0
2    3  20.0  <- copied from row 1
3    4  25.0
4    5  26.0
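
If copying an old value across a long gap feels risky, ffill() accepts a limit argument that caps how many cells in a row get filled. Here is a minimal sketch on the same temperature data. The choice of limit=1 is just an example.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Temp': [20, None, None, 25, 26]
})

# Fill at most one missing cell after each known value
df['Temp'] = df['Temp'].ffill(limit=1)
print(df)
```

Day 2 is filled with 20.0, but Day 3 stays missing because it is the second empty cell in a row.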

Fill with Next Value (Backward Fill)

Uses the value after the empty cell.

code.py
df['Temp'] = df['Temp'].bfill()
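
Forward fill cannot fill a gap at the very start, and backward fill cannot fill a gap at the very end, because there is no value to copy from. A common workaround is to chain the two. The sketch below uses made-up readings with a gap at each end.

```python
import pandas as pd

temp = pd.Series([None, 20, None, 25, None], dtype='float64')

# Backward fill: each gap takes the next known value.
# The last cell stays missing because nothing comes after it.
back = temp.bfill()
print(back.tolist())

# Forward fill first, then backward fill to catch the leading gap
both = temp.ffill().bfill()
print(both.tolist())
```

After chaining, every cell has a value: the first cell is filled from the row after it and the last cell from the row before it.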

Which Method to Use?

Data Type                 Best Method
Numbers (normal)          Mean
Numbers (has outliers)    Median
Categories                Mode
Time series               Forward/Backward fill
Unknown                   Fixed value like 0 or "Unknown"
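
The table can be turned into a small helper that picks a method per column. This is only a sketch: fill_by_type and its use_median flag are hypothetical names, not part of pandas, and the helper does not handle time series, which need ffill() or bfill() on sorted data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def fill_by_type(df, use_median=False):
    """Hypothetical helper: mean or median for numbers, mode for everything else."""
    result = df.copy()
    for col in result.columns:
        # Skip columns that have nothing missing
        if not result[col].isna().any():
            continue
        if is_numeric_dtype(result[col]):
            # Numbers: median if outliers are expected, otherwise mean
            value = result[col].median() if use_median else result[col].mean()
        else:
            # Categories: most common value, or a fixed label if the column is empty
            modes = result[col].mode()
            value = modes.iloc[0] if not modes.empty else 'Unknown'
        result[col] = result[col].fillna(value)
    return result

df = pd.DataFrame({
    'City': ['NYC', 'LA', None, 'NYC'],
    'Score': [80, 90, None, 85]
})

filled = fill_by_type(df)
print(filled)
```

The missing city becomes 'NYC' (the most common value) and the missing score becomes 85.0 (the average of 80, 90 and 85).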

Key Points

  • fillna(value) replaces all missing with one value
  • fillna({'col': value}) sets different values per column
  • fillna(df['col'].mean()) fills with average
  • ffill() fills with previous value
  • bfill() fills with next value

What's Next?

For time data, there's a smarter way to fill: interpolation. Learn it next!