5 min read

Histograms

Learn to show how data is distributed


What is a Histogram?

A histogram shows how data is spread out:

  • How many people are aged 20-30? 30-40? 40-50?
  • How many products cost $10-20? $20-30?

It splits the range of values into intervals called bins and counts how many values fall into each bin. Each bar's height is that count.
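The binning step itself is simple. Below is a minimal hand-written sketch in plain Python, assuming equal-width bins between the smallest and largest value. The helper name `count_bins` is hypothetical; in practice matplotlib (through NumPy) does this for you.

```python
# Hypothetical helper: a sketch of what a histogram computes.
# Split [min, max] into equal-width bins, then count the values in each.
def count_bins(values, num_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Position of v measured in bin widths; the maximum value would
        # land at index num_bins, so clamp it into the last bin
        idx = int((v - lo) / width)
        counts[min(idx, num_bins - 1)] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return counts, edges

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
counts, edges = count_bins(ages, 3)
print(edges)   # [22.0, 33.0, 44.0, 55.0]
print(counts)  # [6, 5, 3]
```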

Basic Histogram

code.py
import matplotlib.pyplot as plt

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

fig, ax = plt.subplots()
ax.hist(ages)
ax.set_xlabel('Age')
ax.set_ylabel('Count')
ax.set_title('Age Distribution')
plt.show()
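ax.hist() also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn bars. A small sketch that prints them for the same ages list (the loop and formatting are just one way to display them):

```python
import matplotlib.pyplot as plt

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

fig, ax = plt.subplots()
# hist() returns (bar heights, bin edges, bar artists)
counts, edges, patches = ax.hist(ages)

# With the default 10 bins there are 11 edges; pair them up per bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {int(n)}")

plt.show()
```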

Set Number of Bins

code.py
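# Alternatives: pick one (each call on the same axes draws another histogram on top)
ax.hist(ages, bins=5)   # 5 equal-width bins
ax.hist(ages, bins=10)  # 10 bins (the default)
ax.hist(ages, bins=20)  # 20 bins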

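More bins show more detail, but with few data points many bins end up nearly empty and the shape looks noisy. Fewer bins give a smoother but coarser picture.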
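To see the trade-off in numbers rather than pictures, this sketch bins the same ages with NumPy's np.histogram, which computes the counts ax.hist() would plot without drawing anything. It also tries bins='auto', a string option that both np.histogram and ax.hist() accept.

```python
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

# Same binning as ax.hist(), but no plot, so bin choices are easy to compare
empty_bins = {}
for n_bins in (5, 10, 20):
    counts, edges = np.histogram(ages, bins=n_bins)
    empty_bins[n_bins] = int((counts == 0).sum())
    print(f"{n_bins:>2} bins: counts {counts.tolist()}, {empty_bins[n_bins]} empty")

# bins='auto' lets NumPy choose a bin count from the data;
# ax.hist(ages, bins='auto') accepts the same string
auto_counts, auto_edges = np.histogram(ages, bins='auto')
print(f"'auto' chose {len(auto_edges) - 1} bins")
```

With 14 values spread over 20 bins, at least 6 bins must be empty, which is where the "noisy" look comes from.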

Set Specific Bin Edges

code.py
# Five edges define four bins: 20-30, 30-40, 40-50, 50-60
ax.hist(ages, bins=[20, 30, 40, 50, 60])
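One detail worth knowing: every bin includes its left edge but excludes its right edge, except the last bin, which includes both. Values outside the first and last edges are not counted at all. A quick check with np.histogram, which follows the same rule:

```python
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
edges = [20, 30, 40, 50, 60]

# Bins are [20, 30), [30, 40), [40, 50), [50, 60]
# so age 30 is counted in 30-40 and age 40 in 40-50
counts, _ = np.histogram(ages, bins=edges)
print(counts)  # [4 5 4 1]
```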

Change Color

code.py
ax.hist(ages, color='green')

Add Edge Color

code.py
ax.hist(ages, color='skyblue', edgecolor='black')

Multiple Histograms

code.py
import matplotlib.pyplot as plt

men_ages = [25, 28, 30, 32, 35, 38, 40, 42]
women_ages = [22, 25, 27, 30, 32, 33, 36, 39]

fig, ax = plt.subplots()
ax.hist(men_ages, alpha=0.5, label='Men', color='blue')
ax.hist(women_ages, alpha=0.5, label='Women', color='red')

ax.legend()
plt.show()

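alpha sets transparency from 0 (invisible) to 1 (opaque). At 0.5 the bars are semi-transparent, so you can still see where the two distributions overlap.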
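Each hist() call works out its own bin edges from its own data, so the men's and women's bars may not line up. One way to fix that, sketched here, is to compute a single set of edges from both groups with np.histogram_bin_edges and pass it to both calls. The choice of 8 bins is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

men_ages = [25, 28, 30, 32, 35, 38, 40, 42]
women_ages = [22, 25, 27, 30, 32, 33, 36, 39]

# One set of edges covering both groups, so bars share the same intervals
shared_edges = np.histogram_bin_edges(men_ages + women_ages, bins=8)

fig, ax = plt.subplots()
men_counts, _, _ = ax.hist(men_ages, bins=shared_edges, alpha=0.5, label='Men', color='blue')
women_counts, _, _ = ax.hist(women_ages, bins=shared_edges, alpha=0.5, label='Women', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Count')
ax.legend()
plt.show()
```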

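Show Density Instead of Count

code.py
ax.hist(ages, density=True)

density=True does not give percentages. It rescales each bar to count / (total count × bin width), so the total area of the bars is 1 (a probability density). The heights only sum to 1 when every bin is exactly 1 unit wide.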
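If you want each bar to show a percentage of all values, one common approach (an assumption here, not part of the lesson's code) is to give every value a weight of 100/N and label the axis with matplotlib's PercentFormatter. The last lines compare this with what density=True computes: heights times bin widths summing to 1.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]
edges = [20, 30, 40, 50, 60]

fig, ax = plt.subplots()
# Each value contributes 100/N, so a bar's height is the percentage
# of all values that fall in that bin
weights = np.full(len(ages), 100 / len(ages))
pct, _, _ = ax.hist(ages, bins=edges, weights=weights, edgecolor='black')
ax.yaxis.set_major_formatter(PercentFormatter())  # tick value 100 is shown as "100%"
ax.set_xlabel('Age')
ax.set_ylabel('Share of people')
plt.show()

# For comparison: density=True makes the bar *areas* sum to 1
dens, dens_edges = np.histogram(ages, bins=edges, density=True)
print(pct.sum())                           # about 100
print((dens * np.diff(dens_edges)).sum())  # about 1
```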

Horizontal Histogram

code.py
ax.hist(ages, orientation='horizontal')  # bins run along the y-axis, counts along the x-axis

Add Mean Line

code.py
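import matplotlib.pyplot as plt
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

fig, ax = plt.subplots()
ax.hist(ages, color='skyblue', edgecolor='black')
ax.set_xlabel('Age')
ax.set_ylabel('Count')

# Add a dashed vertical line at the mean age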
mean_age = np.mean(ages)
ax.axvline(mean_age, color='red', linestyle='--', label=f'Mean: {mean_age:.1f}')
ax.legend()

plt.show()

Complete Example

code.py
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(42)
salaries = np.random.normal(50000, 15000, 500)  # 500 people

fig, ax = plt.subplots(figsize=(10, 6))

# Create histogram
ax.hist(salaries, bins=20, color='steelblue', edgecolor='white')

# Add mean and median lines
mean_sal = np.mean(salaries)
median_sal = np.median(salaries)

ax.axvline(mean_sal, color='red', linestyle='--', label=f'Mean: ${mean_sal:,.0f}')
ax.axvline(median_sal, color='green', linestyle='--', label=f'Median: ${median_sal:,.0f}')

ax.set_xlabel('Salary ($)')
ax.set_ylabel('Number of Employees')
ax.set_title('Salary Distribution')
ax.legend()

plt.show()

Histogram vs Bar Chart

| Histogram | Bar Chart |
|---|---|
| Continuous (numeric) data | Categories |
| Shows a distribution | Shows a comparison |
| Bars touch | Bars are separate |
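The "bars touch" row is easiest to see side by side. This sketch draws the binned ages with ax.bar, each bar exactly one bin wide so neighboring bars touch, next to an ordinary bar chart of categories. The product names and sales figures are made-up placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

ages = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 42, 45, 48, 55]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: numeric values binned into adjacent intervals, so bars touch.
# Each bar starts at its left edge and is exactly one bin wide.
counts, edges = np.histogram(ages, bins=[20, 30, 40, 50, 60])
ax_hist.bar(edges[:-1], counts, width=np.diff(edges), align='edge', edgecolor='black')
ax_hist.set_title('Histogram (ages)')

# Bar chart: one bar per category, with gaps between bars
# (hypothetical categories and values, for illustration only)
products = ['A', 'B', 'C']
sales = [12, 7, 9]
ax_bar.bar(products, sales, width=0.6)
ax_bar.set_title('Bar chart (sales by product)')

plt.show()
```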

Key Points

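  • ax.hist() creates histograms
  • bins sets the number of bins, or their exact edges when given a list
  • A histogram shows how data is distributed
  • Use alpha to keep overlapping histograms readable
  • Use density=True to scale bars to a probability density (total area 1), not percentages
  • Use axvline() to mark the mean or median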

What's Next?

Learn box plots for showing data spread and outliers.