
Distribution Analysis

Learn how the values in your data are spread out

What Is a Distribution?

A distribution describes how the values in a column are spread out:

  • Are most values in the middle?
  • Are they spread evenly?
  • Is there a long tail on one side?

Normal Distribution (Bell Curve)

The best-known pattern:

  • Most values in the middle
  • Fewer values at extremes
  • Symmetric left and right

Example: Human heights
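To see the bell shape in numbers, here is a minimal sketch that simulates heights with NumPy's random generator. The mean of 170 cm and standard deviation of 10 cm are assumed values for illustration, not measured data. For a sample like this, the mean and median land close together and skewness is near 0.

code.py
```python
import numpy as np
import pandas as pd

# Simulated heights in cm; loc=170 and scale=10 are assumed, illustrative values
rng = np.random.default_rng(seed=0)
heights = pd.Series(rng.normal(loc=170, scale=10, size=10_000), name='Height')

print(f"Mean:     {heights.mean():.2f}")
print(f"Median:   {heights.median():.2f}")
print(f"Skewness: {heights.skew():.3f}")  # close to 0 for a symmetric shape
```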

Check Distribution Shape

code.py
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 25, 27, 28, 30, 31, 32, 35, 40, 65]
})

# Basic stats
print("Mean:", df['Age'].mean())
print("Median:", df['Age'].median())
print("Skewness:", df['Age'].skew())

Skewness Explained

code.py
skew = df['Age'].skew()

| Skewness | Shape | Example |
|---|---|---|
| ≈ 0 | Symmetric | Test scores |
| > 0 | Long right tail (most values low) | Income |
| < 0 | Long left tail (most values high) | Age at retirement |
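Skewness is built from cubed deviations from the mean: cubing keeps the sign, so a few values far above the mean push it positive. The sketch below recomputes the number by hand for the ten ages used above. It assumes pandas' `skew()` applies the usual small-sample correction (the adjusted Fisher-Pearson coefficient), which the final comparison checks.

code.py
```python
import math
import pandas as pd

ages = pd.Series([22, 25, 27, 28, 30, 31, 32, 35, 40, 65], name='Age')

n = len(ages)
dev = ages - ages.mean()      # deviations from the mean
m2 = (dev ** 2).mean()        # average squared deviation (biased variance)
m3 = (dev ** 3).mean()        # average cubed deviation: its sign shows the tail side

g1 = m3 / m2 ** 1.5                                  # raw (biased) skewness
skew_manual = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # small-sample correction

print(f"Manual skewness: {skew_manual:.4f}")
print(f"pandas skew():   {ages.skew():.4f}")
```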

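Compare Mean and Median

The mean is pulled toward a long tail, while the median stays with the bulk of the data, so the gap between them hints at the direction of skew.

code.py
mean = df['Age'].mean()
median = df['Age'].median()

# Exact float equality is rarely true, so treat a small gap as "no skew".
# 0.1 * std is a rule-of-thumb threshold, not a standard cutoff.
tolerance = 0.1 * df['Age'].std()

if mean - median > tolerance:
    print("Right-skewed (high values pull the mean up)")
elif median - mean > tolerance:
    print("Left-skewed (low values pull the mean down)")
else:
    print("Roughly symmetric")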
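To see why the gap works, drop the single high age (65) and compare both statistics. This minimal sketch rebuilds the ages as a Series so it runs on its own.

code.py
```python
import pandas as pd

ages = pd.Series([22, 25, 27, 28, 30, 31, 32, 35, 40, 65], name='Age')
without_max = ages[ages < ages.max()]  # drop the one high value (65)

for label, s in [("All ages", ages), ("Without 65", without_max)]:
    print(f"{label:<11} mean={s.mean():.1f}  median={s.median():.1f}")
```

With these ages the mean falls from 33.5 to 30.0, while the median moves only from 30.5 to 30.0: one extreme value drags the mean, not the median.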

Percentiles

code.py
# Where do values fall?
print("10th percentile:", df['Age'].quantile(0.10))
print("25th percentile:", df['Age'].quantile(0.25))
print("50th percentile:", df['Age'].quantile(0.50))  # Median
print("75th percentile:", df['Age'].quantile(0.75))
print("90th percentile:", df['Age'].quantile(0.90))

The 90th percentile is the value that about 90% of the data falls at or below.
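By default, pandas `quantile()` uses linear interpolation: it places the q-th quantile at position q * (n - 1) in the sorted values and interpolates between the two neighbours. The helper `linear_quantile` below is a hypothetical name for a hand-rolled version of that rule. For the ten ages, the 90th percentile sits at position 8.1, between 40 and 65, so it is 40 + 0.1 * 25 = 42.5.

code.py
```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper: q-th quantile with linear interpolation (pandas' default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)        # fractional position in the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)  # clamp so q=1.0 stays in range
    frac = pos - lo
    return s[lo] + (s[hi] - s[lo]) * frac

ages = [22, 25, 27, 28, 30, 31, 32, 35, 40, 65]
for q in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(q, linear_quantile(ages, q), pd.Series(ages).quantile(q))
```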

Value Counts (Histogram Data)

code.py
# Count values in ranges
print(df['Age'].value_counts(bins=5).sort_index())

Output:

(21.957, 30.6]    5
(30.6, 39.2]      3
(39.2, 47.8]      1
(47.8, 56.4]      0
(56.4, 65.0]      1

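Half of the ages (5 of 10) fall in the first bin, roughly 22 to 30, and only one value lies above 40. The first edge, 21.957, sits just below the minimum of 22 because pandas nudges it down so the minimum falls inside the first bin.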
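Auto-computed edges such as 21.957 are hard to read. A sketch with fixed 10-year bins via `pd.cut` gives cleaner labels and a quick text histogram; the 10-year width is an arbitrary choice. `right=False` makes each bin include its left edge, so an age of 30 lands in 30-39.

code.py
```python
import pandas as pd

ages = pd.Series([22, 25, 27, 28, 30, 31, 32, 35, 40, 65], name='Age')

# Fixed 10-year bins: [20, 30), [30, 40), ... (left edge included)
binned = pd.cut(ages, bins=[20, 30, 40, 50, 60, 70], right=False)
counts = binned.value_counts().sort_index()  # keeps empty bins, in bin order

for interval, count in counts.items():
    label = f"{int(interval.left)}-{int(interval.right) - 1}"
    print(f"{label}: {'#' * int(count)}")
```

These bins split the ages 4, 4, 1, 0, 1, a different picture from the five automatic bins above. Bin edges shape how a histogram looks, so it is worth trying more than one width.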

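Kurtosis (Tails and Peakedness)

code.py
print("Kurtosis:", df['Age'].kurtosis())

pandas reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0:

  • High (positive) kurtosis: heavier tails than a normal curve, often with a sharper peak
  • Low (negative) kurtosis: lighter tails and a flatter top
  • ~0: similar to a normal bell curve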
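Comparing samples with known shapes makes the kurtosis bullets concrete. This sketch draws large samples from a normal distribution, a Laplace distribution (heavier tails, theoretical excess kurtosis 3) and a uniform distribution (no tails, theoretical excess kurtosis -1.2). The distributions, sample size and seed are illustrative choices.

code.py
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
n = 100_000

samples = {
    'Normal':  pd.Series(rng.normal(size=n)),          # reference shape
    'Laplace': pd.Series(rng.laplace(size=n)),         # heavier tails
    'Uniform': pd.Series(rng.uniform(-1, 1, size=n)),  # no tails at all
}

for name, s in samples.items():
    # pandas reports excess kurtosis, so the normal sample lands near 0
    print(f"{name:<8} kurtosis = {s.kurtosis():.2f}")
```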

Quick Distribution Summary

code.py
import pandas as pd

def analyze_distribution(series: pd.Series) -> None:
    """Print summary statistics and a rough shape label for a numeric Series."""
    skew = series.skew()  # compute once and reuse below

    print(f"Column: {series.name}")
    print(f"Mean: {series.mean():.2f}")
    print(f"Median: {series.median():.2f}")
    print(f"Std: {series.std():.2f}")
    print(f"Skewness: {skew:.2f}")
    print(f"Min: {series.min()}")
    print(f"Max: {series.max()}")

    # Shape interpretation: |skew| < 0.5 is a common rule of thumb, not a strict cutoff
    if abs(skew) < 0.5:
        print("Shape: Approximately symmetric")
    elif skew > 0:
        print("Shape: Right-skewed (tail to right)")
    else:
        print("Shape: Left-skewed (tail to left)")

analyze_distribution(df['Age'])
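To summarize several columns at once, `DataFrame.agg` accepts a list of statistic names. This sketch reuses the same |skew| < 0.5 rule of thumb through a hypothetical `label_shape` helper; the `Score` column is made-up data for illustration.

code.py
```python
import pandas as pd

def label_shape(skew, threshold=0.5):
    """Hypothetical helper: same rule of thumb as analyze_distribution."""
    if abs(skew) < threshold:
        return 'symmetric'
    return 'right-skewed' if skew > 0 else 'left-skewed'

# 'Age' is the tutorial's sample; 'Score' is made-up data for illustration
data = pd.DataFrame({
    'Age':   [22, 25, 27, 28, 30, 31, 32, 35, 40, 65],
    'Score': [55, 60, 62, 65, 70, 72, 75, 78, 80, 83],
})

# One row per column, one column per statistic
summary = data.agg(['mean', 'median', 'std', 'skew']).T
shapes = summary['skew'].apply(label_shape)  # label from unrounded skew
summary = summary.round(2)
summary['shape'] = shapes
print(summary)
```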

Common Distributions

| Distribution | Shape | Examples |
|---|---|---|
| Normal | Bell curve | Height, test scores |
| Right-skewed | Long tail to the right | Income, house prices |
| Uniform | Flat | Dice rolls |
| Bimodal | Two peaks | Mixed groups |
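Skewness alone cannot tell these shapes apart. The sketch below simulates one sample per row of the table, using illustrative parameters (a lognormal stands in for right-skewed data such as income). The uniform and bimodal samples both have skewness near 0, just like the normal one, so only a histogram or binned counts reveal the difference.

code.py
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n = 100_000

# Illustrative parameters, chosen only to make each shape obvious
shapes = {
    'Normal':       rng.normal(0, 1, n),
    'Right-skewed': rng.lognormal(mean=0, sigma=0.5, size=n),
    'Uniform':      rng.uniform(0, 1, n),
    'Bimodal':      np.concatenate([rng.normal(-3, 1, n // 2), rng.normal(3, 1, n // 2)]),
}

for name, values in shapes.items():
    s = pd.Series(values)
    print(f"{name:<13} skew={s.skew():6.2f}  mean-median={s.mean() - s.median():6.2f}")
```

In the bimodal sample the mean is near 0, yet very few values lie there: a summary statistic can describe almost none of the data.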

Key Points

  • Distribution = how values spread
  • Skewness tells direction of tail
  • Mean vs. median hints at skewness (the mean is pulled toward the tail)
  • Percentiles show where values fall
  • Most real data is NOT perfectly normal

What's Next?

Learn to create summary reports that combine all your analysis.