Hypothesis Testing
Learn to test claims with data
What is Hypothesis Testing?
Hypothesis testing is a statistical method for checking whether data support a claim.
Example: "Is this new drug effective?" or "Do customers prefer design A?"
The Two Hypotheses
Null Hypothesis (H₀)
- The "nothing special" claim
- What we assume is true until the data give us reason to doubt it
- Example: "The drug has no effect"
Alternative Hypothesis (H₁)
- What we are looking for evidence of
- Example: "The drug works"
Steps of Hypothesis Testing
- State H₀ and H₁
- Collect data
- Calculate test statistic
- Find p-value
- Make decision
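The five steps above can be sketched by hand for a one-sample t-test. This is a minimal illustration, not a general recipe: the measurements, the hypothesized mean `mu_0`, and the 5% threshold are all made-up assumptions.

```python
import numpy as np
from scipy import stats

# Step 1: state the hypotheses (hypothetical example)
# H0: mean = 50, H1: mean != 50
mu_0 = 50

# Step 2: collect data (made-up measurements, for illustration only)
data = np.array([51.2, 49.8, 52.5, 50.9, 48.7, 53.1, 51.8, 50.4, 52.0, 49.5])

# Step 3: calculate the test statistic
n = len(data)
sample_mean = data.mean()
sample_sd = data.std(ddof=1)  # sample standard deviation
t_stat = (sample_mean - mu_0) / (sample_sd / np.sqrt(n))

# Step 4: find the two-tailed p-value from the t distribution (n - 1 degrees of freedom)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Step 5: make a decision at the 5% level
alpha = 0.05
decision = "Reject H0" if p_value < alpha else "Cannot reject H0"

print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

In practice `stats.ttest_1samp(data, mu_0)` performs steps 3 and 4 in a single call.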
P-value
The probability of getting a result at least as extreme as ours if H₀ is true.
- Low p-value (< 0.05): Reject H₀, evidence for H₁
- High p-value (≥ 0.05): Can't reject H₀ (this does not prove H₀ is true)
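One way to see what the p-value measures is to simulate a world where H₀ is true and count how often the test statistic comes out at least as extreme as the one observed. The sketch below assumes normally distributed data and uses a made-up sample; the simulated share should land close to the p-value SciPy reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical observed sample; H0: population mean = 0
observed = np.array([0.8, -0.2, 1.1, 0.5, 0.9, -0.4, 1.3, 0.6, 0.2, 0.7])
n = len(observed)
t_obs, p_exact = stats.ttest_1samp(observed, 0)

# Simulate many samples from a world where H0 is true
# (assumption: normally distributed data with mean 0)
simulated = rng.normal(loc=0.0, scale=1.0, size=(100_000, n))
t_sim, _ = stats.ttest_1samp(simulated, 0, axis=1)

# Share of simulated t-statistics at least as extreme as the observed one
p_simulated = np.mean(np.abs(t_sim) >= abs(t_obs))

print(f"P-value from t-test:    {p_exact:.4f}")
print(f"P-value from simulation: {p_simulated:.4f}")
```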
code.py
# If p-value < 0.05, we say the result is "statistically significant"
One-Sample T-Test
Test if mean equals a specific value:
code.py
from scipy import stats
import numpy as np
# Company claims average battery lasts 10 hours
# Our sample of 30 batteries:
sample = [9.8, 10.2, 9.5, 10.1, 9.9, 10.3, 9.7, 10.0, 9.6, 10.1,
          9.9, 10.0, 9.8, 10.2, 9.7, 10.1, 9.9, 9.6, 10.0, 9.8,
          10.1, 9.7, 10.0, 9.9, 10.2, 9.8, 10.1, 9.6, 10.0, 9.9]
# H₀: mean = 10
# H₁: mean ≠ 10
t_stat, p_value = stats.ttest_1samp(sample, 10)
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")
if p_value < 0.05:
    print("Reject H₀: Mean is different from 10")
else:
    print("Cannot reject H₀: No evidence mean differs from 10")
Two-Sample T-Test
Compare means of two groups:
code.py
# Test scores: Class A vs Class B
class_a = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_b = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]
# H₀: means are equal
# H₁: means are different
t_stat, p_value = stats.ttest_ind(class_a, class_b)
print(f"Class A mean: {np.mean(class_a):.1f}")
print(f"Class B mean: {np.mean(class_b):.1f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference between classes")
else:
    print("No significant difference")
Paired T-Test
Compare same group before and after:
code.py
# Weight before and after diet program
before = [180, 175, 190, 185, 170, 195, 180, 175, 185, 190]
after = [175, 172, 185, 180, 168, 188, 176, 170, 182, 185]
# H₀: no change (mean difference = 0)
# H₁: there is change
t_stat, p_value = stats.ttest_rel(before, after)
print(f"Average weight loss: {np.mean(before) - np.mean(after):.1f} lbs")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Significant change in weight after the diet program")
else:
    print("No significant change in weight")
One-Tailed vs Two-Tailed
- Two-tailed: Testing if different (≠)
- One-tailed: Testing if greater (>) or less (<)
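SciPy 1.6 and later let you request a one-tailed test directly through the `alternative` argument of the t-test functions. The sketch below reuses the class scores from the two-sample example and assumes a SciPy version that supports this argument. When the t-statistic points in the tested direction, the one-tailed p-value is half the two-tailed one.

```python
from scipy import stats

# Same scores as the Class A / Class B example
class_a = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_b = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]

# Two-tailed: H1 is "the means differ"
t_two, p_two = stats.ttest_ind(class_a, class_b)

# One-tailed: H1 is "mean of class_a > mean of class_b"
# (the `alternative` argument requires SciPy 1.6 or later)
t_one, p_one = stats.ttest_ind(class_a, class_b, alternative="greater")

print(f"Two-tailed p-value: {p_two:.4f}")
print(f"One-tailed p-value: {p_one:.4f}")
```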
code.py
# One-tailed: Is new method better?
# Divide p-value by 2 and check direction
t_stat, p_value = stats.ttest_ind(class_a, class_b)
# For one-tailed (Class A > Class B)
if t_stat > 0 and p_value / 2 < 0.05:
    print("Class A is significantly better")
Type I and Type II Errors
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct! |
| Don't Reject H₀ | Correct! | Type II Error (β) |
- Type I: False positive (see effect that isn't there)
- Type II: False negative (miss real effect)
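Both error rates can be estimated by simulation. The sketch below repeats a one-sample t-test many times, first with H₀ true and then with H₀ false. The sample size, the effect size of 0.5, and the helper `rejection_rate` are illustrative assumptions, not part of the lesson.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 2000
n = 30  # observations per simulated experiment

def rejection_rate(true_mean):
    """Fraction of simulated experiments in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_experiments, n))
    _, p_values = stats.ttest_1samp(samples, 0, axis=1)
    return np.mean(p_values < alpha)

# H0 is true (true mean is 0): every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should come out close to α, while the Type II rate depends on the sample size and on how large the real effect is.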
Significance Level (α)
- Usually α = 0.05 (5%)
- Lower α = harder to reject H₀
- Common values: 0.01, 0.05, 0.10
code.py
alpha = 0.05
if p_value < alpha:
    print("Reject H₀ at 5% significance level")
Complete Example
code.py
from scipy import stats
import numpy as np
# A/B Test: Does new website design increase time on site?
# Control: old design, Treatment: new design
np.random.seed(42)
control = np.random.normal(120, 30, 100) # seconds on site
treatment = np.random.normal(135, 35, 100) # seconds on site
# H₀: No difference between designs
# H₁: New design increases time
print("=== A/B Test Results ===")
print(f"Control mean: {np.mean(control):.1f} seconds")
print(f"Treatment mean: {np.mean(treatment):.1f} seconds")
print(f"Difference: {np.mean(treatment) - np.mean(control):.1f} seconds")
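# ttest_ind is two-tailed by default; H₁ is one-sided (treatment > control),
# so halve the p-value when the t-statistic points in that direction
t_stat, p_value = stats.ttest_ind(treatment, control)
one_tailed_p = p_value / 2 if t_stat > 0 else 1 - p_value / 2
print(f"\nT-statistic: {t_stat:.3f}")
print(f"One-tailed p-value: {one_tailed_p:.4f}")
if one_tailed_p < 0.05:
    print("\n✓ New design significantly increases time on site!")
else:
    print("\n✗ No significant increase found")
Key Points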
- H₀: Null hypothesis (no effect)
- H₁: Alternative hypothesis (effect exists)
- p-value < 0.05: Reject H₀
- Use ttest_1samp for one sample vs value
- Use ttest_ind for two independent groups
- Use ttest_rel for paired/before-after
- Watch out for Type I and Type II errors
What's Next?
Learn about T-tests and Chi-Square tests in detail.