Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing is a way to use sample data to check whether a claim about a population holds up.

Example: "Is this new drug effective?" or "Do customers prefer design A?"

The Two Hypotheses

Null Hypothesis (H₀)

The "nothing special" claim
What we assume is true until the data give strong evidence against it
Example: "The drug has no effect"

Alternative Hypothesis (H₁)

What we want to find evidence for (a test can support H₁, but it cannot prove it)
Example: "The drug works"

Steps of Hypothesis Testing

State H₀ and H₁
Collect data
Calculate test statistic
Find p-value
Make decision
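The five steps above can be sketched by hand for a one-sample t-test. This is a minimal illustration, not a definitive recipe: the measurements, the hypothesized mean of 50, and the variable names are all made up for the example. The statistic is t = (sample mean − hypothesized mean) / (s / √n), and the two-tailed p-value is the area in both tails of a t-distribution with n − 1 degrees of freedom.

```python
import math
from scipy import stats

# Step 1: state the hypotheses (hypothetical example)
# H0: mean = 50, H1: mean != 50
mu_0 = 50.0

# Step 2: collect data (made-up measurements)
data = [52.1, 49.8, 51.5, 50.9, 53.2, 48.7, 51.1, 52.4]

# Step 3: calculate the test statistic t = (mean - mu_0) / (s / sqrt(n))
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std dev
t_manual = (mean - mu_0) / (s / math.sqrt(n))

# Step 4: find the two-tailed p-value from a t-distribution with n - 1 df
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Step 5: make a decision at alpha = 0.05
alpha = 0.05
decision = "Reject H0" if p_manual < alpha else "Cannot reject H0"
print(f"t = {t_manual:.3f}, p = {p_manual:.4f} -> {decision}")

# Cross-check against SciPy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(data, mu_0)
print(f"SciPy: t = {t_scipy:.3f}, p = {p_scipy:.4f}")
```

The manual values and the SciPy values should agree, which shows that ttest_1samp performs steps 3 and 4 in a single call.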

P-value

The probability of getting a result at least as extreme as ours if H₀ is true.

Low p-value (< 0.05): Reject H₀, evidence for H₁
High p-value (≥ 0.05): Can't reject H₀ (this does not prove H₀ is true)


# If p-value < 0.05, we say result is "statistically significant"

One-Sample T-Test

Test whether a population mean equals a specific value:


from scipy import stats
import numpy as np

# Company claims average battery lasts 10 hours
# Our sample of 30 batteries:
sample = [9.8, 10.2, 9.5, 10.1, 9.9, 10.3, 9.7, 10.0, 9.6, 10.1,
          9.9, 10.0, 9.8, 10.2, 9.7, 10.1, 9.9, 9.6, 10.0, 9.8,
          10.1, 9.7, 10.0, 9.9, 10.2, 9.8, 10.1, 9.6, 10.0, 9.9]

# H₀: mean = 10
# H₁: mean ≠ 10

t_stat, p_value = stats.ttest_1samp(sample, 10)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")

if p_value < 0.05:
    print("Reject H₀: Mean is different from 10")
else:
    print("Cannot reject H₀: No evidence mean differs from 10")

Two-Sample T-Test

Compare the means of two independent groups:


from scipy import stats
import numpy as np

# Test scores: Class A vs Class B
class_a = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_b = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]

# H₀: means are equal
# H₁: means are different

t_stat, p_value = stats.ttest_ind(class_a, class_b)

print(f"Class A mean: {np.mean(class_a):.1f}")
print(f"Class B mean: {np.mean(class_b):.1f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Significant difference between classes")
else:
    print("No significant difference")
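By default, ttest_ind assumes both groups have the same variance (Student's t-test). When the spreads look different, passing equal_var=False runs Welch's t-test instead, which drops that assumption. The sketch below reuses the class scores from the example above; treating Welch's test as the safer choice here is a judgment call, not something the example requires.

```python
import numpy as np
from scipy import stats

class_a = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_b = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]

# Sample variances (ddof=1) show how different the spreads are
var_a = np.var(class_a, ddof=1)
var_b = np.var(class_b, ddof=1)
print(f"Variance A: {var_a:.1f}, Variance B: {var_b:.1f}")

# Student's t-test (the default): assumes equal variances
t_student, p_student = stats.ttest_ind(class_a, class_b)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(class_a, class_b, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```

Because both groups have the same size, the two t-statistics are identical. Only the degrees of freedom differ, so Welch's p-value is slightly larger.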

Paired T-Test

Compare the same group before and after a treatment:


from scipy import stats
import numpy as np

# Weight before and after diet program
before = [180, 175, 190, 185, 170, 195, 180, 175, 185, 190]
after = [175, 172, 185, 180, 168, 188, 176, 170, 182, 185]

# H₀: no change (mean difference = 0)
# H₁: there is change

t_stat, p_value = stats.ttest_rel(before, after)

print(f"Average weight loss: {np.mean(before) - np.mean(after):.1f} lbs")
print(f"P-value: {p_value:.4f}")

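# t_stat > 0 means "before" is larger than "after", i.e. weight went down
if p_value < 0.05 and t_stat > 0:
    print("Significant weight loss after the diet program")
else:
    print("No significant weight loss detected")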
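A paired t-test is equivalent to a one-sample t-test on the per-person differences, tested against 0. The sketch below reuses the weights from the example above to show this, and also shows what happens if the pairing is (wrongly) ignored. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

before = [180, 175, 190, 185, 170, 195, 180, 175, 185, 190]
after = [175, 172, 185, 180, 168, 188, 176, 170, 182, 185]

# Per-person change: positive values mean weight was lost
diffs = np.array(before) - np.array(after)

# Paired t-test on the two lists
t_paired, p_paired = stats.ttest_rel(before, after)

# One-sample t-test on the differences against 0 gives the same answer
t_diff, p_diff = stats.ttest_1samp(diffs, 0)

# Ignoring the pairing treats the lists as two unrelated groups
t_ind, p_ind = stats.ttest_ind(before, after)

print(f"Paired:      t = {t_paired:.3f}, p = {p_paired:.6f}")
print(f"Differences: t = {t_diff:.3f}, p = {p_diff:.6f}")
print(f"Independent: t = {t_ind:.3f}, p = {p_ind:.4f}")
```

The independent test gives a much larger p-value here. The weights vary a lot from person to person, and that variation hides the consistent within-person drop that the paired test picks up.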

One-Tailed vs Two-Tailed

Two-tailed: Testing if different (≠)
One-tailed: Testing if greater (>) or less (<)


# One-tailed: Is new method better?
# Divide p-value by 2 and check direction

t_stat, p_value = stats.ttest_ind(class_a, class_b)

# For one-tailed (Class A > Class B)
if t_stat > 0 and p_value/2 < 0.05:
    print("Class A is significantly better")
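As an alternative to halving the p-value by hand, SciPy's t-test functions accept an alternative argument. The sketch below assumes SciPy 1.6 or newer, where that argument was added, and reuses the class scores from the two-sample example.

```python
from scipy import stats

class_a = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_b = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]

# Two-tailed test (the default): H1 is "means are different"
t_two, p_two = stats.ttest_ind(class_a, class_b)

# One-tailed test: H1 is "mean of class_a is greater than mean of class_b"
t_one, p_one = stats.ttest_ind(class_a, class_b, alternative="greater")

# Manual conversion from the two-tailed p-value, including the direction check
p_manual = p_two / 2 if t_two > 0 else 1 - p_two / 2

print(f"Two-tailed p-value:          {p_two:.4f}")
print(f"One-tailed p-value (SciPy):  {p_one:.4f}")
print(f"One-tailed p-value (manual): {p_manual:.4f}")
```

Both one-tailed values should match. Choose the direction before looking at the data; switching to a one-tailed test after seeing the result inflates the false positive rate.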

Type I and Type II Errors

                   H₀ True            H₀ False
Reject H₀          Type I Error (α)   Correct!
Don't Reject H₀    Correct!           Type II Error (β)

Type I: False positive (see effect that isn't there)
Type II: False negative (miss real effect)
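Both error rates can be estimated by simulation. The sketch below draws many samples where H₀ is true to count false positives, and many where it is false to count misses. All numbers (a true mean of 10, a shifted mean of 10.3, a standard deviation of 0.5, samples of 30, 2000 trials) are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials = 2000  # number of simulated experiments
n = 30           # observations per experiment


def rejection_rate(true_mean):
    """Fraction of simulated samples for which H0: mean = 10 is rejected."""
    samples = rng.normal(true_mean, 0.5, size=(n_trials, n))
    # axis=1 runs one t-test per row (one per simulated experiment)
    _, p_values = stats.ttest_1samp(samples, 10, axis=1)
    return np.mean(p_values < alpha)


# H0 is true (mean really is 10): every rejection is a Type I error
type_1_rate = rejection_rate(10.0)

# H0 is false (mean is 10.3): every failure to reject is a Type II error
power = rejection_rate(10.3)
type_2_rate = 1 - power

print(f"Type I error rate:  {type_1_rate:.3f} (should be close to alpha)")
print(f"Type II error rate: {type_2_rate:.3f}")
print(f"Power:              {power:.3f}")
```

The Type I rate is set by α. The Type II rate depends on the effect size, the spread of the data, and the sample size.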

Significance Level (α)

Usually α = 0.05 (5%)
Lower α = harder to reject H₀ (fewer Type I errors, but more Type II errors for the same sample size)
Common values: 0.01, 0.05, 0.10


alpha = 0.05

if p_value < alpha:
    print("Reject H₀ at 5% significance level")
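The same p-value can lead to different decisions depending on α. As a small sketch, the battery sample from the one-sample example is tested once and the result is compared against the three common levels. The decisions dictionary is just an illustrative way to collect the outcomes.

```python
from scipy import stats

# Battery-life sample from the one-sample t-test example (H0: mean = 10)
sample = [9.8, 10.2, 9.5, 10.1, 9.9, 10.3, 9.7, 10.0, 9.6, 10.1,
          9.9, 10.0, 9.8, 10.2, 9.7, 10.1, 9.9, 9.6, 10.0, 9.8,
          10.1, 9.7, 10.0, 9.9, 10.2, 9.8, 10.1, 9.6, 10.0, 9.9]

_, p_value = stats.ttest_1samp(sample, 10)
print(f"P-value: {p_value:.4f}")

# Compare the one p-value against each significance level
decisions = {}
for alpha in (0.01, 0.05, 0.10):
    decisions[alpha] = bool(p_value < alpha)
    verdict = "reject H0" if decisions[alpha] else "cannot reject H0"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

For this sample, H₀ is rejected at 0.05 and 0.10 but not at 0.01, so α should be fixed before the data are analyzed.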

Complete Example


from scipy import stats
import numpy as np

# A/B Test: Does new website design increase time on site?
# Control: old design, Treatment: new design

np.random.seed(42)
control = np.random.normal(120, 30, 100)    # seconds on site
treatment = np.random.normal(135, 35, 100)  # seconds on site

# H₀: No difference between designs
# H₁: New design increases time

print("=== A/B Test Results ===")
print(f"Control mean: {np.mean(control):.1f} seconds")
print(f"Treatment mean: {np.mean(treatment):.1f} seconds")
print(f"Difference: {np.mean(treatment) - np.mean(control):.1f} seconds")

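# H₁ is one-sided (treatment > control), so convert the two-tailed p-value
t_stat, p_two_tailed = stats.ttest_ind(treatment, control)
p_value = p_two_tailed / 2 if t_stat > 0 else 1 - p_two_tailed / 2

print(f"\nT-statistic: {t_stat:.3f}")
print(f"One-tailed p-value: {p_value:.4f}")

if p_value < 0.05:
    print("\n✓ New design significantly increases time on site!")
else:
    print("\n✗ No significant increase found")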

Key Points

H₀: Null hypothesis (no effect)
H₁: Alternative hypothesis (effect exists)
p-value < α (usually 0.05): Reject H₀
Use ttest_1samp for one sample vs value
Use ttest_ind for two independent groups
Use ttest_rel for paired/before-after
Watch out for Type I and Type II errors

What's Next?

Learn about T-tests and Chi-Square tests in detail.