What is A/B Testing?
A/B testing compares two versions (A vs B) to see which performs better.
Examples:
- Button color: Blue vs Green
- Headline: "Buy Now" vs "Get Started"
- Pricing: ₹999 vs ₹1,499
- Email subject line
Why it matters:
- Remove opinions, trust data
- Optimize conversion rates
- Increase revenue scientifically
When to A/B Test
✅ Good use cases:
- Landing page design
- Email subject lines
- Product pricing
- Call-to-action buttons
- Checkout flow
❌ Don't A/B test:
- When you have <1000 users/week
- Critical bug fixes (just fix it!)
- Unethical changes
The A/B Testing Process
Step 1: Form Hypothesis
Bad: "Let's test a green button"
Good: "Green button will increase clicks by 10% because it stands out more"
Template: "Changing [X] will [increase/decrease] [metric] by [Y]% because [reason]"
Step 2: Choose Metric
Primary metric (one only!):
- Click-through rate
- Conversion rate
- Revenue per user
Secondary metrics:
- Time on page
- Bounce rate
Step 3: Calculate Sample Size
Inputs needed:
- Baseline conversion rate
- Minimum detectable effect
- Statistical power (usually 80%)
- Significance level (usually 0.05)
Example:
- Current conversion: 5%
- Want to detect: 10% increase (to 5.5%)
- Need: ~31,000 visitors per variant
Tools: Use an online calculator (Optimizely, VWO, Evan Miller's A/B test calculator)
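The inputs above plug into the standard normal-approximation formula for a two-proportion test. A minimal sketch (the function name is illustrative; online calculators may differ slightly in their rounding and corrections):

```python
from scipy import stats

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = stats.norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int(round((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2))

n = sample_size_per_variant(baseline=0.05, relative_mde=0.10)
print(f"{n:,} visitors per variant")  # roughly 31,000
```

Note how the required sample size explodes as the minimum detectable effect shrinks: detecting a 20% lift instead of 10% needs roughly a quarter of the traffic.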
Step 4: Run Experiment
Duration:
- Minimum: 1-2 weeks
- Run full business cycles
- Don't stop early!
Random assignment:
- 50% see version A
- 50% see version B
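In practice, random assignment is usually implemented as a deterministic hash of the user ID, so the same user always sees the same variant across sessions. A minimal sketch (function name and experiment key are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage_cta") -> str:
    """Deterministically bucket a user into A or B (50/50 split)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("user_42"))  # same user always gets the same variant
```

Salting the hash with the experiment name means a user's bucket in one test is independent of their bucket in another.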
Step 5: Analyze Results
Calculate:
- Conversion rate for each variant
- Statistical significance (p-value)
- Confidence interval
Decide:
- p < 0.05: Winner! (statistically significant)
- p ≥ 0.05: No clear winner (keep original)
Statistical Significance
p-value < 0.05 means:
- If there were truly no difference, a result this extreme would show up less than 5% of the time
- By convention, that counts as evidence of a real effect
Example Results:

| Variant | Visitors | Conversions | Conv. Rate |
|---------|----------|-------------|------------|
| A (Control) | 10,000 | 500 | 5.0% |
| B (Test) | 10,000 | 570 | 5.7% |
Analysis:
- Lift: +14% (5.7% vs 5.0%)
- p-value: ≈ 0.03
- Result: B wins!
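The analysis above can be reproduced with a two-proportion z-test, plus a confidence interval for the absolute lift. A sketch using SciPy:

```python
import math
from scipy import stats

conv_a, n_a = 500, 10_000   # control: 5.0%
conv_b, n_b = 570, 10_000   # test:    5.7%
p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled two-proportion z-test
pooled = (conv_a + conv_b) / (n_a + n_b)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se_pooled
p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value

# 95% confidence interval for the difference (unpooled standard error)
se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci = ((p_b - p_a) - 1.96 * se_diff, (p_b - p_a) + 1.96 * se_diff)

print(f"z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI for the absolute lift: [{ci[0]:.2%}, {ci[1]:.2%}]")
```

The confidence interval is the more useful number for a business decision: its lower bound tells you the smallest lift consistent with the data.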
Sample Size Matters
Why larger is better:
| Sample Size | Conversion A | Conversion B | Significant? |
|-------------|--------------|--------------|--------------|
| 100 | 5% | 10% | No (p ≈ 0.18) |
| 1,000 | 5% | 7% | Borderline (p ≈ 0.06) |
| 100,000 | 5% | 5.5% | Yes (p < 0.001) |
Lesson: Small samples miss real effects, large samples detect small effects.
Common Pitfalls
1. Peeking Problem
❌ Wrong: Check results daily, stop when significant
Why it's bad: Increases false positives
✅ Right: Decide duration upfront, run the full period
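The peeking problem can be demonstrated by simulation: run many A/A experiments (both variants share the same true rate, so every "winner" is a false positive) and compare daily peeking against a single fixed-horizon test. Traffic numbers and duration here are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_days, daily_n, true_rate = 1000, 14, 500, 0.05

def p_value(conv_a, n_a, conv_b, n_b):
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    return stats.chi2_contingency(table)[1]

peeking_fp = fixed_fp = 0
for _ in range(n_sims):
    # A/A test: cumulative conversions per day for two identical variants
    a = rng.binomial(daily_n, true_rate, n_days).cumsum()
    b = rng.binomial(daily_n, true_rate, n_days).cumsum()
    n = daily_n * np.arange(1, n_days + 1)
    # Peeking: declare a winner the first day p < 0.05
    if any(p_value(a[d], n[d], b[d], n[d]) < 0.05 for d in range(n_days)):
        peeking_fp += 1
    # Fixed horizon: test exactly once, at the end
    if p_value(a[-1], n[-1], b[-1], n[-1]) < 0.05:
        fixed_fp += 1

print(f"Peeking:       {peeking_fp / n_sims:.1%} false positives")
print(f"Fixed horizon: {fixed_fp / n_sims:.1%} false positives")
```

The fixed-horizon rate stays near the nominal 5%, while daily peeking inflates it several-fold.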
2. Multiple Testing
❌ Wrong: Test 20 variants, pick the one with p < 0.05
Why it's bad: A 5% false-positive rate means about 1 in 20 will look "significant" by chance!
✅ Right: Test A vs B only, or adjust the significance level
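The arithmetic behind the multiple-testing problem, with the Bonferroni correction as the simplest adjustment:

```python
k = 20       # number of comparisons run
alpha = 0.05

# Probability of at least one false positive across k independent tests
family_wise_error = 1 - (1 - alpha) ** k
print(f"Chance of a spurious 'winner': {family_wise_error:.0%}")  # ~64%

# Bonferroni correction: require p < alpha / k for each comparison
bonferroni_alpha = alpha / k
print(f"Corrected per-test threshold: {bonferroni_alpha}")  # 0.0025
```

Bonferroni is conservative; it trades power for a guaranteed cap on the family-wise error rate.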
3. Ignoring External Factors
❌ Wrong: Run the test during festival season
Why it's bad: Seasonal effects confound results
✅ Right: Run during normal periods, avoid holidays
4. Small Sample Size
❌ Wrong: 100 visitors per variant
Why it's bad: Not enough power to detect real effects
✅ Right: Use a sample size calculator upfront
5. Testing Too Many Things
❌ Wrong: Change button color AND text AND position
Why it's bad: Can't tell which change caused the effect
✅ Right: Test one change at a time
Real Example: Email Subject Line Test
Scenario: E-commerce company wants to increase email open rates.
Hypothesis: Personalized subject line will increase opens by 15%.
Variants:
- A (Control): "New arrivals this week"
- B (Test): "[Name], check out these new arrivals!"
Setup:
- Send to 20,000 subscribers
- 10,000 get A, 10,000 get B
- Measure: Open rate
Results:

| Variant | Sent | Opens | Open Rate |
|---------|------|-------|-----------|
| A | 10,000 | 1,500 | 15% |
| B | 10,000 | 1,800 | 18% |
Analysis:
- Lift: +20% (18% vs 15%)
- p-value: < 0.001
- Decision: Use personalized subject lines!
Impact:
- 3 percentage-point increase in open rate (15% → 18%)
- On 1M emails/month = 30,000 extra opens
- Drives significant revenue
Multivariate Testing
Test multiple elements simultaneously.
Example:
- Button color (Blue/Green)
- Button text (Buy/Get)
- 2 × 2 = 4 combinations to test
When to use:
- High traffic sites
- Want to optimize multiple elements
- Need faster results
Downside:
- Requires much larger sample size
- More complex analysis
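The combinations from the example above can be enumerated with `itertools.product`. Note how the cell count, and therefore the required sample size, multiplies with every element added:

```python
from itertools import product

colors = ["Blue", "Green"]
texts = ["Buy", "Get"]

variants = list(product(colors, texts))
for color, text in variants:
    print(f"{color} button saying '{text}'")

print(len(variants), "cells")  # 2 x 2 = 4; adding a third 2-level factor makes 8
```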
Bayesian A/B Testing
Alternative to traditional (frequentist) approach.
Advantages:
- Can peek at results anytime
- Shows probability of A being better than B
- More intuitive interpretation
Disadvantage:
- Requires choosing a prior distribution
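A minimal sketch of the Bayesian approach, reusing the earlier 500-vs-570 conversion data: with a uniform Beta(1, 1) prior, each variant's conversion rate has a Beta posterior, and P(B > A) can be estimated by sampling from both posteriors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior for each variant: Beta(1 + conversions, 1 + non-conversions)
samples_a = rng.beta(1 + 500, 1 + 9_500, size=100_000)
samples_b = rng.beta(1 + 570, 1 + 9_430, size=100_000)

prob_b_better = (samples_b > samples_a).mean()
print(f"P(B beats A) = {prob_b_better:.1%}")
```

This is the number stakeholders usually want ("how likely is B actually better?"), which a p-value does not directly give.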
Tools: VWO, Optimizely (Bayesian mode)
Tools for A/B Testing
| Tool | Best For | Cost |
|------|----------|------|
| Google Optimize (discontinued 2023) | Websites | Free |
| Optimizely | Enterprise | $$$$ |
| VWO | Mid-size | $$$ |
| Unbounce | Landing pages | $$ |
| Mailchimp | Email testing | $ |
Python for A/B Testing
```python
from scipy import stats

# Sample data
conversions_a = 500   # out of 10,000
conversions_b = 570   # out of 10,000
visitors_a = 10_000
visitors_b = 10_000

# Conversion rates
rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Chi-square test on the 2x2 contingency table
observed = [[conversions_a, visitors_a - conversions_a],
            [conversions_b, visitors_b - conversions_b]]
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Conversion A: {rate_a:.2%}")
print(f"Conversion B: {rate_b:.2%}")
print(f"Lift: {(rate_b / rate_a - 1):.2%}")
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    print("✅ Statistically significant!")
else:
    print("❌ Not significant")
```

Summary
✅ A/B testing removes guesswork from decisions
✅ Form a clear hypothesis before testing
✅ Calculate the required sample size upfront
✅ Run the test for a full business cycle
✅ Don't peek early (peeking problem)
✅ p < 0.05 = statistically significant
✅ Test one change at a time
✅ Consider practical significance, not just statistical
Next: Capstone Project → put it all together!