Module 4

Chi-Square Tests

Test relationships between categorical variables

What You'll Learn

  • Chi-square test of independence
  • Goodness-of-fit test
  • Expected vs observed frequencies
  • Applications

Chi-Square Test

For: Categorical data

Two types:

  1. Goodness of fit: One variable
  2. Test of independence: Two variables

Test of Independence


Question: Are two categorical variables related?

Example: Is gender related to product preference?

Data format: Contingency table (cross-tab)

Steps:

  1. Calculate expected frequencies
  2. Compare to observed
  3. Calculate chi-square statistic
  4. Check p-value

Excel: =CHISQ.TEST(observed_range, expected_range)

Python: scipy.stats.chi2_contingency(table)
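The four steps can be run in one call with SciPy. This is a minimal sketch, assuming a made-up 2×2 table of counts (the numbers are hypothetical, not from the lesson). SciPy applies Yates' continuity correction to 2×2 tables by default, so the sketch passes correction=False to match the plain formula used in this lesson.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = gender (men, women), columns = product (A, B)
observed = np.array([[30, 20],
                     [15, 35]])

# correction=False switches off Yates' continuity correction, which SciPy
# applies by default to 2x2 tables, so the result matches the plain formula.
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print("expected counts:")
print(expected)
```

The call returns the statistic, the p-value, the degrees of freedom, and the table of expected frequencies, so steps 1 to 4 are all covered.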

Expected Frequencies

Formula: Expected = (Row Total × Column Total) / Grand Total

Example: Men who prefer Product A: Expected = (Total Men × Total Product A) / Total People
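The formula can be applied to every cell at once. The sketch below assumes a hypothetical table of counts (men and women by Product A and Product B) and uses NumPy's outer product of the row and column totals.

```python
import numpy as np

# Hypothetical counts: rows = men, women; columns = Product A, Product B
observed = np.array([[30, 20],
                     [15, 35]])

row_totals = observed.sum(axis=1)   # total men, total women
col_totals = observed.sum(axis=0)   # total Product A, total Product B
grand_total = observed.sum()        # total people

# Expected = (Row Total x Column Total) / Grand Total, for every cell
expected = np.outer(row_totals, col_totals) / grand_total

print(expected)
```

In this made-up table, the expected count for men who prefer Product A is (50 × 45) / 100 = 22.5.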

Chi-Square Statistic

Chi-Square Formula

Formula: χ² = Σ [(Observed - Expected)² / Expected]

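Large χ²: A big difference between observed and expected counts, which is evidence that the variables are related. How large counts as "large" depends on the size of the table, so judge it through the p-value.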
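To see where the statistic comes from, the sum can be computed by hand. This sketch assumes hypothetical observed counts and the expected counts that follow from their row and column totals.

```python
import numpy as np

# Hypothetical observed counts and their expected counts under independence
observed = np.array([[30, 20],
                     [15, 35]], dtype=float)
expected = np.array([[22.5, 27.5],
                     [22.5, 27.5]])

# Each cell contributes (Observed - Expected)^2 / Expected
contributions = (observed - expected) ** 2 / expected
chi2_stat = contributions.sum()

print(contributions)
print(f"chi-square = {chi2_stat:.3f}")
```

Printing the per-cell contributions shows which cells drive the result.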

P-Value Interpretation

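p < 0.05: Reject independence. The data give evidence that the variables are related.

p ≥ 0.05: Fail to reject independence. There is not enough evidence of a relationship, which is not proof that none exists.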
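The p-value comes from the chi-square distribution. A rough sketch, assuming an illustrative statistic from a hypothetical 2×2 table; the degrees of freedom for a test of independence are (rows − 1) × (columns − 1), a detail the lesson does not spell out.

```python
from scipy.stats import chi2

chi2_stat = 9.0909            # illustrative statistic from a hypothetical 2x2 table
n_rows, n_cols = 2, 2
dof = (n_rows - 1) * (n_cols - 1)

# Survival function: probability of a statistic at least this large
# if the variables really were independent
p_value = chi2.sf(chi2_stat, dof)

alpha = 0.05                  # the threshold used in this lesson
if p_value < alpha:
    decision = "reject independence"
else:
    decision = "fail to reject independence"

print(f"dof = {dof}, p = {p_value:.4f}: {decision}")
```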

Goodness of Fit

Goodness of Fit Test

Question: Does the data match an expected distribution?

Example: Do die rolls follow a uniform distribution?

Use: Test whether a sample matches a theoretical distribution
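The die example might look like this in SciPy. The roll counts are made up for illustration; scipy.stats.chisquare compares them with the counts a fair die would give.

```python
from scipy.stats import chisquare

# Hypothetical counts for faces 1 to 6 over 60 rolls
observed_rolls = [8, 12, 9, 11, 10, 10]

# A fair die gives each face the same expected count
n_rolls = sum(observed_rolls)
expected_rolls = [n_rolls / 6] * 6

stat, p_value = chisquare(f_obs=observed_rolls, f_exp=expected_rolls)

print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

With these counts the p-value is well above 0.05, so there is no evidence that the die is unfair.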

Assumptions

Requirements:

  • Expected frequency ≥ 5 in each cell (a common rule of thumb)
  • Independent observations
  • Random sample
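The expected-frequency requirement can be checked before running the test. A small sketch, assuming a hypothetical table with few observations; expected_freq is SciPy's helper for the expected counts under independence.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical small sample: only 20 observations in total
observed = np.array([[3, 7],
                     [2, 8]])

expected = expected_freq(observed)

# Flag cells whose expected count falls below the rule-of-thumb minimum of 5
small_cells = expected < 5
assumption_met = not small_cells.any()

print(expected)
print("Expected frequency >= 5 in every cell:", assumption_met)
```

Here two cells have an expected count of 2.5, so the chi-square result for this table should be treated with caution.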

Practice Exercise

Survey:

           Like   Dislike
  Men       40      10
  Women     20      30

Test if gender and preference are independent.
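Once you have worked through the exercise by hand, one way to check your answer is the sketch below. It uses the survey counts above and runs the test both without and with Yates' continuity correction, since SciPy applies the correction to 2×2 tables by default.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Survey counts: rows = men, women; columns = like, dislike
survey = np.array([[40, 10],
                   [20, 30]])

# Plain chi-square formula, as used in this lesson
stat, p_value, dof, expected = chi2_contingency(survey, correction=False)

# SciPy's default for 2x2 tables (Yates' continuity correction)
stat_yates, p_yates, _, _ = chi2_contingency(survey)

print("expected counts:")
print(expected)
print(f"uncorrected: chi-square = {stat:.3f}, p = {p_value:.6f}")
print(f"with Yates:  chi-square = {stat_yates:.3f}, p = {p_yates:.6f}")
```

Both versions give a p-value far below 0.05, so independence is rejected for this survey.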

Next Steps

Learn about Type I & II Errors!

Tip: Use chi-square for categorical data and a t-test for comparing means of continuous data!