Boolean Indexing and Filtering

Learn to filter and select data using boolean conditions

Boolean Masks

A boolean mask is an array of True/False values used to filter another array.

code.py
import numpy as np

numbers = np.array([10, 25, 30, 15, 40])
mask = numbers > 20
print("Mask:", mask)
print("Filtered:", numbers[mask])

Output:

Mask: [False  True  True False  True]
Filtered: [25 30 40]

How it works: True positions are kept, False positions are filtered out.

Direct Filtering

You don't need to create the mask separately.

code.py
import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72])
high_scores = scores[scores > 80]
print("Scores above 80:", high_scores)

Output: [85 92 95]

Multiple Conditions

AND (&)

Both conditions must be true.

code.py
import numpy as np

prices = np.array([45, 32, 67, 28, 51, 39])
mid_range = prices[(prices >= 30) & (prices <= 50)]
print("Prices 30-50:", mid_range)

Output: [45 32 39]

OR (|)

At least one condition must be true.

code.py
import numpy as np

temps = np.array([72, 85, 68, 90, 75])
extreme = temps[(temps < 70) | (temps > 80)]
print("Extreme temps:", extreme)

Output: [85 68 90]

NOT (~)

Inverts the condition.

code.py
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])
not_three = numbers[numbers != 3]
print("Not 3:", not_three)

not_small = numbers[~(numbers < 3)]
print("Not small:", not_small)

Output:

Not 3: [1 2 4 5]
Not small: [3 4 5]

Counting Matches

code.py
import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72, 88])

passing = scores >= 70
count = np.sum(passing)
print("Passing students:", count)

percentage = (count / len(scores)) * 100
print("Pass rate:", round(percentage, 1), "percent")

Why sum works: True counts as 1, False as 0.
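Because True counts as 1 and False as 0, the mean of a mask is the fraction of matching values. The sketch below reuses the scores from above. np.count_nonzero is an alternative way to count that the example above does not use.

```python
import numpy as np

scores = np.array([78, 85, 92, 68, 95, 72, 88])
passing = scores >= 70

# Summing a boolean mask counts its True values
count = np.sum(passing)

# np.count_nonzero returns the same count
same_count = np.count_nonzero(passing)

# The mean of a boolean mask is the fraction of True values
pass_rate = np.mean(passing) * 100

print("Passing students:", count)
print("Pass rate:", round(pass_rate, 1), "percent")
```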

Finding Positions

code.py
import numpy as np

temps = np.array([72, 68, 75, 70, 73])
cold_indices = np.where(temps < 70)[0]
print("Cold day indices:", cold_indices)
print("Cold temps:", temps[cold_indices])

Output:

Cold day indices: [1]
Cold temps: [68]
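The [0] is needed because np.where(condition) returns a tuple with one index array per dimension. The sketch below shows this. The small 2D matrix is made up for illustration, and np.flatnonzero is an alternative for 1D arrays.

```python
import numpy as np

temps = np.array([72, 68, 75, 70, 73])

# np.where(condition) returns a tuple with one index array per dimension
result = np.where(temps < 70)
print(type(result), len(result))

# For a 1D array, np.flatnonzero returns the index array directly
cold_indices = np.flatnonzero(temps < 70)
print("Cold day indices:", cold_indices)

# For a 2D array, the tuple holds row indices and column indices
matrix = np.array([[1, 8], [9, 2]])  # illustrative values
rows, cols = np.where(matrix > 5)
print("Rows:", rows, "Cols:", cols)
```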

Conditional Replacement

Replace values that meet a condition.

code.py
import numpy as np

scores = np.array([78, 65, 92, 58, 88])
scores[scores < 70] = 70
print("After curve:", scores)

Output: [78 70 92 70 88]

Use case: Apply minimum grade, cap maximum values, fix outliers.
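Capping maximum values works the same way. The sketch below uses made-up scores and assumes a valid range of 70 to 100. np.clip is a one-call alternative that applies both limits.

```python
import numpy as np

# Made-up scores; 104 is above the assumed maximum of 100
scores = np.array([78, 65, 104, 58, 88])

capped = scores.copy()        # work on a copy so scores stays unchanged
capped[capped > 100] = 100    # cap maximum values
capped[capped < 70] = 70      # apply minimum grade

# np.clip applies both limits in one call and returns a new array
clipped = np.clip(scores, 70, 100)

print("Capped:", capped)
print("Clipped:", clipped)
```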

Using np.where for Replacement

code.py
import numpy as np

scores = np.array([85, 92, 78, 95, 88])
grades = np.where(scores >= 90, "A", "B")
print(grades)

Output: ['B' 'A' 'B' 'A' 'B']

Syntax: np.where(condition, value_if_true, value_if_false)
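One way to get more than two outcomes is to nest a second np.where as the value_if_false. The grade boundaries below (90 and 80) are assumptions for illustration. The values can also be arrays instead of single values.

```python
import numpy as np

scores = np.array([85, 92, 78, 95, 88])

# Nested np.where: "A" for 90+, "B" for 80-89, "C" otherwise
grades = np.where(scores >= 90, "A", np.where(scores >= 80, "B", "C"))
print(grades)

# Values can be arrays too: keep scores of 90+, replace the rest with 0
top_only = np.where(scores >= 90, scores, 0)
print(top_only)
```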

Complex Conditions

code.py
import numpy as np

data = np.array([15, 25, 35, 45, 55, 65])

condition1 = data > 20
condition2 = data < 50
condition3 = data % 2 == 1

result = data[condition1 & condition2 & condition3]
print("Odd, between 20 and 50:", result)

Output: [25 35 45]

Filtering 2D Arrays

code.py
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

above_five = matrix[matrix > 5]
print("Values > 5:", above_five)

Output: [6 7 8 9]

Note: Returns 1D array of matching values.
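If you need to keep the 2D shape, one option is np.where with a fill value for the non-matching positions. The fill value 0 here is an arbitrary choice.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The mask has the same shape as the matrix
mask = matrix > 5
print("Mask shape:", mask.shape)

# np.where keeps the shape and fills non-matching positions with 0
kept = np.where(mask, matrix, 0)
print(kept)
```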

Filter Rows

code.py
import numpy as np

data = np.array([[10, 25], [30, 15], [20, 40]])

high_first_col = data[data[:, 0] > 15]
print("Rows where first column > 15:")
print(high_first_col)

Output:

[[30 15]
 [20 40]]
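The same idea works for columns, and a row condition can involve every value in the row. This sketch reuses the data above. The threshold of 15 is kept only for consistency.

```python
import numpy as np

data = np.array([[10, 25], [30, 15], [20, 40]])

# Keep columns whose value in the first row is above 15
cols = data[:, data[0] > 15]
print("Columns where first row > 15:")
print(cols)

# Keep rows where every value is above 15
rows_all = data[(data > 15).all(axis=1)]
print("Rows where all values > 15:")
print(rows_all)
```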

Practice Example

The scenario: Analyze and filter student performance data.

code.py
import numpy as np

student_scores = np.array([78, 85, 92, 68, 95, 72, 88, 76, 90, 82])

print("All scores:", student_scores)
print("Total students:", len(student_scores))
print()

passing = student_scores >= 70
print("Passing scores:", student_scores[passing])
print("Passing count:", np.sum(passing))
print("Pass rate:", round(np.mean(passing) * 100, 1), "percent")
print()

excellent = student_scores[student_scores >= 90]
print("Excellent (90+):", excellent)
print("Count:", len(excellent))
print()

needs_help = student_scores[student_scores < 75]
print("Needs help (<75):", needs_help)
print("Count:", len(needs_help))
print()

mid_range = student_scores[(student_scores >= 75) & (student_scores < 90)]
print("Mid range (75-89):", mid_range)
print()

above_average = student_scores[student_scores > student_scores.mean()]
print("Above average:", above_average)
print("Average:", round(student_scores.mean(), 1))
print()

outliers = student_scores[(student_scores < 70) | (student_scores > 95)]
print("Outliers:", outliers)

What this analysis shows:

  1. All student scores
  2. How many passed (70+)
  3. Excellent performers (90+)
  4. Students needing help (<75)
  5. Mid-range students
  6. Above-average performers
  7. Outliers (very low or very high)

Using isin()

Check if values are in a list.

code.py
import numpy as np

grades = np.array(["A", "B", "C", "A", "D", "B", "A"])
high_grades = np.isin(grades, ["A", "B"])
print("High grades:", grades[high_grades])

Output: ['A' 'B' 'A' 'B' 'A']
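To select everything that is not in the list, invert the mask. Both forms below should give the same result. The variable names are only for illustration.

```python
import numpy as np

grades = np.array(["A", "B", "C", "A", "D", "B", "A"])

# Invert the mask with ~ to keep grades that are not A or B
other_grades = grades[~np.isin(grades, ["A", "B"])]
print("Other grades:", other_grades)

# np.isin also accepts invert=True, which builds the inverted mask directly
other_grades_2 = grades[np.isin(grades, ["A", "B"], invert=True)]
print("Other grades:", other_grades_2)
```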

Masking Invalid Data

code.py
import numpy as np

data = np.array([10, -999, 25, -999, 30])
mask = data != -999
valid_data = data[mask]
print("Valid data:", valid_data)
print("Average:", valid_data.mean())

Use case: Remove placeholder values before calculations.
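Missing values are often stored as NaN rather than a placeholder like -999. A minimal sketch, assuming the same data with NaN in place of -999: NaN is not equal to itself, so comparing with != does not filter it out. Use np.isnan to build the mask instead.

```python
import numpy as np

# Assumed variant of the data above, with NaN marking missing values
data = np.array([10.0, np.nan, 25.0, np.nan, 30.0])

# np.isnan finds the missing values; ~ inverts the mask to keep valid ones
valid_data = data[~np.isnan(data)]
print("Valid data:", valid_data)
print("Average:", valid_data.mean())

# np.nanmean skips NaN values without filtering first
print("Average (nanmean):", np.nanmean(data))
```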

Selecting Random Subset

code.py
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = np.random.random(len(data)) > 0.5
sample = data[mask]
print("Random sample:", sample)

What this does: Keeps each value with probability 0.5, so the sample holds about half the values and its size changes from run to run.
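If you want the same sample on every run, one option is a seeded generator from np.random.default_rng. The seed 42 and the sample size 5 below are arbitrary choices for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# A seeded generator produces the same random numbers on every run
rng = np.random.default_rng(42)  # 42 is an arbitrary seed

mask = rng.random(len(data)) > 0.5
sample = data[mask]
print("Random sample:", sample)

# For an exact sample size, rng.choice picks without repeats
exact = rng.choice(data, size=5, replace=False)
print("Exactly five values:", exact)
```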

Key Points to Remember

Boolean indexing uses True/False arrays to filter data. Create with comparison operators.

Combine conditions with & (and), | (or), ~ (not). Always use parentheses around conditions.

np.sum() on boolean array counts True values. np.mean() gives proportion.

np.where() finds positions or does conditional replacement.

Filtering 2D arrays returns 1D results unless you filter entire rows/columns.

Common Mistakes

Mistake 1: Using "and" instead of &

code.py
arr[(arr > 5) and (arr < 10)]  # ValueError: truth value of an array is ambiguous
arr[(arr > 5) & (arr < 10)]  # Correct
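A runnable version of this mistake, using a made-up array. Python's "and" needs a single True or False, which an array with several elements cannot provide.

```python
import numpy as np

arr = np.array([3, 6, 8, 12])  # made-up values

raised = False
try:
    arr[(arr > 5) and (arr < 10)]
except ValueError as err:
    # "and" asks for one truth value for the whole array
    raised = True
    print("ValueError:", err)

# & compares the two masks element by element
result = arr[(arr > 5) & (arr < 10)]
print("Between 5 and 10:", result)
```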

Mistake 2: Forgetting parentheses

code.py
arr[arr > 5 & arr < 10]  # Wrong! & is evaluated before > and <
arr[(arr > 5) & (arr < 10)]  # Correct
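Why the parentheses matter: & has higher precedence than > and <, so Python reads the unparenthesized version as arr > (5 & arr) < 10. A small sketch with a made-up array:

```python
import numpy as np

arr = np.array([3, 6, 8, 12])  # made-up values

# Without parentheses, this bitwise AND is computed first
print("5 & arr:", 5 & arr)

raised = False
try:
    # Read as arr > (5 & arr) < 10, a chained comparison of arrays
    arr[arr > 5 & arr < 10]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```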

Mistake 3: Counting the slow way

code.py
len(arr[arr > 5])  # Works, but builds a filtered copy just to count it
np.sum(arr > 5)  # Faster way to count True values

Mistake 4: Modifying filtered copy

code.py
filtered = arr[arr > 5]
filtered[0] = 99  # Doesn't change original arr
arr[arr > 5] = 99  # This changes original
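Boolean indexing returns a copy, not a view, which is why the first assignment leaves arr untouched. This can be checked with np.shares_memory, using a made-up array:

```python
import numpy as np

arr = np.array([3, 6, 8, 12])  # made-up values

# Boolean indexing returns a new array with its own memory
filtered = arr[arr > 5]
shares = np.shares_memory(arr, filtered)
print("Shares memory:", shares)

# Changing the copy leaves the original alone
filtered[0] = 99
after_copy_edit = arr.copy()
print("After editing copy:", arr)

# Assigning through the mask changes the original in place
arr[arr > 5] = 99
print("After masked assignment:", arr)
```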

What's Next?

You now know boolean indexing and filtering. Next, you'll learn statistical functions: advanced statistics and analysis with NumPy.