#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
5 min read min read

Join Operations

Learn to combine DataFrames using their index with join

Join Operations

What is Join?

Join is like merge, but it uses the index (row labels) to connect DataFrames.

Think of it like matching books by their library code number.

Basic Join Example

code.py
import pandas as pd

# Create DataFrames with named index
students = pd.DataFrame({
    'name': ['John', 'Sarah', 'Mike']
}, index=['S1', 'S2', 'S3'])

scores = pd.DataFrame({
    'marks': [85, 92, 78]
}, index=['S1', 'S2', 'S3'])

print("Students:")
print(students)
print("\nScores:")
print(scores)

Output:

Students: name S1 John S2 Sarah S3 Mike Scores: marks S1 85 S2 92 S3 78

Now join them:

code.py
result = students.join(scores)
print(result)

Output:

name marks S1 John 85 S2 Sarah 92 S3 Mike 78

Joined using the index (S1, S2, S3).

Types of Joins

Left Join (Default)

Keep all rows from the left DataFrame.

code.py
students = pd.DataFrame({
    'name': ['John', 'Sarah', 'Mike', 'Emma']
}, index=['S1', 'S2', 'S3', 'S4'])

scores = pd.DataFrame({
    'marks': [85, 92]
}, index=['S1', 'S2'])

result = students.join(scores, how='left')
print(result)

Output:

name marks S1 John 85.0 S2 Sarah 92.0 S3 Mike NaN S4 Emma NaN

All students kept. Missing scores are NaN.

Right Join

Keep all rows from the right DataFrame.

code.py
result = students.join(scores, how='right')
print(result)

Output:

name marks S1 John 85 S2 Sarah 92

Inner Join

Keep only matching rows.

code.py
result = students.join(scores, how='inner')
print(result)

Output:

name marks S1 John 85 S2 Sarah 92

Outer Join

Keep all rows from both DataFrames.

code.py
students = pd.DataFrame({
    'name': ['John', 'Sarah']
}, index=['S1', 'S2'])

scores = pd.DataFrame({
    'marks': [85, 78]
}, index=['S1', 'S3'])

result = students.join(scores, how='outer')
print(result)

Output:

name marks S1 John 85.0 S2 Sarah NaN S3 NaN 78.0

Join Multiple DataFrames

code.py
names = pd.DataFrame({'name': ['John', 'Sarah']}, index=['S1', 'S2'])
ages = pd.DataFrame({'age': [25, 30]}, index=['S1', 'S2'])
cities = pd.DataFrame({'city': ['NYC', 'LA']}, index=['S1', 'S2'])

# Join all three
result = names.join([ages, cities])
print(result)

Output:

name age city S1 John 25 NYC S2 Sarah 30 LA

Join with Column Name Conflicts

When both DataFrames have same column name:

code.py
df1 = pd.DataFrame({'score': [85, 92]}, index=['S1', 'S2'])
df2 = pd.DataFrame({'score': [90, 88]}, index=['S1', 'S2'])

result = df1.join(df2, lsuffix='_math', rsuffix='_english')
print(result)

Output:

score_math score_english S1 85 90 S2 92 88

lsuffix = left suffix, rsuffix = right suffix

Join on a Column Instead of Index

code.py
students = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3'],
    'name': ['John', 'Sarah', 'Mike']
})

scores = pd.DataFrame({
    'marks': [85, 92, 78]
}, index=['S1', 'S2', 'S3'])

# Set student_id as index, then join
result = students.set_index('student_id').join(scores)
print(result)

Output:

name marks student_id S1 John 85 S2 Sarah 92 S3 Mike 78

Practice Example

code.py
import pandas as pd

# Product info
products = pd.DataFrame({
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'category': ['Electronics', 'Electronics', 'Electronics']
}, index=['P1', 'P2', 'P3'])

# Product prices
prices = pd.DataFrame({
    'price': [1000, 25, 75]
}, index=['P1', 'P2', 'P3'])

# Product stock
stock = pd.DataFrame({
    'quantity': [50, 200, 100]
}, index=['P1', 'P2', 'P3'])

# Join all together
result = products.join([prices, stock])
print(result)

# Calculate total value
result['total_value'] = result['price'] * result['quantity']
print("\nWith total value:")
print(result)

Key Points

  • df1.join(df2) joins using the index
  • Default is left join (keeps all from left)
  • Use how='inner', 'outer', 'right' for other types
  • Use lsuffix and rsuffix for same column names
  • Join multiple DataFrames with a list: df1.join([df2, df3])

Join vs Merge

FeatureJoinMerge
Joins onIndex (default)Column
Syntaxdf1.join(df2)pd.merge(df1, df2)
SpeedFaster for indexBetter for columns

Common Mistakes

Mistake 1: Forgetting to set index

code.py
# If you want to join on 'id' column
# First set it as index
df1 = df1.set_index('id')
df2 = df2.set_index('id')
result = df1.join(df2)

Mistake 2: Same column names without suffix

code.py
# This will error if both have 'score' column
df1.join(df2)

# Add suffixes
df1.join(df2, lsuffix='_left', rsuffix='_right')

What's Next?

You learned concat, merge, and join. Next, you'll learn when to use each one.