Adding and Removing Columns
Learn to add new columns and remove existing ones from DataFrames
Adding Single Column
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})
df['Department'] = 'Sales'
print(df)
Output:
    Name  Salary Department
0   John   50000      Sales
1  Sarah   60000      Sales
2   Mike   55000      Sales
Every row gets the same value.
Adding from List
df['Age'] = [25, 30, 28]
print(df)
The list length must match the number of rows!
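As a minimal sketch of the length rule, the snippet below first tries a list that is one value short, then guards the assignment with an explicit length check. The names `ages` and `outcome` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

ages = [25, 30]  # deliberately one value short (illustrative data)

try:
    df['Age'] = ages
    outcome = 'assigned'
except ValueError:
    # pandas rejects a list whose length differs from len(df)
    outcome = 'rejected'

# Guard the assignment with an explicit length check
ages = [25, 30, 28]
if len(ages) == len(df):
    df['Age'] = ages

print(outcome)
print(df)
```

Checking `len(ages) == len(df)` before assigning turns a runtime error into a condition you can handle yourself.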
Adding from Calculation
df['Bonus'] = df['Salary'] * 0.1
print(df)
Creates a new column from an existing column.
Adding from Multiple Columns
df['Total'] = df['Salary'] + df['Bonus']
print(df)
Adding with Conditions
import numpy as np
df['Level'] = np.where(df['Salary'] > 55000, 'Senior', 'Junior')
print(df)
Adding with apply()
def calculate_tax(salary):
    return salary * 0.2
df['Tax'] = df['Salary'].apply(calculate_tax)
print(df)
Using lambda:
df['Tax'] = df['Salary'].apply(lambda x: x * 0.2)
Adding Multiple Columns
df['Bonus'], df['Tax'] = df['Salary'] * 0.1, df['Salary'] * 0.2
print(df)
Or separately:
df['Bonus'] = df['Salary'] * 0.1
df['Tax'] = df['Salary'] * 0.2
Insert at Specific Position
df.insert(1, 'ID', [101, 102, 103])
print(df)
Inserts the ID column at position 1 (after the first column).
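A short sketch of how `insert()` behaves, assuming the same three-row sample data: it changes the DataFrame in place, and by default it refuses a column name that already exists. The names `column_order` and `second_insert` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

# insert() modifies df in place and returns None
df.insert(1, 'ID', [101, 102, 103])
column_order = df.columns.tolist()

# Inserting a name that already exists raises ValueError by default
try:
    df.insert(0, 'ID', [1, 2, 3])
    second_insert = 'accepted'
except ValueError:
    second_insert = 'rejected'

print(column_order)
print(second_insert)
```

Position 0 puts the column first; position `len(df.columns)` puts it last.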
Removing Single Column
df_new = df.drop('Age', axis=1)
print(df_new)
The original df is unchanged.
Remove permanently:
df.drop('Age', axis=1, inplace=True)
Removing Multiple Columns
df_new = df.drop(['Age', 'Bonus'], axis=1)
print(df_new)
Delete with del
del df['Tax']
print(df)
Modifies the DataFrame immediately!
Pop Column
Removes a column and returns it.
bonus_column = df.pop('Bonus')
print("Bonus column:", bonus_column)
print("DataFrame now:", df)
The column is removed from df.
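To compare the three removal approaches side by side, here is a small sketch. The Bonus and Tax values are hard-coded sample numbers chosen for illustration, and `dropped`, `still_has_tax`, and `bonus` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000, 6000, 5500],     # sample values for illustration
    'Tax': [10000, 12000, 11000]     # sample values for illustration
})

# drop() returns a new DataFrame and leaves df alone
dropped = df.drop('Tax', axis=1)
still_has_tax = 'Tax' in df.columns

# del removes the column from df itself and returns nothing
del df['Tax']

# pop() removes the column from df and hands it back as a Series
bonus = df.pop('Bonus')

print(dropped.columns.tolist())
print(df.columns.tolist())
print(bonus.tolist())
```

A rough rule of thumb: use `drop()` when you want to keep the original, `del` when you do not need the column again, and `pop()` when you want to keep working with the removed values.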
Practice Example
Scenario: build an employee database with calculated columns.
import pandas as pd
import numpy as np
employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Base_Salary': [50000, 65000, 55000, 70000, 60000],
    'Years': [3, 7, 4, 9, 5]
})
print("Initial data:")
print(employees)
print()
print("1. Add Department:")
employees['Department'] = ['Sales', 'IT', 'Sales', 'HR', 'IT']
print(employees)
print()
print("2. Add Employee ID at start:")
employees.insert(0, 'ID', range(101, 106))
print(employees)
print()
print("3. Calculate bonus (10% of base):")
employees['Bonus'] = employees['Base_Salary'] * 0.1
print(employees)
print()
print("4. Calculate tax (20% of base):")
employees['Tax'] = employees['Base_Salary'] * 0.2
print(employees)
print()
print("5. Add experience level:")
employees['Level'] = np.where(
    employees['Years'] >= 7, 'Senior',
    np.where(employees['Years'] >= 4, 'Mid', 'Junior')
)
print(employees)
print()
print("6. Calculate total compensation:")
employees['Total_Comp'] = employees['Base_Salary'] + employees['Bonus']
print(employees)
print()
print("7. Add performance multiplier:")
def get_multiplier(row):
    # Map each experience level to its multiplier
    if row['Level'] == 'Senior':
        return 1.5
    elif row['Level'] == 'Mid':
        return 1.2
    else:
        return 1.0
employees['Multiplier'] = employees.apply(get_multiplier, axis=1)
print(employees)
print()
print("8. Remove Tax column:")
employees = employees.drop('Tax', axis=1)
print(employees)
print()
print("Final summary:")
print("Columns:", employees.columns.tolist())
print("Shape:", employees.shape)
print("Total compensation:", employees['Total_Comp'].sum())
Adding Empty Column
df['Notes'] = None
print(df)
Or with NaN:
df['Comments'] = np.nan
Adding from Series
new_data = pd.Series([100, 200, 300])
df['Values'] = new_data  # Values are matched by index label, not by position
Conditional Column Addition
if 'Bonus' not in df.columns:
    df['Bonus'] = 0
Adding with assign()
Creates a copy with the new columns.
df_new = df.assign(
    Bonus=df['Salary'] * 0.1,
    Tax=df['Salary'] * 0.2
)
print(df_new)
The original df is unchanged.
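The sketch below checks that `assign()` leaves the original alone. It also shows that, within a single call, a lambda can refer to a column created earlier in the same call. The `Total` column and the lambda parameter `d` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df_new = df.assign(
    Bonus=lambda d: d['Salary'] * 0.1,
    # 'Bonus' already exists on the intermediate copy at this point
    Total=lambda d: d['Salary'] + d['Bonus'],
)

print(df.columns.tolist())      # original columns only
print(df_new.columns.tolist())  # original columns plus Bonus and Total
```

Using lambdas rather than `df['Salary']` directly keeps each expression tied to the DataFrame being built, which helps when `assign()` appears in the middle of a longer chain.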
Chain Multiple Operations
df_result = (df
    .assign(Bonus=df['Salary'] * 0.1)
    .assign(Tax=df['Salary'] * 0.2)
    .assign(Net=lambda x: x['Salary'] - x['Tax'])
)
Removing Columns by Pattern
cols_to_drop = [col for col in df.columns if 'temp' in col]
df = df.drop(cols_to_drop, axis=1)
Keep Only Specific Columns
df = df[['Name', 'Salary', 'Age']]
Drops all other columns.
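Selecting by a list of names fails if any name is missing. A minimal sketch, assuming a DataFrame without an `Age` column, shows the failure and one way to keep only the names that are actually present. The names `wanted`, `present`, and `df_small` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000, 6000, 5500]   # sample values for illustration
})

wanted = ['Name', 'Salary', 'Age']  # 'Age' does not exist in this df

# Selecting a missing column name raises KeyError
try:
    df[wanted]
    selection = 'ok'
except KeyError:
    selection = 'failed'

# Keep only the wanted names that are really in df
present = [col for col in wanted if col in df.columns]
df_small = df[present]

print(selection)
print(df_small.columns.tolist())
```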
Reorder Columns
df = df[['ID', 'Name', 'Age', 'Salary']]
Add Prefix to Columns
df = df.add_prefix('emp_')
print(df.columns.tolist())
Output: ['emp_Name', 'emp_Salary', 'emp_Age']
Add Suffix to Columns
df = df.add_suffix('_2024')
print(df.columns.tolist())
Copy Column
df['Salary_Backup'] = df['Salary']
Replace Column
df['Salary'] = df['Salary'] * 1.1
Overwrites the existing column.
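Copying and replacing work well together. As a small sketch with the same sample salaries, the backup column keeps the old values after `Salary` is overwritten; the rounding is only there to keep the printed floats tidy.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

# Keep the old values before overwriting
df['Salary_Backup'] = df['Salary']

# Replace the column with a 10% raise
df['Salary'] = df['Salary'] * 1.1

print(df['Salary_Backup'].tolist())
print(df['Salary'].round(2).tolist())
```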
Key Points to Remember
Add column with df['NewCol'] = values. Simple and direct.
Remove column with drop('Col', axis=1). Use inplace=True to modify original.
del df['Col'] removes immediately without creating copy.
insert(position, name, values) adds column at specific position.
assign() creates new DataFrame with added columns. Original unchanged.
List length must match number of rows when adding from list.
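As a complement to the `drop()` point above, this sketch uses two standard `drop()` parameters: `columns=`, which removes the need for `axis=1`, and `errors='ignore'`, which skips names that do not exist. The names `without_age` and `unchanged` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Age': [25, 30, 28]
})

# columns= says what to drop without needing axis=1
without_age = df.drop(columns='Age')

# errors='ignore' skips names that are not present instead of raising KeyError
unchanged = df.drop(columns=['Sallary'], errors='ignore')

print(without_age.columns.tolist())
print(unchanged.columns.tolist())
```

Be careful with `errors='ignore'`: it also hides genuine typos, so it fits best when a column may or may not be present by design.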
Common Mistakes
Mistake 1: Wrong list length
df['Age'] = [25, 30] # Error if df has 3 rows!
# Check: len(df) must equal len([25, 30])
Mistake 2: Forgetting axis
df.drop('Age') # KeyError: without axis=1, drop() looks for a row labeled 'Age'
df.drop('Age', axis=1) # Correct
Mistake 3: Not assigning result
df.drop('Age', axis=1) # Doesn't change df!
df = df.drop('Age', axis=1) # Correct
# OR
df.drop('Age', axis=1, inplace=True)
Mistake 4: Using del on filtered DataFrame
subset = df[df['Age'] > 25]
del subset['Name'] # Does not touch df (filtering returns a copy), but the intent is unclear
subset = df[df['Age'] > 25].copy() # Safe: makes the copy explicit
Mistake 5: Column name typo
df.drop('Sallary', axis=1) # KeyError if the column is 'Salary'
print(df.columns.tolist()) # Check names first
What's Next?
You now know how to add and remove columns. Next, you'll learn how to rename columns so their names are clearer.