Adding and Removing Columns

Adding Single Column

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df['Department'] = 'Sales'
print(df)

Output:

    Name  Salary Department
0   John   50000      Sales
1  Sarah   60000      Sales
2   Mike   55000      Sales

Assigning a single value gives every row that same value.

Adding from List

df['Age'] = [25, 30, 28]
print(df)

The list must have exactly one value per row; otherwise pandas raises a ValueError.
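
One way to avoid the length error is to compare the list length with len(df) before assigning. This is a minimal sketch; the short list ages is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

ages = [25, 30]  # hypothetical list: only two values for three rows

# Compare lengths before assigning
if len(ages) == len(df):
    df['Age'] = ages
else:
    print(f"Cannot add 'Age': {len(ages)} values for {len(df)} rows")

# Assigning anyway raises ValueError and leaves df unchanged
try:
    df['Age'] = ages
except ValueError as exc:
    print("ValueError:", exc)
```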

Adding from Calculation

df['Bonus'] = df['Salary'] * 0.1
print(df)

This creates a new column from an existing one.

Adding from Multiple Columns

df['Total'] = df['Salary'] + df['Bonus']
print(df)

Adding with Conditions

import numpy as np

df['Level'] = np.where(df['Salary'] > 55000, 'Senior', 'Junior')
print(df)
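
np.where covers two outcomes. For three or more, one option is np.select, which takes a list of conditions and a matching list of labels. The salary thresholds below are assumptions chosen only for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

# Conditions are checked in order; the first one that matches wins
conditions = [
    df['Salary'] >= 60000,   # hypothetical threshold
    df['Salary'] >= 55000,   # hypothetical threshold
]
labels = ['Senior', 'Mid']

# Rows that match no condition get the default label
df['Level'] = np.select(conditions, labels, default='Junior')
print(df)
```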

Adding with apply()

def calculate_tax(salary):
    return salary * 0.2

df['Tax'] = df['Salary'].apply(calculate_tax)
print(df)

Using lambda:

df['Tax'] = df['Salary'].apply(lambda x: x * 0.2)
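
For simple arithmetic like this, apply() is not required. A plain column operation gives the same values and is usually faster on large data. A small comparison, assuming the same Salary column:

```python
import pandas as pd

df = pd.DataFrame({'Salary': [50000, 60000, 55000]})

tax_apply = df['Salary'].apply(lambda x: x * 0.2)  # calls the function once per value
tax_vector = df['Salary'] * 0.2                    # one operation on the whole column

# Both approaches produce the same Series
print(tax_apply.equals(tax_vector))
```

apply() is still useful when the logic cannot be written as column arithmetic, for example a function with several if/else branches.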

Adding Multiple Columns

# The right-hand side needs one column for each name on the left
df[['Bonus', 'Tax']] = pd.DataFrame({'Bonus': df['Salary'] * 0.1, 'Tax': df['Salary'] * 0.2})
print(df)

Or separately:

df['Bonus'] = df['Salary'] * 0.1
df['Tax'] = df['Salary'] * 0.2

Insert at Specific Position

df.insert(1, 'ID', [101, 102, 103])
print(df)

This inserts the ID column at position 1, right after the first column. insert() modifies df in place.
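
insert() raises a ValueError if the column name already exists, so running the same cell twice fails. A minimal sketch of that behavior and a simple guard; the Dept column is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df.insert(1, 'ID', [101, 102, 103])  # modifies df in place, returns None

try:
    df.insert(0, 'ID', [1, 2, 3])    # 'ID' already exists
except ValueError as exc:
    print("ValueError:", exc)

# Guard: only insert when the name is not taken yet
if 'Dept' not in df.columns:
    # A position equal to the number of columns appends at the end
    df.insert(len(df.columns), 'Dept', 'Sales')

print(df.columns.tolist())
```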

Removing Single Column

df_new = df.drop('Age', axis=1)
print(df_new)

drop() returns a new DataFrame; the original df is unchanged.

Remove the column from df itself:

df.drop('Age', axis=1, inplace=True)

Removing Multiple Columns

df_new = df.drop(['Age', 'Bonus'], axis=1)
print(df_new)
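
drop() also accepts a columns= keyword, which does the same job as axis=1 and can be easier to read. With errors='ignore', names that do not exist are skipped instead of raising a KeyError. A short sketch, assuming a df with an Age column and no Bonus column:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Age': [25, 30, 28]
})

# columns= is equivalent to passing axis=1
df_new = df.drop(columns='Age')
print(df_new.columns.tolist())

# 'Bonus' does not exist here; errors='ignore' skips it quietly
df_safe = df.drop(columns=['Age', 'Bonus'], errors='ignore')
print(df_safe.columns.tolist())
```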

Delete with del

del df['Tax']
print(df)

del modifies the DataFrame in place and returns nothing.

Pop Column

pop() removes a column and returns it as a Series.

bonus_column = df.pop('Bonus')
print("Bonus column:", bonus_column)
print("DataFrame now:", df)

The column is no longer in df.
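
Because pop() hands the column back, it can be combined with insert() to move a column. A small sketch, assuming a df that already has a Bonus column:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000.0, 6000.0, 5500.0]
})

bonus = df.pop('Bonus')        # df no longer has 'Bonus'
df.insert(0, 'Bonus', bonus)   # put it back as the first column
print(df.columns.tolist())
```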

Practice Example

Scenario: build an employee database with calculated columns.

import pandas as pd
import numpy as np

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Base_Salary': [50000, 65000, 55000, 70000, 60000],
    'Years': [3, 7, 4, 9, 5]
})

print("Initial data:")
print(employees)
print()

print("1. Add Department:")
employees['Department'] = ['Sales', 'IT', 'Sales', 'HR', 'IT']
print(employees)
print()

print("2. Add Employee ID at start:")
employees.insert(0, 'ID', range(101, 106))
print(employees)
print()

print("3. Calculate bonus (10% of base):")
employees['Bonus'] = employees['Base_Salary'] * 0.1
print(employees)
print()

print("4. Calculate tax (20% of base):")
employees['Tax'] = employees['Base_Salary'] * 0.2
print(employees)
print()

print("5. Add experience level:")
employees['Level'] = np.where(
    employees['Years'] >= 7, 'Senior',
    np.where(employees['Years'] >= 4, 'Mid', 'Junior')
)
print(employees)
print()

print("6. Calculate total compensation:")
employees['Total_Comp'] = employees['Base_Salary'] + employees['Bonus']
print(employees)
print()

print("7. Add performance multiplier:")
def get_multiplier(row):
    if row['Level'] == 'Senior':
        return 1.5
    elif row['Level'] == 'Mid':
        return 1.2
    else:
        return 1.0

employees['Multiplier'] = employees.apply(get_multiplier, axis=1)
print(employees)
print()

print("8. Remove Tax column:")
employees = employees.drop('Tax', axis=1)
print(employees)
print()

print("Final summary:")
print("Columns:", employees.columns.tolist())
print("Shape:", employees.shape)
print("Total compensation:", employees['Total_Comp'].sum())

Adding Empty Column

df['Notes'] = None
print(df)

Or with NaN:

df['Comments'] = np.nan
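
The two placeholders are stored differently, which matters if you plan to fill the column later. A quick check, using a small made-up df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike']})

df['Notes'] = None        # typically stored with object dtype
df['Comments'] = np.nan   # stored as float64

print(df.dtypes)
print(df.isna().sum())    # both columns count as missing in every row
```

None is a reasonable placeholder for text you will add later, while np.nan suits a column that will hold numbers.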

Adding from Series

new_data = pd.Series([100, 200, 300])
df['Values'] = new_data
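
A Series is matched to the DataFrame by index label, not by position. If the labels differ, unmatched rows get NaN. The sketch below uses a Series named shifted, invented for illustration, whose index starts at 1.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})  # default index: 0, 1, 2

shifted = pd.Series([100, 200, 300], index=[1, 2, 3])

df['Values'] = shifted   # matched by label: row 0 gets NaN, label 3 is dropped
print(df)

# Convert to an array first to assign by position instead
df['Values_by_position'] = shifted.to_numpy()
print(df)
```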

Conditional Column Addition

if 'Bonus' not in df.columns:
    df['Bonus'] = 0

Adding with assign()

assign() returns a copy of the DataFrame with the new columns added.

df_new = df.assign(
    Bonus=df['Salary'] * 0.1,
    Tax=df['Salary'] * 0.2
)
print(df_new)

The original df is unchanged.

Chain Multiple Operations

df_result = (df
    .assign(Bonus=df['Salary'] * 0.1)
    .assign(Tax=df['Salary'] * 0.2)
    .assign(Net=lambda x: x['Salary'] - x['Tax'])
)
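
The same result can be written as a single assign() call. Columns are created in the order given, so a later lambda can use a column created earlier in the same call. A minimal sketch, assuming a reasonably recent pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df_result = df.assign(
    Bonus=lambda x: x['Salary'] * 0.1,
    Tax=lambda x: x['Salary'] * 0.2,
    Net=lambda x: x['Salary'] - x['Tax'],  # uses the Tax column created just above
)
print(df_result)
print(df.columns.tolist())  # the original still has only Name and Salary
```

Using lambdas everywhere means each step works on the DataFrame at that point in the chain rather than on the original df.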

Removing Columns by Pattern

cols_to_drop = [col for col in df.columns if 'temp' in col]
df = df.drop(cols_to_drop, axis=1)
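
Note that 'temp' in col matches the text anywhere in the name. The sketch below shows an equivalent mask-based version and a stricter prefix-only version. The column names temp_score and calc_temp are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'temp_score': [1, 2],
    'Salary': [50000, 60000],
    'calc_temp': [3, 4]
})

# Boolean mask over the column names: True where the name contains 'temp'
is_temp = df.columns.str.contains('temp')
df_clean = df.loc[:, ~is_temp]
print(df_clean.columns.tolist())

# Stricter: only drop names that start with 'temp_'
df_prefix = df.drop(columns=[c for c in df.columns if c.startswith('temp_')])
print(df_prefix.columns.tolist())
```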

Keep Only Specific Columns

df = df[['Name', 'Salary', 'Age']]

All other columns are dropped.

Reorder Columns

df = df[['ID', 'Name', 'Age', 'Salary']]
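
Listing every column by hand gets tedious on wide tables. One option is to name only the columns you want in front and build the rest from df.columns. A small sketch, assuming ID should come first:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'Salary': [50000, 60000, 55000],
    'ID': [101, 102, 103]
})

first = ['ID']                                            # columns to move to the front
rest = [col for col in df.columns if col not in first]    # everything else, original order
df = df[first + rest]
print(df.columns.tolist())
```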

Add Prefix to Columns

df = df.add_prefix('emp_')
print(df.columns.tolist())

Output (for a DataFrame with columns Name, Salary, Age): ['emp_Name', 'emp_Salary', 'emp_Age']

Add Suffix to Columns

df = df.add_suffix('_2024')
print(df.columns.tolist())

Copy Column

df['Salary_Backup'] = df['Salary']
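
The backup holds its own values, so later changes to Salary do not reach it. A quick check, with a made-up new salary for the first row:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df['Salary_Backup'] = df['Salary']
df.loc[0, 'Salary'] = 52000   # hypothetical change to the original column only

print(df[['Salary', 'Salary_Backup']])
```

Writing df['Salary'].copy() on the right-hand side gives the same result and makes the intent explicit.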

Replace Column

df['Salary'] = df['Salary'] * 1.1

Assigning to an existing name overwrites that column.

Key Points to Remember

Add a column with df['NewCol'] = values. This is the simplest and most direct way.

Remove a column with drop('Col', axis=1). It returns a new DataFrame; assign the result back or use inplace=True to change the original.

del df['Col'] removes the column in place, without creating a copy.

insert(position, name, values) adds a column at a specific position and modifies the DataFrame in place.

assign() creates a new DataFrame with the added columns. The original is unchanged.

When adding from a list, the list length must match the number of rows.

Common Mistakes

Mistake 1: Wrong list length

df['Age'] = [25, 30]  # ValueError if df has 3 rows!
# Check first: len(df) must equal the length of the list

Mistake 2: Forgetting axis

df.drop('Age')  # KeyError: without axis=1, drop() looks for a row labeled 'Age'
df.drop('Age', axis=1)  # Correct

Mistake 3: Not assigning result

df.drop('Age', axis=1)  # Doesn't change df!
df = df.drop('Age', axis=1)  # Correct
# OR
df.drop('Age', axis=1, inplace=True)

Mistake 4: Using del on filtered DataFrame

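subset = df[df['Age'] > 25]
del subset['Name']  # df keeps 'Name': filtering already returned new data

# Without .copy(), later assignments to subset can trigger SettingWithCopyWarning
subset = df[df['Age'] > 25].copy()  # Explicit, independent copy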
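
A small check of what happens to the original, assuming a df with Name and Age columns: deleting a column from the copied subset leaves df intact.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

subset = df[df['Age'] > 25].copy()   # explicit, independent copy
del subset['Name']                   # only the subset loses the column

print(subset.columns.tolist())
print(df.columns.tolist())
```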

Mistake 5: Column name typo

df.drop('Sallary', axis=1)  # KeyError if the column is actually 'Salary'
print(df.columns.tolist())  # Check names first

What's Next?

You now know how to add and remove columns. Next, you'll learn how to rename columns to give them clearer names.