5 min read
Data Type Conversions
Learn to convert between different data types in pandas
Why Convert Types?
Sometimes data arrives in the wrong format:
- Numbers stored as text: "100" instead of 100
- Dates stored as text: "2024-01-15" instead of a date
- Categories stored as text: repeated labels take more memory than they need to
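A quick way to see why the type matters is to sort the same values as text and as numbers. The sketch below uses made-up values (not from the lesson's DataFrame) chosen so the two orderings disagree.

```python
import pandas as pd

# Hypothetical values chosen so text order and numeric order differ
as_text = pd.Series(['100', '20', '3'])
as_numbers = pd.to_numeric(as_text)

# Text is compared character by character, so '100' sorts before '20'
print(as_text.sort_values().tolist())     # ['100', '20', '3']

# Numbers are compared by value
print(as_numbers.sort_values().tolist())  # [3, 20, 100]
```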
Check Current Data Types
code.py
import pandas as pd
df = pd.DataFrame({
    'Price': ['100', '200', '150'],
    'Quantity': [5, 10, 8],
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03']
})
print(df.dtypes)
Output:
Price object <- text (should be number!)
Quantity int64 <- number (good)
Date object <- text (should be date!)
Convert Text to Number
code.py
# Convert Price from text to number
df['Price'] = pd.to_numeric(df['Price'])
print(df.dtypes)
Now Price is int64 (number).
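The resulting type depends on the values: pd.to_numeric() picks an integer type when every value is a whole number and a float type otherwise. A small sketch, using an extra made-up series with one decimal value for comparison:

```python
import pandas as pd

whole = pd.to_numeric(pd.Series(['100', '200', '150']))
# Hypothetical variant: one value has a decimal part
decimal = pd.to_numeric(pd.Series(['100', '200.5', '150']))

print(whole.dtype)    # int64
print(decimal.dtype)  # float64
```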
Handle Bad Data in Conversion
code.py
df = pd.DataFrame({
    'Price': ['100', '200', 'unknown', '150']
})
# This would raise a ValueError because 'unknown' can't be parsed as a number
# df['Price'] = pd.to_numeric(df['Price'])
# Use errors='coerce' to turn unparseable values into NaN
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
Output:
Price
0 100.0
1 200.0
2 NaN <- 'unknown' became NaN
3 150.0
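After coercing, it often helps to see which original values failed before deciding what to do with them. The sketch below is one possible approach: it compares the raw and converted columns, then drops the bad rows. Dropping is only an assumption here, and filling with a default value may suit other data better.

```python
import pandas as pd

raw = pd.Series(['100', '200', 'unknown', '150'])
converted = pd.to_numeric(raw, errors='coerce')

# Rows that had a value before conversion but are NaN afterwards
failed = raw[converted.isna() & raw.notna()]
print(failed.tolist())  # ['unknown']

# One option: drop the bad rows and store the rest as integers
clean = converted.dropna().astype(int)
print(clean.tolist())   # [100, 200, 150]
```

The column became float after coercion because NaN is a float value; once the NaN rows are gone, it can be converted back to integers.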
Convert to Text (String)
code.py
df = pd.DataFrame({
    'ID': [1, 2, 3]
})
df['ID'] = df['ID'].astype(str)
print(df.dtypes)
ID is now object (text).
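One common reason to store an ID as text is to apply string formatting to it. As an illustration (the five-character width is an arbitrary choice, not something from the lesson), the IDs can be zero-padded once they are strings:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3]})

# Convert to text, then left-pad with zeros to a fixed width of 5
df['ID'] = df['ID'].astype(str).str.zfill(5)
print(df['ID'].tolist())  # ['00001', '00002', '00003']
```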
Convert Text to Date
code.py
df = pd.DataFrame({
    'Date': ['2024-01-15', '2024-02-20', '2024-03-10']
})
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)
print(df)
Output:
Date datetime64[ns]
Date
0 2024-01-15
1 2024-02-20
2 2024-03-10
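Dates do not always arrive in year-month-day order. The sketch below assumes a hypothetical day/month/year input with one bad entry, and uses the format and errors parameters of pd.to_datetime() to parse it.

```python
import pandas as pd

# Hypothetical input in day/month/year order, with one unparseable value
raw = pd.Series(['15/01/2024', '20/02/2024', 'not a date'])

# format describes the input layout; errors='coerce' turns bad values into NaT
dates = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(dates.isna().tolist())               # [False, False, True]

# Once the column holds real dates, the .dt accessor exposes their parts
print(dates.dropna().dt.month.tolist())    # [1, 2]
```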
Convert to Category
The category type uses less memory when the same text values repeat many times.
code.py
df = pd.DataFrame({
    'Status': ['Active', 'Inactive', 'Active', 'Active', 'Inactive']
})
# deep=True also counts the bytes used by the strings themselves
print("Before:", df['Status'].memory_usage(deep=True))
df['Status'] = df['Status'].astype('category')
print("After:", df['Status'].memory_usage(deep=True))
Uses less memory after conversion!
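The saving comes from how a category column is stored: each distinct label is kept once, and every row holds a small integer code that points to it. The sketch below uses a larger made-up column (two labels repeated 5,000 times) so the difference is easier to see.

```python
import pandas as pd

# Hypothetical column: two labels repeated many times
status = pd.Series(['Active', 'Inactive'] * 5000)
as_category = status.astype('category')

# Each distinct label is stored once...
print(as_category.cat.categories.tolist())     # ['Active', 'Inactive']
# ...and every row holds a small integer code pointing at it
print(as_category.cat.codes.head(4).tolist())  # [0, 1, 0, 1]

# deep=True counts the bytes of the strings themselves
before = status.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)
print(after < before)                          # True
```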
Common Conversions
| From | To | Method |
|---|---|---|
| Text | Number | pd.to_numeric() |
| Text | Date | pd.to_datetime() |
| Any | Text | .astype(str) |
| Any | Integer | .astype(int) |
| Any | Float | .astype(float) |
| Text | Category | .astype('category') |
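One detail of the integer conversion is easy to miss: .astype(int) on float values drops the fractional part rather than rounding. A short sketch with made-up prices:

```python
import pandas as pd

# Hypothetical float values with fractional parts
prices = pd.Series([1.9, 2.1, -1.9])

# .astype(int) drops the fractional part instead of rounding
truncated = prices.astype(int)
print(truncated.tolist())  # [1, 2, -1]

# Round first if the nearest whole number is wanted
rounded = prices.round().astype(int)
print(rounded.tolist())    # [2, 2, -2]
```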
Convert Multiple Columns
code.py
# Assumes df has Price, Quantity, and Status columns
df = df.astype({
    'Price': float,
    'Quantity': int,
    'Status': 'category'
})
Key Points
- df.dtypes shows all column types
- pd.to_numeric() converts to number
- pd.to_datetime() converts to date
- .astype() converts to any type
- errors='coerce' turns bad values to NaN
- Categories save memory for repeated text
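The points above can be combined into one short cleanup pass. The sketch below assumes a hypothetical DataFrame in which every column arrived as text. The column names follow the earlier examples, but the data is made up.

```python
import pandas as pd

# Hypothetical raw data where every column arrived as text
df = pd.DataFrame({
    'Price': ['100', '200', 'unknown'],
    'Quantity': ['5', '10', '8'],
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'Status': ['Active', 'Inactive', 'Active'],
})

print(df.dtypes)  # inspect the types before converting

# Columns that may contain bad values go through the dedicated functions
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')  # 'unknown' becomes NaN
df['Date'] = pd.to_datetime(df['Date'])

# Columns known to be clean can be converted together with .astype()
df = df.astype({'Quantity': int, 'Status': 'category'})

print(df.dtypes)  # confirm the result
```

Checking df.dtypes before and after is a cheap way to confirm each conversion did what was intended.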
What's Next?
Learn to clean and work with text data using string methods.