Creating DataFrames
Learn different ways to create Pandas DataFrames
From Dictionary
The most common way to create DataFrames.
import pandas as pd
data = {
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age     City
0   John   25      NYC
1  Sarah   30       LA
2   Mike   28  Chicago
How it works: Each key becomes a column name. Values become the data.
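As a quick check of that mapping, the sketch below builds a similar dictionary and inspects the result. The `Country` entry is a hypothetical addition, not part of the original example; it is there to show that a single scalar value is repeated for every row.

```python
import pandas as pd

data = {
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'Country': 'USA',  # hypothetical column: a scalar is repeated for each row
}
df = pd.DataFrame(data)

print(list(df.columns))  # column labels come from the dictionary keys
print(df.shape)          # (number of rows, number of columns)
```

The row count comes from the length of the lists, so every list in the dictionary needs the same length.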
From List of Lists
import pandas as pd
data = [
    ['John', 25, 'NYC'],
    ['Sarah', 30, 'LA'],
    ['Mike', 28, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
Important: Specify column names or they'll be numbered (0, 1, 2).
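If the names are not known when the DataFrame is built, they can also be assigned afterwards. The following is a minimal sketch with a hypothetical `rows` list; it shows the default integer labels first, then replaces them.

```python
import pandas as pd

rows = [['John', 25, 'NYC'], ['Sarah', 30, 'LA']]  # hypothetical sample rows

df = pd.DataFrame(rows)
print(list(df.columns))  # default labels are integers starting at 0

# Assign readable names after the fact; the list length must match the column count
df.columns = ['Name', 'Age', 'City']
print(list(df.columns))
```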
From List of Dictionaries
import pandas as pd
data = [
    {'Name': 'John', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Sarah', 'Age': 30, 'City': 'LA'},
    {'Name': 'Mike', 'Age': 28, 'City': 'Chicago'}
]
df = pd.DataFrame(data)
print(df)
What this creates: Each dictionary becomes a row.
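One consequence worth seeing: the dictionaries do not need identical keys. In the sketch below (the `records` list is a hypothetical example), the second record has no `City` key, and pandas fills the gap with a missing value.

```python
import pandas as pd

records = [
    {'Name': 'John', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Sarah', 'Age': 30},  # no 'City' key in this record
]
df = pd.DataFrame(records)

print(df)
print(df['City'].isna().tolist())  # True marks the row where 'City' was missing
```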
From NumPy Array
import pandas as pd
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(arr, columns=['A', 'B', 'C'])
print(df)
Output:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
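The array's shape decides the layout: a 3x3 array gives three rows and three columns, so the `columns` list needs three names. A small sketch, assuming the same values built with `np.arange`, that also converts back to confirm nothing changed:

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 10).reshape(3, 3)  # values 1..9 arranged as 3 rows x 3 columns
df = pd.DataFrame(arr, columns=['A', 'B', 'C'])

print(df.shape)                            # matches arr.shape
print(np.array_equal(df.to_numpy(), arr))  # round trip back to a NumPy array
```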
From CSV File
Most common real-world method.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
What head() does: Shows first 5 rows by default.
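To try this without a file on disk, `pd.read_csv()` also accepts a file-like object. The sketch below uses hypothetical CSV text held in memory (the `id` and `value` columns are made up) and compares the default `head()` with an explicit row count.

```python
import io
import pandas as pd

# Hypothetical CSV content: a header line plus 8 data rows, kept in memory
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(8))
df = pd.read_csv(io.StringIO(csv_text))

print(len(df))          # total number of rows loaded
print(len(df.head()))   # head() returns at most 5 rows by default
print(len(df.head(2)))  # pass a number to change how many rows are shown
```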
With options:
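df = pd.read_csv(
    'data.csv',
    delimiter=',',           # field separator (alias of sep)
    skiprows=1,              # skip the first line of the file
    header=0,                # treat the first remaining line as the header...
    names=['Col1', 'Col2'],  # ...and replace it with these names
)
From Excel File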
import pandas as pd
df = pd.read_excel('data.xlsx')
print(df.head())
Specific sheet:
df = pd.read_excel('data.xlsx', sheet_name='Sales')
Multiple sheets:
dfs = pd.read_excel('data.xlsx', sheet_name=None)
for sheet_name, df in dfs.items():
    print(sheet_name, df.shape)
Empty DataFrame
import pandas as pd
df = pd.DataFrame()
print("Empty:", df.empty)
Add columns later:
df['Name'] = ['John', 'Sarah']
df['Age'] = [25, 30]
With Custom Index
import pandas as pd
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
print(df)
Output:
  Product  Price
A  Laptop    999
B   Phone    599
C  Tablet    399
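A custom index is mainly useful for lookups by label. The following sketch rebuilds the same DataFrame and reads from it with `.loc`, which selects by index label rather than by position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Product': ['Laptop', 'Phone', 'Tablet'], 'Price': [999, 599, 399]},
    index=['A', 'B', 'C'],
)

print(df.loc['B', 'Price'])  # single value: row labelled 'B', column 'Price'
print(df.loc['B'])           # the whole row labelled 'B', returned as a Series
```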
From Series
import pandas as pd
names = pd.Series(['John', 'Sarah', 'Mike'])
ages = pd.Series([25, 30, 28])
df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)
From SQL Database
import pandas as pd
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM users', conn)
print(df.head())
conn.close()
Or with query:
query = "SELECT name, age FROM users WHERE age > 25"
df = pd.read_sql(query, conn)  # conn must still be open: run this before conn.close()
From JSON
import pandas as pd
df = pd.read_json('data.json')
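print(df.head())
From JSON string:
from io import StringIO
json_str = '[{"Name":"John","Age":25},{"Name":"Sarah","Age":30}]'
df = pd.read_json(StringIO(json_str))  # passing a literal JSON string directly is deprecated in recent pandas
Practice Example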
The scenario: Create a product inventory DataFrame from different sources.
import pandas as pd
import numpy as np
print("Method 1: From Dictionary")
inventory = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],
    'Stock': [15, 30, 20, 12],
    'Price': [999, 599, 399, 299]
}
df1 = pd.DataFrame(inventory)
print(df1)
print()
print("Method 2: From List of Lists")
data = [
    ['Mouse', 50, 25],
    ['Keyboard', 40, 75],
    ['Webcam', 25, 89]
]
df2 = pd.DataFrame(data, columns=['Product', 'Stock', 'Price'])
print(df2)
print()
print("Combined Inventory:")
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
print()
print("Add calculated column:")
combined['Total Value'] = combined['Stock'] * combined['Price']
print(combined)
print()
print("Summary:")
print("Total items:", combined['Stock'].sum())
print("Average price:", combined['Price'].mean())
print("Inventory value:", combined['Total Value'].sum())
What this demonstrates:
- Create from dictionary
- Create from list of lists
- Combine DataFrames
- Add calculated column
- Basic analysis
Specifying Data Types
import pandas as pd
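data = {
    'ID': [1, 2, 3],
    'Name': ['A', 'B', 'C'],
    'Price': [10.5, 20.3, 15.7]
}
# The dtype argument of pd.DataFrame accepts only a single type,
# so use astype() to set types per column.
df = pd.DataFrame(data).astype({'ID': int, 'Price': float})
print(df.dtypes)
From Clipboard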
Copy data from Excel, then:
import pandas as pd
df = pd.read_clipboard()
print(df)
Useful for quick testing!
Creating Date Ranges
import pandas as pd
dates = pd.date_range('2024-01-01', periods=7, freq='D')
df = pd.DataFrame({'Date': dates, 'Sales': [100, 150, 120, 180, 160, 140, 200]})
print(df)
Key Points to Remember
A dictionary is the most common way to create a DataFrame. Keys become column names.
pd.read_csv() and pd.read_excel() are used for loading external data files.
Always check data after loading with head(), info(), or describe().
You can combine multiple DataFrames with pd.concat().
Specify column names when creating from lists to avoid numeric column names.
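The three inspection calls mentioned above can be sketched on a small DataFrame. The data here is a hypothetical stand-in for whatever was just loaded.

```python
import pandas as pd

# Hypothetical data standing in for a freshly loaded file
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399],
})

print(df.head())      # first rows, to eyeball the values
df.info()             # prints column names, non-null counts, and dtypes
print(df.describe())  # summary statistics for the numeric columns only
```

Note that `info()` prints its report directly, so it does not need to be wrapped in `print()`.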
Common Mistakes
Mistake 1: Inconsistent lengths
data = {'A': [1, 2], 'B': [1, 2, 3]}
df = pd.DataFrame(data)  # ValueError! The lists have different lengths
Mistake 2: Forgetting column names
df = pd.DataFrame([[1, 2], [3, 4]])  # Columns will be 0, 1
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])  # Better
Mistake 3: Wrong file path
df = pd.read_csv('data.csv')  # FileNotFoundError if wrong path
Mistake 4: Not checking data types
df = pd.read_csv('data.csv')
# Numbers might be read as strings!
print(df.dtypes)  # Always check
What's Next?
You now know how to create DataFrames. Next, you'll learn about inspecting DataFrames - viewing structure, checking data types, and understanding your data.