Creating DataFrames

From Dictionary

This is the most common way to create a DataFrame by hand.

code.py

import pandas as pd

data = {
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

    Name  Age     City
0   John   25      NYC
1  Sarah   30       LA
2   Mike   28  Chicago

How it works: Each key becomes a column name, and each list of values becomes that column's data. All lists must have the same length.

From List of Lists

code.py

import pandas as pd

data = [
    ['John', 25, 'NYC'],
    ['Sarah', 30, 'LA'],
    ['Mike', 28, 'Chicago']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Important: Specify column names or they'll be numbered (0, 1, 2).

From List of Dictionaries

code.py

import pandas as pd

data = [
    {'Name': 'John', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Sarah', 'Age': 30, 'City': 'LA'},
    {'Name': 'Mike', 'Age': 28, 'City': 'Chicago'}
]

df = pd.DataFrame(data)
print(df)

What this creates: Each dictionary becomes a row, and the keys become the column names.
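The dictionaries do not need identical keys. As a minimal sketch (the record without a 'City' key is illustrative data, not part of the example above), pandas fills any missing key with NaN:

```python
import pandas as pd

# The second record deliberately has no 'City' key (illustrative data)
data = [
    {'Name': 'John', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Sarah', 'Age': 30},
]

df = pd.DataFrame(data)
print(df)

# The missing value shows up as NaN in the 'City' column
print(df['City'].isna().tolist())
```

Checking isna() after building from records is a quick way to spot incomplete rows.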

From NumPy Array

code.py

import pandas as pd
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

df = pd.DataFrame(arr, columns=['A', 'B', 'C'])
print(df)

Output:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

From CSV File

In real-world projects, this is the most common way to load data.

code.py

import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

What head() does: It shows the first 5 rows by default.
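To see the default in action without a file on disk, here is a small sketch that uses a ten-row stand-in DataFrame (the column name 'n' is made up for illustration). It also shows that head() accepts a row count and that tail() works the same way from the end:

```python
import pandas as pd

# Ten rows of illustrative data standing in for a loaded file
df = pd.DataFrame({'n': range(10)})

first_five = df.head()     # default: first 5 rows
first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows

print(len(first_five), len(first_three), len(last_two))
```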

With options:

code.py

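df = pd.read_csv('data.csv',
                 sep=',',                 # field separator (delimiter= is an alias)
                 header=0,                # the first line of the file is a header row...
                 names=['Col1', 'Col2'])  # ...and these names replace it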
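The options can be tried without a real file by reading from an in-memory string. This is only a sketch: the semicolon-separated sample text and the column names are assumptions chosen for illustration. It shows header=0 combined with names, which replaces the header row already in the data:

```python
from io import StringIO

import pandas as pd

# Illustrative CSV text with a header line and a ';' separator
csv_text = "name;age\nJohn;25\nSarah;30\n"

df = pd.read_csv(
    StringIO(csv_text),
    sep=';',                # the sample uses semicolons instead of commas
    header=0,               # line 0 is a header row...
    names=['Name', 'Age'],  # ...which these names replace
)
print(df)
```

Be careful when combining skiprows with header=0: the rows are skipped first, and the next remaining line is then consumed as the header, so a data row can be lost silently.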

From Excel File

code.py

import pandas as pd

df = pd.read_excel('data.xlsx')
print(df.head())

Specific sheet:

code.py

df = pd.read_excel('data.xlsx', sheet_name='Sales')

Multiple sheets:

code.py

dfs = pd.read_excel('data.xlsx', sheet_name=None)
for sheet_name, df in dfs.items():
    print(sheet_name, df.shape)

Empty DataFrame

code.py

import pandas as pd

df = pd.DataFrame()
print("Empty:", df.empty)

Add columns later:

code.py

df['Name'] = ['John', 'Sarah']
df['Age'] = [25, 30]

With Custom Index

code.py

import pandas as pd

data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}

df = pd.DataFrame(data, index=['A', 'B', 'C'])
print(df)

Output:

   Product  Price
A   Laptop    999
B    Phone    599
C   Tablet    399

From Series

code.py

import pandas as pd

names = pd.Series(['John', 'Sarah', 'Mike'])
ages = pd.Series([25, 30, 28])

df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)
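When Series are combined this way, pandas matches values by index label, not by position. The sketch below uses made-up index labels ('a', 'b', 'c') to show what happens when one Series lacks a label:

```python
import pandas as pd

# Illustrative Series with explicit, partly mismatched index labels
names = pd.Series(['John', 'Sarah', 'Mike'], index=['a', 'b', 'c'])
ages = pd.Series([25, 28], index=['a', 'c'])  # no entry for 'b'

df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)

# Row 'b' has a name but no age, so Age is NaN there
print(df['Age'].isna().tolist())
```

Because of this alignment, Series with different indexes never raise a length error. They produce NaN where labels do not match.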

From SQL Database

code.py

import pandas as pd
import sqlite3

conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM users', conn)
print(df.head())
conn.close()

Or with a filtered query:

code.py

# Run this while the connection is still open (before conn.close())
query = "SELECT name, age FROM users WHERE age > 25"
df = pd.read_sql(query, conn)
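For experimenting without an existing database file, the following sketch builds a throwaway in-memory SQLite database. The table contents are invented for illustration. It passes the age threshold through params instead of writing it into the query string, and closes the connection only after the query has run:

```python
import sqlite3

import pandas as pd

# In-memory database with illustrative rows
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, age INTEGER)')
conn.executemany(
    'INSERT INTO users VALUES (?, ?)',
    [('John', 25), ('Sarah', 30), ('Mike', 28)],
)
conn.commit()

try:
    # '?' is SQLite's placeholder; the value is supplied via params
    query = 'SELECT name, age FROM users WHERE age > ? ORDER BY age'
    df = pd.read_sql(query, conn, params=(25,))
finally:
    conn.close()  # close only after the query has run

print(df)
```

Placeholders keep values out of the SQL text, which avoids quoting problems when the values come from user input.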

From JSON

code.py

import pandas as pd

df = pd.read_json('data.json')
print(df.head())

From JSON string:

code.py

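from io import StringIO

json_str = '[{"Name":"John","Age":25},{"Name":"Sarah","Age":30}]'
# Wrap the string in StringIO; passing raw JSON text directly is deprecated in pandas 2.1+
df = pd.read_json(StringIO(json_str))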

Practice Example

The scenario: Create a product inventory DataFrame from different sources.

code.py

import pandas as pd

print("Method 1: From Dictionary")
inventory = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],
    'Stock': [15, 30, 20, 12],
    'Price': [999, 599, 399, 299]
}
df1 = pd.DataFrame(inventory)
print(df1)
print()

print("Method 2: From List of Lists")
data = [
    ['Mouse', 50, 25],
    ['Keyboard', 40, 75],
    ['Webcam', 25, 89]
]
df2 = pd.DataFrame(data, columns=['Product', 'Stock', 'Price'])
print(df2)
print()

print("Combined Inventory:")
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
print()

print("Add calculated column:")
combined['Total Value'] = combined['Stock'] * combined['Price']
print(combined)
print()

print("Summary:")
print("Total items:", combined['Stock'].sum())
print("Average price:", combined['Price'].mean())
print("Inventory value:", combined['Total Value'].sum())

What this demonstrates:

Create from dictionary
Create from list of lists
Combine DataFrames
Add calculated column
Basic analysis

Specifying Data Types

code.py

import pandas as pd

data = {
    'ID': [1, 2, 3],
    'Name': ['A', 'B', 'C'],
    'Price': [10.5, 20.3, 15.7]
}

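# The DataFrame constructor accepts only one dtype for the whole frame,
# so set per-column types with astype()
df = pd.DataFrame(data).astype({'ID': 'int64', 'Price': 'float64'})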
print(df.dtypes)

From Clipboard

Copy data from Excel, then:

code.py

import pandas as pd

df = pd.read_clipboard()
print(df)

Useful for quick testing!

Creating Date Ranges

code.py

import pandas as pd

dates = pd.date_range('2024-01-01', periods=7, freq='D')
df = pd.DataFrame({'Date': dates, 'Sales': [100, 150, 120, 180, 160, 140, 200]})
print(df)
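Once a column holds real dates, date-aware operations become available. As a small sketch building on the same data (the 'Weekday' column and the by_date variable are names added here for illustration), you can derive the weekday and look up a row by its date:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=7, freq='D')
df = pd.DataFrame({'Date': dates, 'Sales': [100, 150, 120, 180, 160, 140, 200]})

# Derive the weekday name from each date
df['Weekday'] = df['Date'].dt.day_name()
print(df)

# Use the dates as the index so a row can be selected by its date
by_date = df.set_index('Date')
print(by_date.loc[pd.Timestamp('2024-01-03'), 'Sales'])
```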

Key Points to Remember

A dictionary is the most common way to create a DataFrame by hand. Keys become column names.

pd.read_csv() and pd.read_excel() are used for loading external data files.

Always check data after loading with head(), info(), or describe().

You can combine multiple DataFrames with pd.concat().

Specify column names when creating from lists to avoid numeric column names.
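The advice above about checking data after loading can be sketched as follows. A small in-memory DataFrame stands in for a loaded file:

```python
import pandas as pd

# Illustrative data standing in for a freshly loaded file
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
})

print(df.head())      # first rows
df.info()             # column names, non-null counts, dtypes (prints directly)
print(df.describe())  # summary statistics for numeric columns
print(df.shape)       # (rows, columns)
```

Note that info() prints its report itself and returns None, so it does not need to be wrapped in print().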

Common Mistakes

Mistake 1: Inconsistent lengths

code.py

data = {'A': [1, 2], 'B': [1, 2, 3]}
df = pd.DataFrame(data)  # ValueError: the lists have different lengths
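The sketch below shows the error being raised when the DataFrame is built, followed by one possible repair. Padding the short list with None is an assumption about what the data should look like; in practice the right fix depends on why the lengths differ.

```python
import pandas as pd

data = {'A': [1, 2], 'B': [1, 2, 3]}

try:
    df = pd.DataFrame(data)
except ValueError as exc:
    print("Could not build DataFrame:", exc)
    # One possible fix (assumed intent): pad the short list with a missing value
    data['A'].append(None)
    df = pd.DataFrame(data)

print(df)
```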

Mistake 2: Forgetting column names

code.py

df = pd.DataFrame([[1, 2], [3, 4]])  # Columns will be 0, 1
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])  # Better

Mistake 3: Wrong file path

code.py

df = pd.read_csv('data.csv')  # raises FileNotFoundError if the path is wrong
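One way to guard against this is to check the path before reading. In this sketch, load_csv_if_present is a hypothetical helper written for illustration, not a pandas function:

```python
from pathlib import Path

import pandas as pd


def load_csv_if_present(path):
    """Return a DataFrame, or None if the file does not exist (hypothetical helper)."""
    csv_path = Path(path)
    if not csv_path.exists():
        # Print the absolute path to make a wrong working directory easy to spot
        print(f"File not found: {csv_path.resolve()}")
        return None
    return pd.read_csv(csv_path)


df = load_csv_if_present('data.csv')
```

Printing the resolved path helps because a relative path like 'data.csv' is looked up in the current working directory, which is often not the folder the script lives in.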

Mistake 4: Not checking data types

code.py

df = pd.read_csv('data.csv')
# Numbers might be read as strings!
print(df.dtypes)  # Always check
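When a numeric column has been read as text, pd.to_numeric can convert it. The sketch below uses invented sample text in which one price is the word "unknown". With errors='coerce', values that cannot be parsed become NaN instead of raising an error:

```python
from io import StringIO

import pandas as pd

# Illustrative CSV text; the 'unknown' entry keeps 'price' from being numeric
csv_text = "item,price\nMouse,25\nKeyboard,unknown\nWebcam,89\n"
df = pd.read_csv(StringIO(csv_text))
print(df.dtypes)  # 'price' is not a numeric dtype here

# Convert to numbers; unparseable entries become NaN
df['price'] = pd.to_numeric(df['price'], errors='coerce')
print(df.dtypes)
print(df['price'].sum())  # NaN values are skipped by sum()
```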

What's Next?

You now know how to create DataFrames. Next, you'll learn about inspecting DataFrames: viewing structure, checking data types, and understanding your data.