Inspecting DataFrames
Learn to view and understand DataFrame structure and content
Viewing Data
head() - First Rows
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Age': [25, 30, 28, 32, 27],
    'City': ['NYC', 'LA', 'Chicago', 'Miami', 'Boston']
})
print(df.head())
Shows the first 5 rows by default.
Custom number:
print(df.head(3))
tail() - Last Rows
print(df.tail())
print(df.tail(2))
Shows the last 5 rows by default.
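Both methods also accept a negative count: head(-n) returns everything except the last n rows, and tail(-n) everything except the first n. A quick sketch with the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Age': [25, 30, 28, 32, 27],
    'City': ['NYC', 'LA', 'Chicago', 'Miami', 'Boston']
})

# head(-n): all rows except the last n
print(df.head(-2))   # John, Sarah, Mike

# tail(-n): all rows except the first n
print(df.tail(-2))   # Mike, Emma, David
```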
DataFrame Shape
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print("Shape:", df.shape)
print("Rows:", df.shape[0])
print("Columns:", df.shape[1])
Output:
Shape: (3, 2)
Rows: 3
Columns: 2
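A few related attributes complement shape; this sketch shows len(), size, and ndim on the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

print("Rows via len():", len(df))   # 3
print("Total cells:", df.size)      # 6 (rows * columns)
print("Dimensions:", df.ndim)       # 2
```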
Column Information
columns
print("Column names:", df.columns.tolist())
dtypes - Data Types
print(df.dtypes)
Common types:
- int64: Integers
- float64: Decimals
- object: Strings
- bool: True/False
- datetime64: Dates
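To see these types in practice, here is a small frame that produces one column of each common dtype:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah'],                               # object
    'Age': [25, 30],                                         # int64
    'Height': [1.75, 1.62],                                  # float64
    'Active': [True, False],                                 # bool
    'Joined': pd.to_datetime(['2024-01-01', '2024-02-15'])   # datetime64[ns]
})

print(df.dtypes)

# Check a single column's type
print(df['Age'].dtype)   # int64
```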
info() - Overview
df.info()
Shows:
- Number of rows
- Column names
- Data types
- Non-null counts
- Memory usage
Example output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
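The + after the byte count means the true size of object (string) columns is not fully measured by default. Passing memory_usage='deep' makes info() inspect the string contents as well:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Default: shallow estimate for object columns (hence the "+")
df.info()

# 'deep' measures the actual strings too, so the number is larger
df.info(memory_usage='deep')
```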
describe() - Statistics
print(df.describe())
For numeric columns:
- count: Number of values
- mean: Average
- std: Standard deviation
- min: Minimum
- 25%: First quartile
- 50%: Median
- 75%: Third quartile
- max: Maximum
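Each row of the describe() table can also be computed individually, which is handy when you only need one statistic:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 28, 32, 27]})

stats = df['Age'].describe()
print(stats)

# The rows of describe() match the individual methods:
print(df['Age'].mean())            # same as stats['mean']
print(df['Age'].quantile(0.25))    # same as stats['25%']
print(df['Age'].median())          # same as stats['50%']
```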
Include all columns:
print(df.describe(include='all'))
Index Information
print("Index:", df.index)
print("Index start:", df.index[0])
print("Index end:", df.index[-1])
Unique Values
print("Unique cities:", df['City'].unique())
print("Count unique:", df['City'].nunique())
Value Counts
print(df['City'].value_counts())
Shows how many times each value appears.
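value_counts() also takes normalize=True to report proportions instead of raw counts:

```python
import pandas as pd

cities = pd.Series(['NYC', 'LA', 'NYC', 'Chicago', 'NYC'])

print(cities.value_counts())
# NYC appears 3 times, so it comes first

# normalize=True: each count divided by the total
print(cities.value_counts(normalize=True))
# NYC -> 0.6, LA -> 0.2, Chicago -> 0.2
```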
Null Values
isnull()
print(df.isnull())
Returns True/False for each cell.
Count nulls
print("Null per column:")
print(df.isnull().sum())
print("Total nulls:", df.isnull().sum().sum())
Any nulls?
print("Has nulls:", df.isnull().values.any())
Memory Usage
print("Memory:", df.memory_usage())
print("Total:", df.memory_usage(deep=True).sum(), "bytes")
Sample Rows
Get random rows.
print(df.sample(3))
Useful for: Quick preview of large datasets.
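Two useful sample() parameters: random_state makes the draw reproducible, and frac samples a fraction of the rows instead of a fixed count:

```python
import pandas as pd

df = pd.DataFrame({'A': range(100)})

# random_state: same rows every run
print(df.sample(5, random_state=42))

# frac: sample a proportion of rows (here 10% of 100 -> 10 rows)
print(df.sample(frac=0.1, random_state=42))
```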
Practice Example
The scenario: inspect a sales dataset.
import pandas as pd
import numpy as np
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=100),
    'Product': np.random.choice(['Laptop', 'Phone', 'Tablet'], 100),
    'Quantity': np.random.randint(1, 10, 100),
    'Price': np.random.choice([999, 599, 399], 100)
})
print("=== BASIC INFO ===")
print("Shape:", sales.shape)
print("Columns:", sales.columns.tolist())
print()
print("=== FIRST ROWS ===")
print(sales.head(3))
print()
print("=== DATA TYPES ===")
print(sales.dtypes)
print()
print("=== DETAILED INFO ===")
sales.info()
print()
print("=== STATISTICS ===")
print(sales.describe())
print()
print("=== UNIQUE VALUES ===")
print("Products:", sales['Product'].unique())
print("Product counts:")
print(sales['Product'].value_counts())
print()
print("=== NULL CHECK ===")
print("Any nulls:", sales.isnull().values.any())
print("Nulls per column:")
print(sales.isnull().sum())
print()
print("=== RANDOM SAMPLE ===")
print(sales.sample(5))
Getting Specific Values
At position
value = df.iloc[0, 0]
print("First cell:", value)
By label
value = df.at[0, 'Name']
print("Value:", value)
Column Statistics
print("Max age:", df['Age'].max())
print("Min age:", df['Age'].min())
print("Mean age:", df['Age'].mean())
print("Sum ages:", df['Age'].sum())
Checking Duplicates
print("Duplicates:", df.duplicated().sum())
print("Duplicate rows:")
print(df[df.duplicated()])
Correlation
For numeric columns only. On pandas 2.0+, corr() raises an error if the DataFrame also contains non-numeric columns, so pass numeric_only=True.
print(df.corr(numeric_only=True))
Shows how columns relate to each other (from -1 to 1).
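On a DataFrame that mixes text and numbers, you can also restrict the correlation to numeric columns with select_dtypes; a quick sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'Salary': [50000, 62000, 55000]
})

# Keep only numeric columns, then correlate
print(df.select_dtypes('number').corr())

# Equivalent on pandas 1.5+: df.corr(numeric_only=True)
print(df.corr(numeric_only=True))
```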
Quick Functions
print("Min:", df.min())
print("Max:", df.max())
print("Sum:", df.sum())
print("Mean:", df.mean(numeric_only=True))
print("Median:", df.median(numeric_only=True))
print("Mode:", df.mode())
On mixed DataFrames, mean() and median() raise an error on pandas 2.0+ unless you pass numeric_only=True.
Key Points to Remember
head() shows first rows, tail() shows last rows. Use these to preview data.
info() gives complete overview: types, nulls, memory. Always run this first.
describe() shows statistics for numeric columns. Great for understanding data range.
shape gives (rows, columns). dtypes shows data type of each column.
isnull().sum() counts missing values per column. Critical for data quality check.
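The checks above can be bundled into one helper; a convenience sketch (first_look is a hypothetical name, not a pandas built-in):

```python
import pandas as pd

def first_look(df: pd.DataFrame) -> None:
    """Quick first-pass inspection: shape, types, preview, nulls, duplicates."""
    print("Shape:", df.shape)
    print("\nDtypes:\n", df.dtypes)
    print("\nFirst rows:\n", df.head())
    print("\nNulls per column:\n", df.isnull().sum())
    print("\nDuplicated rows:", df.duplicated().sum())

df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y', 'y']})
first_look(df)
```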
Common Mistakes
Mistake 1: Not checking data after loading
df = pd.read_csv('data.csv')
# Start analyzing without looking!
Always do:
print(df.head())
df.info()
Mistake 2: Assuming no nulls
df['Age'].mean() # Silently skips NaN, so the average covers fewer rows than you expect
Check first:
print(df.isnull().sum())
Mistake 3: Wrong shape access
rows = df.shape # Wrong: this is the whole tuple, e.g. (3, 2)
rows = df.shape[0] # Correct way to get the row count
Mistake 4: Ignoring data types
df['Price'].mean() # Error if Price is string!
print(df.dtypes) # Check first
What's Next?
You now know how to inspect DataFrames. Next, you'll learn about selecting columns - how to choose and work with specific columns from your DataFrame.