
Inspecting DataFrames

Learn to view and understand DataFrame structure and content


Viewing Data

head() - First Rows

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Age': [25, 30, 28, 32, 27],
    'City': ['NYC', 'LA', 'Chicago', 'Miami', 'Boston']
})

print(df.head())

Shows first 5 rows by default.

Custom number:

print(df.head(3))

tail() - Last Rows

print(df.tail())
print(df.tail(2))

Shows last 5 rows by default.

DataFrame Shape

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

print("Shape:", df.shape)
print("Rows:", df.shape[0])
print("Columns:", df.shape[1])

Output:

Shape: (3, 2)
Rows: 3
Columns: 2
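Two related conveniences: the shape tuple unpacks directly into two variables, and len() returns the row count on its own. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape   # unpack the (rows, columns) tuple in one step
print(rows, cols)       # 3 2
print(len(df))          # len() also gives the row count: 3
```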

Column Information

columns

print("Column names:", df.columns.tolist())
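A quick membership test on df.columns is an easy way to confirm a column exists before using it:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John'], 'Age': [25]})

# Check for a column before touching it
print('Age' in df.columns)     # True
print('Salary' in df.columns)  # False
```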

dtypes - Data Types

print(df.dtypes)

Common types:

  • int64: Integers
  • float64: Decimals
  • object: Strings
  • bool: True/False
  • datetime64: Dates
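Closely related to dtypes, select_dtypes filters columns by type, which is handy when a DataFrame mixes numbers and text. A small sketch using a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Age': [25, 30],
    'City': ['NYC', 'LA']
})

# Numeric columns only (here just Age)
print(df.select_dtypes(include='number').columns.tolist())  # ['Age']

# Everything that is not numeric (here the two text columns)
print(df.select_dtypes(exclude='number').columns.tolist())  # ['Name', 'City']
```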

info() - Overview

df.info()

Shows:

  • Number of rows
  • Column names
  • Data types
  • Non-null counts
  • Memory usage

Example output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes

describe() - Statistics

print(df.describe())

For numeric columns:

  • count: Number of values
  • mean: Average
  • std: Standard deviation
  • min: Minimum
  • 25%: First quartile
  • 50%: Median
  • 75%: Third quartile
  • max: Maximum

Include all columns:

print(df.describe(include='all'))
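describe() also accepts a percentiles parameter to report cut points other than the default 25/50/75 (the median is always included). A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 28, 32, 27]})

# Ask for the 10th and 90th percentiles instead of the quartiles
print(df.describe(percentiles=[0.1, 0.9]))
```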

Index Information

print("Index:", df.index)
print("Index start:", df.index[0])
print("Index end:", df.index[-1])

Unique Values

print("Unique cities:", df['City'].unique())
print("Count unique:", df['City'].nunique())

Value Counts

print(df['City'].value_counts())

Shows: How many times each value appears.
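With normalize=True, value_counts() returns proportions instead of raw counts, which makes imbalances easy to spot. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'NYC']})

# Proportions instead of counts: NYC is 3 of 4 rows
print(df['City'].value_counts(normalize=True))  # NYC 0.75, LA 0.25
```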

Null Values

isnull()

print(df.isnull())

Returns True/False for each cell.

Count nulls

print("Null per column:")
print(df.isnull().sum())

print("Total nulls:", df.isnull().sum().sum())

Any nulls?

print("Has nulls:", df.isnull().values.any())
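As a cross-check, count() reports the non-null values per column — the complement of isnull().sum(). A small sketch with one missing value:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Age': [25, np.nan, 30],
    'Name': ['John', 'Sarah', 'Mike']
})

# Non-null values per column
print(df.count())  # Age 2, Name 3
```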

Memory Usage

print("Memory per column:")
print(df.memory_usage())

# deep=True also measures the contents of string columns
print("Total:", df.memory_usage(deep=True).sum(), "bytes")

Sample Rows

Get random rows.

print(df.sample(3))

Useful for: Quick preview of large datasets.
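sample() also accepts frac to draw a fraction of the rows, and random_state to make the draw reproducible. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': range(100)})

# 10% of the rows; random_state fixes the seed so the draw repeats exactly
subset = df.sample(frac=0.1, random_state=42)
print(len(subset))  # 10
```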

Practice Example

The scenario: Inspect sales dataset.

import pandas as pd
import numpy as np

sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=100),
    'Product': np.random.choice(['Laptop', 'Phone', 'Tablet'], 100),
    'Quantity': np.random.randint(1, 10, 100),
    'Price': np.random.choice([999, 599, 399], 100)
})

print("=== BASIC INFO ===")
print("Shape:", sales.shape)
print("Columns:", sales.columns.tolist())
print()

print("=== FIRST ROWS ===")
print(sales.head(3))
print()

print("=== DATA TYPES ===")
print(sales.dtypes)
print()

print("=== DETAILED INFO ===")
sales.info()
print()

print("=== STATISTICS ===")
print(sales.describe())
print()

print("=== UNIQUE VALUES ===")
print("Products:", sales['Product'].unique())
print("Product counts:")
print(sales['Product'].value_counts())
print()

print("=== NULL CHECK ===")
print("Any nulls:", sales.isnull().values.any())
print("Nulls per column:")
print(sales.isnull().sum())
print()

print("=== RANDOM SAMPLE ===")
print(sales.sample(5))

Getting Specific Values

At position

value = df.iloc[0, 0]
print("First cell:", value)

By label

value = df.at[0, 'Name']
print("Value:", value)
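at() is the fast accessor for a single cell; its general counterpart loc can also pull a whole row, or several columns at once, by label. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah'], 'Age': [25, 30]})

print(df.loc[0, 'Name'])           # single cell by label: John
print(df.loc[1, ['Name', 'Age']])  # one row, restricted to chosen columns
```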

Column Statistics

print("Max age:", df['Age'].max())
print("Min age:", df['Age'].min())
print("Mean age:", df['Age'].mean())
print("Sum ages:", df['Age'].sum())

Checking Duplicates

print("Duplicates:", df.duplicated().sum())
print("Duplicate rows:")
print(df[df.duplicated()])
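duplicated() also takes a subset parameter, so you can define a duplicate by only some columns. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Sarah'],
    'City': ['NYC', 'LA', 'LA']
})

# Full rows all differ, so no duplicates by default
print(df.duplicated().sum())                 # 0

# Restricted to Name, the second 'John' row counts as a duplicate
print(df.duplicated(subset=['Name']).sum())  # 1
```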

Correlation

For numeric columns.

print(df.corr(numeric_only=True))  # numeric_only skips text columns

Shows: How columns relate to each other (-1 to 1).
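To see it in action, a tiny sketch with two made-up, perfectly linearly related columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5],
    'Score': [10, 20, 30, 40, 50]  # exactly 10x Hours
})

# A perfect linear relationship gives a correlation of 1.0
print(df.corr())
```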

Quick Functions

print("Min:", df.min())
print("Max:", df.max())
print("Sum:", df.sum(numeric_only=True))
print("Mean:", df.mean(numeric_only=True))    # numeric_only avoids errors on text columns
print("Median:", df.median(numeric_only=True))
print("Mode:", df.mode())

Key Points to Remember

head() shows first rows, tail() shows last rows. Use these to preview data.

info() gives complete overview: types, nulls, memory. Always run this first.

describe() shows statistics for numeric columns. Great for understanding data range.

shape gives (rows, columns). dtypes shows data type of each column.

isnull().sum() counts missing values per column. Critical for data quality check.

Common Mistakes

Mistake 1: Not checking data after loading

df = pd.read_csv('data.csv')
# Start analyzing without looking!

Always do:

print(df.head())
df.info()

Mistake 2: Assuming no nulls

df['Age'].mean()  # NaN values are silently skipped, so the mean may cover fewer rows than you think

Check first:

print(df.isnull().sum())

Mistake 3: Wrong shape access

rows = df.shape     # Wrong: this is the whole (rows, columns) tuple
rows = df.shape[0]  # Correct way to get the row count

Mistake 4: Ignoring data types

df['Price'].mean()  # Error if Price is string!
print(df.dtypes)  # Check first

What's Next?

You now know how to inspect DataFrames. Next, you'll learn about selecting columns - how to choose and work with specific columns from your DataFrame.