Working with CSV Files

What is a CSV File?

CSV stands for Comma Separated Values. It's a simple text file where data is organized in rows and columns, with commas separating each value.

Think of it like a spreadsheet saved as plain text. Each line is a row, and commas separate the columns.

Why CSV files are popular:

Work with any program (Excel, Google Sheets, Python)
Small file size
Easy to read and edit
Perfect for data analysis
Most common format for datasets

Example CSV content:

name,age,city
John,25,New York
Sarah,30,London
Mike,28,Tokyo

The csv Module

Python has a built-in csv module for working with CSV files.

code.py

import csv

What this does: Imports Python's csv module so you can read and write CSV files easily.

Reading CSV Files

Basic Reading

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.reader(file)

    for row in reader:
        print(row)

What this does:

Opens data.csv file
Creates a csv reader
Loops through each row
Each row is a list of values

Example output:

['name', 'age', 'city']
['John', '25', 'New York']
['Sarah', '30', 'London']

Skipping Header Row

Often the first row has column names. You can skip it.

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)

    for row in reader:
        print(row)

What next() does: Skips the first row (header) and starts reading from the second row.

Accessing Individual Values

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)

    for row in reader:
        name = row[0]
        age = row[1]
        city = row[2]
        print("Name:", name, "Age:", age, "City:", city)

What this does: Gets each value by position: row[0] is first column, row[1] is second, etc.

Reading CSV with DictReader

DictReader is easier to use. It treats each row as a dictionary with column names as keys.

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.DictReader(file)

    for row in reader:
        print(row["name"], row["age"], row["city"])

Why this is better: You use column names instead of positions. Much clearer and less error-prone.

Each row is a dictionary:

code.py

for row in reader:
    print(row)

Shows: {'name': 'John', 'age': '25', 'city': 'New York'}

Converting to List

Store all data in a list for later use.

code.py

import csv

data = []

with open("data.csv", "r") as file:
    reader = csv.DictReader(file)

    for row in reader:
        data.append(row)

print("Total rows:", len(data))
print("First person:", data[0]["name"])

What this does: Reads all rows into a list, then you can access data anytime.

Writing CSV Files

Basic Writing

code.py

import csv

data = [
    ["name", "age", "city"],
    ["John", "25", "New York"],
    ["Sarah", "30", "London"]
]

with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)

    for row in data:
        writer.writerow(row)

What this creates: A CSV file with 3 rows (including header).

Important: Use newline="" to avoid extra blank lines.

Writing Header and Data Separately

code.py

import csv

header = ["name", "age", "city"]
data = [
    ["John", "25", "New York"],
    ["Sarah", "30", "London"]
]

with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(header)

    for row in data:
        writer.writerow(row)

What this does: Writes header first, then writes each data row.

Writing Multiple Rows at Once

code.py

import csv

data = [
    ["name", "age", "city"],
    ["John", "25", "New York"],
    ["Sarah", "30", "London"],
    ["Mike", "28", "Tokyo"]
]

with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)

What writerows() does: Writes all rows at once. Faster than looping.

Writing with DictWriter

DictWriter lets you write dictionaries instead of lists.

code.py

import csv

data = [
    {"name": "John", "age": "25", "city": "New York"},
    {"name": "Sarah", "age": "30", "city": "London"},
    {"name": "Mike", "age": "28", "city": "Tokyo"}
]

with open("output.csv", "w", newline="") as file:
    fieldnames = ["name", "age", "city"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    writer.writeheader()

    for row in data:
        writer.writerow(row)

What this does:

fieldnames defines column order
writeheader() writes column names
Each dictionary becomes a row

Handling Different Delimiters

Some files use semicolons, tabs, or other separators instead of commas.

Reading Tab-Separated Files

code.py

import csv

with open("data.tsv", "r") as file:
    reader = csv.reader(file, delimiter="\t")

    for row in reader:
        print(row)

What delimiter does: Tells Python what character separates values. \t is tab.

Reading Semicolon-Separated Files

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.reader(file, delimiter=";")

    for row in reader:
        print(row)

Practice Example

The scenario: You have a CSV file with student grades. You want to read it, calculate averages, and create a new CSV with results.

Input file (students.csv):

name,math,science,english
John,85,90,88
Sarah,92,88,95
Mike,78,85,80

Python program:

code.py

import csv

students = []

with open("students.csv", "r") as file:
    reader = csv.DictReader(file)

    for row in reader:
        name = row["name"]
        math = int(row["math"])
        science = int(row["science"])
        english = int(row["english"])

        average = (math + science + english) / 3

        students.append({
            "name": name,
            "math": math,
            "science": science,
            "english": english,
            "average": round(average, 2)
        })

print("Processed", len(students), "students")

with open("results.csv", "w", newline="") as file:
    fieldnames = ["name", "math", "science", "english", "average"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    writer.writeheader()

    for student in students:
        writer.writerow(student)

print("Results saved to results.csv")

What this program does:

Reads students.csv using DictReader
Converts grade strings to numbers
Calculates average for each student
Stores all data with averages
Creates new CSV file with results
Includes all original data plus average

Output file (results.csv):

name,math,science,english,average
John,85,90,88,87.67
Sarah,92,88,95,91.67
Mike,78,85,80,81.0

Appending to CSV Files

Add new rows to existing CSV without deleting old data.

code.py

import csv

new_student = ["Emma", "95", "92", "90"]

with open("students.csv", "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(new_student)

What this does: Opens file in append mode ("a") and adds new row at the end.

Handling Missing Data

Real data often has empty values. Handle them carefully.

code.py

import csv

with open("data.csv", "r") as file:
    reader = csv.DictReader(file)

    for row in reader:
        name = row["name"]
        age = row["age"] if row["age"] else "Unknown"
        city = row["city"] if row["city"] else "Not specified"

        print(name, age, city)

What this does: If value is empty, uses default text instead.

Key Points to Remember

CSV files store data as text with commas separating values. Python's csv module makes reading and writing easy.

Use csv.reader() for simple reading, csv.DictReader() to access columns by name. DictReader is easier for most tasks.

Use csv.writer() for simple writing, csv.DictWriter() to write dictionaries. Always use newline="" when opening files for writing.

DictReader automatically uses first row as column names. You can access values like row["column_name"].

Values from CSV are always strings. Convert to int() or float() if you need numbers for calculations.

Common Mistakes

Mistake 1: Forgetting newline parameter

code.py

with open("data.csv", "w") as file:  # Extra blank lines!

Fix:

code.py

with open("data.csv", "w", newline="") as file:

Mistake 2: Not converting strings to numbers

code.py

total = row["price"] + row["tax"]  # Concatenates strings!
total = int(row["price"]) + int(row["tax"])  # Correct

Mistake 3: Wrong file mode

code.py

with open("data.csv", "w") as file:  # Erases existing data!

Use "a" to append, "w" only for new files.

Mistake 4: Accessing wrong column

code.py

print(row["Name"])  # KeyError if column is "name"

Column names are case-sensitive.

What's Next?

You now know how to work with CSV files. Next, you'll learn about reading Excel files - working with .xlsx files that have multiple sheets and formatting.