Working with CSV Files
Learn to read and write CSV files for data analysis
Working with CSV Files
What is a CSV File?
CSV stands for Comma Separated Values. It's a simple text file where data is organized in rows and columns, with commas separating each value.
Think of it like a spreadsheet saved as plain text. Each line is a row, and commas separate the columns.
Why CSV files are popular:
- Work with any program (Excel, Google Sheets, Python)
- Small file size
- Easy to read and edit
- Perfect for data analysis
- Most common format for datasets
Example CSV content:
name,age,city
John,25,New York
Sarah,30,London
Mike,28,Tokyo
The csv Module
Python has a built-in csv module for working with CSV files.
import csvWhat this does: Imports Python's csv module so you can read and write CSV files easily.
Reading CSV Files
Basic Reading
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
for row in reader:
print(row)What this does:
- Opens data.csv file
- Creates a csv reader
- Loops through each row
- Each row is a list of values
Example output:
['name', 'age', 'city']
['John', '25', 'New York']
['Sarah', '30', 'London']
Skipping Header Row
Often the first row has column names. You can skip it.
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
next(reader)
for row in reader:
print(row)What next() does: Skips the first row (header) and starts reading from the second row.
Accessing Individual Values
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
next(reader)
for row in reader:
name = row[0]
age = row[1]
city = row[2]
print("Name:", name, "Age:", age, "City:", city)What this does: Gets each value by position: row[0] is first column, row[1] is second, etc.
Reading CSV with DictReader
DictReader is easier to use. It treats each row as a dictionary with column names as keys.
import csv
with open("data.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
print(row["name"], row["age"], row["city"])Why this is better: You use column names instead of positions. Much clearer and less error-prone.
Each row is a dictionary:
for row in reader:
print(row)Shows: {'name': 'John', 'age': '25', 'city': 'New York'}
Converting to List
Store all data in a list for later use.
import csv
data = []
with open("data.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
data.append(row)
print("Total rows:", len(data))
print("First person:", data[0]["name"])What this does: Reads all rows into a list, then you can access data anytime.
Writing CSV Files
Basic Writing
import csv
data = [
["name", "age", "city"],
["John", "25", "New York"],
["Sarah", "30", "London"]
]
with open("output.csv", "w", newline="") as file:
writer = csv.writer(file)
for row in data:
writer.writerow(row)What this creates: A CSV file with 3 rows (including header).
Important: Use newline="" to avoid extra blank lines.
Writing Header and Data Separately
import csv
header = ["name", "age", "city"]
data = [
["John", "25", "New York"],
["Sarah", "30", "London"]
]
with open("output.csv", "w", newline="") as file:
writer = csv.writer(file)
writer.writerow(header)
for row in data:
writer.writerow(row)What this does: Writes header first, then writes each data row.
Writing Multiple Rows at Once
import csv
data = [
["name", "age", "city"],
["John", "25", "New York"],
["Sarah", "30", "London"],
["Mike", "28", "Tokyo"]
]
with open("output.csv", "w", newline="") as file:
writer = csv.writer(file)
writer.writerows(data)What writerows() does: Writes all rows at once. Faster than looping.
Writing with DictWriter
DictWriter lets you write dictionaries instead of lists.
import csv
data = [
{"name": "John", "age": "25", "city": "New York"},
{"name": "Sarah", "age": "30", "city": "London"},
{"name": "Mike", "age": "28", "city": "Tokyo"}
]
with open("output.csv", "w", newline="") as file:
fieldnames = ["name", "age", "city"]
writer = csv.DictWriter(file, fieldnames=fieldnames)
writer.writeheader()
for row in data:
writer.writerow(row)What this does:
- fieldnames defines column order
- writeheader() writes column names
- Each dictionary becomes a row
Handling Different Delimiters
Some files use semicolons, tabs, or other separators instead of commas.
Reading Tab-Separated Files
import csv
with open("data.tsv", "r") as file:
reader = csv.reader(file, delimiter="\t")
for row in reader:
print(row)What delimiter does: Tells Python what character separates values. \t is tab.
Reading Semicolon-Separated Files
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file, delimiter=";")
for row in reader:
print(row)Practice Example
The scenario: You have a CSV file with student grades. You want to read it, calculate averages, and create a new CSV with results.
Input file (students.csv):
name,math,science,english
John,85,90,88
Sarah,92,88,95
Mike,78,85,80
Python program:
import csv
students = []
with open("students.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
name = row["name"]
math = int(row["math"])
science = int(row["science"])
english = int(row["english"])
average = (math + science + english) / 3
students.append({
"name": name,
"math": math,
"science": science,
"english": english,
"average": round(average, 2)
})
print("Processed", len(students), "students")
with open("results.csv", "w", newline="") as file:
fieldnames = ["name", "math", "science", "english", "average"]
writer = csv.DictWriter(file, fieldnames=fieldnames)
writer.writeheader()
for student in students:
writer.writerow(student)
print("Results saved to results.csv")What this program does:
- Reads students.csv using DictReader
- Converts grade strings to numbers
- Calculates average for each student
- Stores all data with averages
- Creates new CSV file with results
- Includes all original data plus average
Output file (results.csv):
name,math,science,english,average
John,85,90,88,87.67
Sarah,92,88,95,91.67
Mike,78,85,80,81.0
Appending to CSV Files
Add new rows to existing CSV without deleting old data.
import csv
new_student = ["Emma", "95", "92", "90"]
with open("students.csv", "a", newline="") as file:
writer = csv.writer(file)
writer.writerow(new_student)What this does: Opens file in append mode ("a") and adds new row at the end.
Handling Missing Data
Real data often has empty values. Handle them carefully.
import csv
with open("data.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
name = row["name"]
age = row["age"] if row["age"] else "Unknown"
city = row["city"] if row["city"] else "Not specified"
print(name, age, city)What this does: If value is empty, uses default text instead.
Key Points to Remember
CSV files store data as text with commas separating values. Python's csv module makes reading and writing easy.
Use csv.reader() for simple reading, csv.DictReader() to access columns by name. DictReader is easier for most tasks.
Use csv.writer() for simple writing, csv.DictWriter() to write dictionaries. Always use newline="" when opening files for writing.
DictReader automatically uses first row as column names. You can access values like row["column_name"].
Values from CSV are always strings. Convert to int() or float() if you need numbers for calculations.
Common Mistakes
Mistake 1: Forgetting newline parameter
with open("data.csv", "w") as file: # Extra blank lines!Fix:
with open("data.csv", "w", newline="") as file:Mistake 2: Not converting strings to numbers
total = row["price"] + row["tax"] # Concatenates strings!
total = int(row["price"]) + int(row["tax"]) # CorrectMistake 3: Wrong file mode
with open("data.csv", "w") as file: # Erases existing data!Use "a" to append, "w" only for new files.
Mistake 4: Accessing wrong column
print(row["Name"]) # KeyError if column is "name"Column names are case-sensitive.
What's Next?
You now know how to work with CSV files. Next, you'll learn about reading Excel files - working with .xlsx files that have multiple sheets and formatting.