Processing CSV Files in Python

· 4 min read · Updated March 14, 2026 · beginner
python csv data files

CSV (Comma-Separated Values) files are everywhere. They are the lowest common denominator for data exchange — spreadsheets, databases, and APIs all speak CSV. Python gives you multiple ways to work with them, from the simple csv module to the powerful pandas library.

The CSV Format

A CSV file is plain text with rows separated by newlines and columns separated by commas:

name,age,city
Alice,30,New York
Bob,25,San Francisco
Charlie,35,Chicago

But CSVs are trickier than they look. What if a value contains a comma? The standard solution is wrapping fields in quotes:

name,city,interest
Alice,New York,"reading, coding"
Bob,Chicago,sports

Python’s csv module handles these edge cases automatically.
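To see that in action, you can parse the quoted example above directly from a string (using io.StringIO as a stand-in for an open file):

```python
import csv
import io

# A stand-in for a file: one quoted field contains a comma
raw = 'name,city,interest\nAlice,New York,"reading, coding"\nBob,Chicago,sports\n'

reader = csv.reader(io.StringIO(raw))
rows = list(reader)

# The quoted field survives as a single value
print(rows[1])  # ['Alice', 'New York', 'reading, coding']
```

Note that the comma inside the quotes did not split the field; the reader treats everything between the quotes as one value.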

Reading CSV Files with the csv Module

The csv module is part of Python’s standard library:

import csv

with open("data.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Output:

['name', 'age', 'city']
['Alice', '30', 'New York']
['Bob', '25', 'San Francisco']

Always pass newline="" when opening CSV files. This lets the csv module handle line endings itself, which prevents blank rows on Windows and keeps newlines embedded in quoted fields intact.

Reading CSV Files into Dictionaries

Reading into dictionaries gives you named fields instead of positional indices:

import csv

with open("data.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']} is {row['age']} years old")

Output:

Alice is 30 years old
Bob is 25 years old

The first row is automatically used as the header. Accessing fields by name makes the code more readable and keeps it working even if the column order changes.

Writing CSV Files

Writing is just as straightforward:

import csv

data = [
    ["name", "age", "city"],
    ["Alice", 30, "New York"],
    ["Bob", 25, "San Francisco"]
]

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

For dictionary-based data:

import csv

data = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "San Francisco"}
]

fieldnames = ["name", "age", "city"]

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

Handling Different Delimiters

CSV files do not always use commas. Tab-separated files (TSV) are common:

import csv

with open("data.tsv", "r", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row)

Other delimiters you may encounter:

  • Pipe (|), common in database exports
  • Semicolon (;), the default in many European locales where the comma is the decimal separator
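If you do not know the delimiter up front, csv.Sniffer can often detect it from a sample of the file. A minimal sketch — Sniffer is heuristic, so passing a set of candidate delimiters makes the guess more reliable:

```python
import csv
import io

# Pipe-delimited sample; in practice, read the first few KB of the file
sample = "name|age|city\nAlice|30|New York\nBob|25|Chicago\n"

# Restrict the sniffer to likely candidates
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(dialect.delimiter)  # '|'

reader = csv.reader(io.StringIO(sample), dialect)
for row in reader:
    print(row)
```

Sniffer can still guess wrong on very small or unusual samples, so if you already know the delimiter, pass it explicitly instead.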

Reading Large Files

Loading a massive CSV into memory can crash your program. Process it line by line instead:

import csv

total = 0
count = 0

with open("large_file.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total += int(row["amount"])
        count += 1

print(f"Average: {total / count}")

This approach uses constant memory regardless of file size.

Using Pandas for CSV Files

For analysis and data manipulation, pandas is the standard choice:

import pandas as pd

# Read CSV into a DataFrame
df = pd.read_csv("data.csv")

print(df.head())
print(df.shape)

Pandas scales to millions of rows, handles missing values, and infers column types automatically.

Selecting Columns

import pandas as pd

df = pd.read_csv("data.csv")

# Single column
names = df["name"]

# Multiple columns
subset = df[["name", "age"]]

Filtering Rows

import pandas as pd

df = pd.read_csv("data.csv")

# Filter by condition
adults = df[df["age"] >= 18]

# Multiple conditions
young_in_ny = df[(df["age"] < 30) & (df["city"] == "New York")]

Grouping and Aggregating

import pandas as pd

df = pd.read_csv("sales.csv")

# Group by city and calculate mean sales
city_stats = df.groupby("city")["sales"].mean()

# Multiple aggregations
summary = df.groupby("category").agg({
    "sales": ["sum", "mean", "count"],
    "quantity": "sum"
})

Writing to CSV

import pandas as pd

df = pd.read_csv("input.csv")

# Filter and transform
result = df[df["active"] == True]

# Write to CSV
result.to_csv("output.csv", index=False)

Passing index=False prevents pandas from writing the DataFrame's row index as an extra column.

Common Pitfalls

Missing newline parameter

import csv

# Wrong - produces a blank row after every record on Windows
with open("data.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b"])

# Correct
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b"])

Not handling missing values

import pandas as pd

# Default behavior: missing values become NaN
df = pd.read_csv("data.csv")

# Fill missing values
df = df.fillna(0)

# Drop rows with missing values
df = df.dropna()

Encoding issues

import pandas as pd

# Specify encoding if the file is not UTF-8
df = pd.read_csv("data.csv", encoding="latin-1")

# Common encodings: utf-8, latin-1, cp1252, iso-8859-1

Quick Reference

Task            csv module                 pandas
Read file       csv.reader(f)              pd.read_csv()
Write file      csv.writer(f)              df.to_csv()
Read as dicts   csv.DictReader(f)          pd.read_csv() then to_dict()
Large files     line-by-line processing    chunksize parameter
Filtering       manual loops               df[df[col] > val]
