CSV Files to Data Analysis

March 7, 2026 · 4 min read · Updated March 7, 2026 · beginner

csv data analysis pandas beginner

CSV files are everywhere. Every spreadsheet can export to CSV. Every database can dump to CSV. Every data export comes as CSV. If you work with data, you’ll deal with CSV files constantly.

This tutorial teaches you how to read CSV files in Python, parse them into usable data, and perform basic analysis. You’ll learn two approaches: the built-in csv module for simple tasks, and pandas for powerful data analysis.

What is CSV?

CSV stands for Comma-Separated Values. It’s a plain text format where each line represents a row, and values are separated by commas. Here’s a simple example:

name,age,city
Alice,30,London
Bob,25,Manchester
Charlie,35,Bristol

The first line is typically the header row, containing column names. The CSV module in Python’s standard library handles reading and writing these files.

Reading CSV Files with the csv Module

Python’s csv module is built into the standard library. No installation needed. Start by opening the file and creating a reader object:

import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    
    # Read the header first
    headers = next(reader)
    print(f"Columns: {headers}")
    
    # Read each data row
    for row in reader:
        print(row)

The newline='' parameter is important on Windows - it prevents extra blank lines between rows.

Reading into a List of Dictionaries

Working with column names is easier. Use csv.DictReader to get dictionaries instead of lists:

import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        print(f"{row['name']} is {row['age']} years old")

Each row becomes a dictionary where keys are the column headers. This makes your code more readable.

Writing CSV Files

Creating CSV files is just as straightforward:

import csv

data = [
    ['name', 'age', 'city'],
    ['Alice', '30', 'London'],
    ['Bob', '25', 'Manchester'],
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

For dictionaries, use DictWriter:

import csv

data = [
    {'name': 'Alice', 'age': 30, 'city': 'London'},
    {'name': 'Bob', 'age': 25, 'city': 'Manchester'},
]

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['name', 'age', 'city']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()
    writer.writerows(data)

Introduction to Pandas for CSV Analysis

The csv module works for simple tasks. When you need real data analysis, pandas is the tool. Install it first:

pip install pandas

Pandas reads CSV files into DataFrames - essentially tables that you can query, filter, and analyze:

import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv('data.csv')

# See the first few rows
print(df.head())

# Get basic info
print(df.info())

# Calculate statistics
print(df.describe())

The describe() method gives you count, mean, std, min, max, and quartile values for numeric columns automatically.

Filtering and Selecting Data

One of pandas’ strongest features is easy filtering. Select columns by name:

import pandas as pd

df = pd.read_csv('data.csv')

# Select a single column
names = df['name']

# Select multiple columns
subset = df[['name', 'age']]

Filter rows based on conditions:

import pandas as pd

df = pd.read_csv('data.csv')

# Filter where age is greater than 30
older = df[df['age'] > 30]

# Filter by city
londoners = df[df['city'] == 'London']

# Combine conditions with & (and) | (or)
adults_in_london = df[(df['age'] >= 18) & (df['city'] == 'London')]

Sorting and Aggregating Data

Sort DataFrames by any column:

import pandas as pd

df = pd.read_csv('data.csv')

# Sort by age, ascending
by_age = df.sort_values('age')

# Sort by age, descending
by_age_desc = df.sort_values('age', ascending=False)

# Sort by multiple columns
sorted_df = df.sort_values(['city', 'age'])

Aggregate to summarize data:

import pandas as pd

df = pd.read_csv('sales.csv')

# Sum a column
total_sales = df['sales'].sum()

# Group by category and sum
by_category = df.groupby('category')['sales'].sum()

# Multiple aggregations at once
summary = df.groupby('category').agg({
    'sales': ['sum', 'mean', 'count'],
    'quantity': ['sum', 'mean']
})

A Complete Example

Let’s put this together. Here’s a script that analyzes sales data:

import pandas as pd

# Load the data
df = pd.read_csv('sales.csv')

print(f"Total records: {len(df)}")
print(f"Total sales: ${df['amount'].sum():,.2f}")
print(f"Average sale: ${df['amount'].mean():,.2f}")

# Top customers by total spending
top_customers = df.groupby('customer')['amount'].sum().sort_values(ascending=False)
print("\nTop customers:")
print(top_customers.head(10))

# Sales by month
df['month'] = pd.to_datetime(df['date']).dt.month
monthly = df.groupby('month')['amount'].sum()
print("\nMonthly sales:")
print(monthly)

This demonstrates the typical data analysis workflow: load, explore, summarize, and drill down into specific insights.

When to Use Which Tool

Use the csv module when:

You’re processing simple files with predictable structure
You need fine-grained control over parsing
You want minimal dependencies

Use pandas when:

You need to filter, sort, or aggregate data
You’re working with multiple files
You need statistical analysis
Your data has missing values or mixed types

Summary

CSV files are the universal data format. Python gives you two ways to work with them. The csv module handles simple reading and writing with no dependencies. Pandas provides powerful data analysis with filtering, sorting, grouping, and statistics.

Start with the csv module for basic tasks. Move to pandas when you need real analysis capabilities. Both tools will serve you throughout your data work.