Python for Finance: Getting Started
Python is one of the most widely used languages for financial analysis. Banks, hedge funds, and fintech companies use it for everything from pricing derivatives to building trading algorithms. This tutorial covers the foundations: setting up your environment, understanding financial data structures, and performing your first analysis.
Setting Up Your Environment
Before analyzing financial data, you need the right tools. The Python finance ecosystem relies on a few key libraries.
Install them with pip:
pip install pandas numpy matplotlib yfinance
Here’s what each library does:
- pandas — handles tabular data and time series
- numpy — numerical computing foundation
- matplotlib — creates charts and visualizations
- yfinance — downloads free stock data from Yahoo Finance
Create a file called analysis.py and import these libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
print("Libraries imported successfully")
Run this to verify everything works. If you see the success message, you’re ready to proceed.
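If an import fails, it helps to know which package is missing. The sketch below is one possible check using only the standard library; the helper name installed_versions is made up for this example, and the package list simply mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages from the pip install command above
REQUIRED = ["pandas", "numpy", "matplotlib", "yfinance"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with pip before you continue.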
Understanding Financial Data Structures
Financial data usually comes as time series: observations recorded at specific dates. Pandas provides two structures suited to this: Series and DataFrame.
A Series is a single column of data with an index of dates:
prices = pd.Series(
    [100, 102, 101, 105],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)
print(prices)
# 2024-01-01    100
# 2024-01-02    102
# 2024-01-03    101
# 2024-01-04    105
# dtype: int64
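The date index is what makes a Series useful for time series work, because you can select values by date instead of by position. A minimal sketch using the same four prices (the variable names jan_3, mid_week, and last_price are only for illustration):

```python
import pandas as pd

prices = pd.Series(
    [100, 102, 101, 105],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Look up a single day by its date label
jan_3 = prices.loc["2024-01-03"]

# Slice a date range; label-based slices include both endpoints
mid_week = prices.loc["2024-01-02":"2024-01-03"]

# Positional access still works through .iloc
last_price = prices.iloc[-1]

print(jan_3)              # 101
print(mid_week.tolist())  # [102, 101]
print(last_price)         # 105
```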
A DataFrame is multiple columns side by side — like a spreadsheet:
data = {
    "open": [100, 101, 100, 103],
    "high": [102, 104, 103, 107],
    "low": [99, 100, 99, 102],
    "close": [102, 101, 102, 106],
}
df = pd.DataFrame(data, index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]))
print(df)
#             open  high  low  close
# 2024-01-01   100   102   99    102
# 2024-01-02   101   104  100    101
# 2024-01-03   100   103   99    102
# 2024-01-04   103   107  102    106
This OHLC (open, high, low, close) format is the standard for daily stock data. You’ll use it constantly in financial analysis.
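In a valid OHLC bar, the high is at least as large as both the open and the close, and the low is no larger than either. The sketch below shows one way to flag rows that break that rule; the helper name invalid_ohlc_rows and the sample bars are made up for this example, and the last row is deliberately wrong.

```python
import pandas as pd


def invalid_ohlc_rows(df):
    """Return rows where high/low do not bound open and close."""
    body_top = df[["open", "close"]].max(axis=1)
    body_bottom = df[["open", "close"]].min(axis=1)
    bad = (df["high"] < body_top) | (df["low"] > body_bottom)
    return df[bad]


bars = pd.DataFrame(
    {
        "open": [100, 101, 100],
        "high": [102, 104, 103],
        "low": [99, 100, 99],
        "close": [102, 101, 105],  # last close sits above its high on purpose
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)

problems = invalid_ohlc_rows(bars)
print(problems.index.strftime("%Y-%m-%d").tolist())  # ['2024-01-03']
```

A check like this is cheap to run on any data you type in by hand or load from an unfamiliar source.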
Fetching Real Stock Data
Now let’s grab real data. The yfinance library provides free access to Yahoo Finance data.
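# Download Apple stock data (the end date is exclusive, so 2025-01-01 covers all of 2024)
aapl = yf.download("AAPL", start="2024-01-01", end="2025-01-01")
print(aapl.head())
print(f"\nShape: {aapl.shape}")
This downloads daily OHLC data for Apple throughout 2024. The DataFrame contains Open, High, Low, Close, and Volume columns. Depending on your yfinance version and its auto_adjust setting, an Adj Close column may also be present.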
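The column layout of the downloaded DataFrame can vary. Some yfinance releases return a two-level column index of (field, ticker) even for a single ticker; treat that as an assumption to check against your installed version. The sketch below uses a synthetic stand-in frame rather than a real download, and the helper name single_ticker_columns is hypothetical.

```python
import pandas as pd


def single_ticker_columns(df):
    """Drop the ticker level if the columns are a two-level MultiIndex."""
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = df.columns.get_level_values(0)
    return df


# Synthetic stand-in for a download result with (field, ticker) columns
dates = pd.to_datetime(["2024-01-02", "2024-01-03"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL"]])
raw = pd.DataFrame([[10.0, 1000], [11.0, 1200]], index=dates, columns=columns)

flat = single_ticker_columns(raw)
print(list(flat.columns))  # ['Close', 'Volume']
```

After flattening, selecting one column gives a plain Series, which is what the statistics and return calculations in this tutorial expect.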
Let’s examine the data:
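# Get just the closing prices. squeeze() turns a one-column DataFrame into a
# Series, which some yfinance versions return even for a single ticker.
closes = aapl["Close"].squeeze()
print(closes.head(3))
# Illustrative output; exact values and the Name field depend on your
# yfinance version and on whether prices are adjusted:
# Date
# 2024-01-02    185.64
# 2024-01-03    185.56
# 2024-01-04    185.83
# Name: Close, dtype: float64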
# Calculate basic statistics
print(f"Mean: ${closes.mean():.2f}")
print(f"Min: ${closes.min():.2f}")
print(f"Max: ${closes.max():.2f}")
Calculating Returns
Returns measure how much a stock’s price changed over time. They’re the foundation of financial analysis.
Simple returns calculate the percentage change from one period to the next:
# Calculate daily returns
daily_returns = closes.pct_change()
print(daily_returns.head(3))
# Illustrative output; exact values depend on the downloaded prices:
# Date
# 2024-01-02         NaN
# 2024-01-03   -0.000431
# 2024-01-04    0.001455
# Name: Close, dtype: float64
The first value is NaN because there’s no previous day to compare against.
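To see what pct_change computes, you can reproduce it by hand: divide each price by the previous one and subtract 1. A small sketch with the four made-up prices from earlier:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0])

# Built-in percentage change
auto = prices.pct_change()

# Same calculation by hand: today's price over yesterday's, minus one
manual = prices / prices.shift(1) - 1

print(auto.round(4).tolist())  # [nan, 0.02, -0.0098, 0.0396]
print(np.allclose(auto.dropna(), manual.dropna()))  # True
```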
Cumulative returns show the total return from the start of the period:
cumulative_returns = (1 + daily_returns).cumprod() - 1
print(f"Total return: {cumulative_returns.iloc[-1]:.2%}")
This tells you the percentage gain or loss over the entire period.
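As a sanity check, compounding the daily returns should give the same answer as comparing the last price with the first. The sketch below uses the same made-up prices, not real market data:

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0])
daily_returns = prices.pct_change()

# Compound the daily growth factors; the leading NaN is skipped
cumulative = (1 + daily_returns).cumprod() - 1
total = cumulative.iloc[-1]

# Direct calculation from the first and last price
direct = prices.iloc[-1] / prices.iloc[0] - 1

print(f"Compounded: {total:.2%}")  # Compounded: 5.00%
print(f"Direct:     {direct:.2%}")  # Direct:     5.00%
```

Note that simply adding the daily returns would not give the same answer, because returns compound from one day to the next.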
Visualizing Stock Performance
Matplotlib can plot the price history and the distribution of daily returns side by side:
plt.figure(figsize=(12, 5))
# Plot closing prices
plt.subplot(1, 2, 1)
closes.plot(title="AAPL Closing Prices")
plt.ylabel("Price ($)")
# Plot daily returns distribution
plt.subplot(1, 2, 2)
daily_returns.dropna().hist(bins=50)
plt.title("Daily Returns Distribution")
plt.xlabel("Return")
plt.ylabel("Frequency")
plt.tight_layout()
plt.savefig("aapl_analysis.png")
plt.show()
The histogram shows how returns are distributed — most days have small moves, with occasional larger jumps.
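You can back up what the histogram suggests with a number, for example the share of days on which the price moved less than 1%. This is a rough sketch: the helper name share_of_small_moves, the 1% threshold, and the sample returns are all assumptions made for illustration.

```python
import pandas as pd


def share_of_small_moves(returns, threshold=0.01):
    """Fraction of non-missing returns whose absolute size is below threshold."""
    clean = returns.dropna()
    return float((clean.abs() < threshold).mean())


# Made-up daily returns; the first entry is missing, as with pct_change()
sample = pd.Series([None, 0.002, -0.004, 0.031, 0.006, -0.027, 0.001, -0.003, 0.009])

share = share_of_small_moves(sample)
print(f"Days with moves under 1%: {share:.0%}")  # Days with moves under 1%: 75%
```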
Comparing Multiple Stocks
You can download and compare multiple stocks at once:
tickers = ["AAPL", "GOOGL", "MSFT"]
# The end date is exclusive, so 2025-01-01 covers all of 2024
close_prices = yf.download(tickers, start="2024-01-01", end="2025-01-01")["Close"]
# Rescale each price series so it starts at 100
normalized = (close_prices / close_prices.iloc[0]) * 100
normalized.plot(title="Normalized Price Performance (Base=100)")
plt.show()
This shows how $100 invested in each stock at the start of the period would have grown, which puts the three stocks on a common scale.
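The normalization step is easier to follow on a tiny example. The sketch below uses two made-up tickers, AAA and BBB, with very different price levels; dividing each column by its first row and multiplying by 100 makes them directly comparable.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers
prices = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Divide every row by the first row, column by column, then scale to 100
normalized = prices / prices.iloc[0] * 100

print(normalized.iloc[-1].round(1).to_dict())  # {'AAA': 120.0, 'BBB': 105.0}
```

AAA ends 20% above its starting value and BBB 5% above, even though BBB's share price is several times higher throughout.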
Next Steps
You now have the foundation for financial analysis in Python. The next tutorial in this series covers fetching specific types of data with yfinance and handling common data issues.
From here, you can explore:
- Calculating volatility and risk metrics
- Building a simple portfolio analyzer
- Backtesting trading strategies
The skills you’ve learned — loading data, calculating returns, and visualizing results — apply to every type of financial analysis you’ll do in Python.