Plotting Data with Matplotlib
Matplotlib is the cornerstone of data visualization in Python. Whether you’re analyzing trends in a dataset, presenting findings to stakeholders, or building machine learning models, visualization helps you understand and communicate your data effectively. This tutorial walks you through creating your first plots and gradually introduces more sophisticated techniques.
What is Matplotlib?
Matplotlib is a plotting library, primarily for 2D graphics, that produces publication-quality figures in a variety of formats. John Hunter created it in 2003 to enable interactive scientific plotting, and it has since become one of the most widely used visualization tools in the Python ecosystem.
The library’s strength lies in its flexibility. You can create simple charts with just a few lines of code, or build complex, customized visualizations with fine-grained control over every element. Matplotlib accepts NumPy arrays directly, and pandas uses it as the default backend for its own plotting methods, which makes it a natural choice for visualizing arrays and DataFrames.
Installing Matplotlib
Getting started with Matplotlib is straightforward. Install it using pip:
pip install matplotlib
If you’re using conda:
conda install matplotlib
Once installed, import it in your Python scripts. By convention, matplotlib.pyplot is imported under the alias plt, and NumPy, which most of the examples below use to generate data, under np:
import matplotlib.pyplot as plt
import numpy as np
The pyplot module provides a MATLAB-like interface that makes it easy to create common plot types quickly.
Your First Plot
Creating a basic line plot requires only a few lines of code:
import matplotlib.pyplot as plt
import numpy as np
# Create simple data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Create the plot
plt.plot(x, y)
plt.show()
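This creates a simple line plot showing the relationship between x and y values. When you run the code as a script, plt.show() opens the figure in a new window and blocks until you close it; in a Jupyter notebook, the figure is typically rendered inline instead.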
Let’s make this more interesting with NumPy:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
This plots a sine wave from 0 to 10. The linspace function creates 100 evenly spaced points, resulting in a smooth curve.
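To see why the number of points matters, the following sketch plots the same sine wave sampled at 10 and at 100 points. The variable names coarse_x and smooth_x are illustrative, and the figure is only built here; call plt.show() to display it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same interval, two sampling densities (names are illustrative)
coarse_x = np.linspace(0, 10, 10)
smooth_x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
# Markers make the individual samples of the coarse curve visible
ax.plot(coarse_x, np.sin(coarse_x), marker='o', label='10 points')
ax.plot(smooth_x, np.sin(smooth_x), label='100 points')
ax.legend()

# Spacing between consecutive samples: (stop - start) / (num - 1)
print(coarse_x[1] - coarse_x[0])
print(smooth_x[1] - smooth_x[0])
```

Matplotlib joins consecutive points with straight segments, so the 10-point version looks visibly angular while the 100-point version appears smooth.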
Customizing Your Plot
Matplotlib offers extensive customization options. Let’s build a more complete example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create figure and axis
fig, ax = plt.subplots()
# Plot with custom styling
ax.plot(x, y, color='blue', linewidth=2, linestyle='-')
# Add title and labels
ax.set_title('Sine Wave', fontsize=16, fontweight='bold')
ax.set_xlabel('X values', fontsize=12)
ax.set_ylabel('sin(x)', fontsize=12)
# Add grid
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
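The subplots() function returns a Figure (the whole canvas) and an Axes object (the plotting area that holds the data, labels, and grid). Calling methods on ax instead of plt gives you more control over your visualization. The tight_layout() function adjusts spacing to prevent labels from being cut off.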
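As a rough comparison, the sketch below draws the same data twice: once through pyplot's implicit "current figure" state, as in the first examples, and once through explicit objects returned by subplots(). The name implicit_ax is illustrative, and plt.show() is left out so the snippet only builds the figures.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Implicit style: each plt.* call acts on the "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title('Implicit interface')
implicit_ax = plt.gca()  # the axes pyplot has been drawing on

# Explicit style: you hold the Figure and Axes objects yourself
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Explicit interface')

# Each axes now contains exactly one line
print(len(implicit_ax.lines), len(ax.lines))
```

The explicit style tends to pay off once a script manages more than one figure or axes, because every call states which object it modifies.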
Different Plot Types
Matplotlib supports numerous plot types beyond simple line plots.
Scatter Plots
Scatter plots are ideal for showing the relationship between two variables:
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
np.random.seed(42)
x = np.random.randn(100)
y = 2 * x + np.random.randn(100) * 0.5
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, edgecolors='w', linewidth=0.5)
ax.set_title('Scatter Plot Example')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
plt.show()
The alpha parameter controls transparency, which is useful when plotting many points.
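One way to see the effect is to plot the same dense point cloud with and without transparency. This is a minimal sketch: the sample size of 5000, the marker size, and the alpha value of 0.1 are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# A larger sample than above, so that points overlap heavily
rng = np.random.default_rng(42)
x = rng.standard_normal(5000)
y = 2 * x + rng.standard_normal(5000) * 0.5

fig, (ax_opaque, ax_faded) = plt.subplots(
    1, 2, figsize=(10, 4), sharex=True, sharey=True
)

# Fully opaque markers merge into a solid blob in dense regions
ax_opaque.scatter(x, y, s=10)
ax_opaque.set_title('alpha=1 (default)')

# Semi-transparent markers darken where points pile up
faded = ax_faded.scatter(x, y, s=10, alpha=0.1)
ax_faded.set_title('alpha=0.1')

fig.tight_layout()
```

With a low alpha, regions where many points overlap appear darker, so the density of the data becomes visible.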
Bar Charts
Bar charts work well for comparing categorical data:
import matplotlib.pyplot as plt
categories = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
popularity = [45, 38, 30, 25, 20]
fig, ax = plt.subplots()
bars = ax.bar(categories, popularity, color=['#3776ab', '#f7df1e', '#007396', '#00599c', '#00add8'])
ax.set_title('Programming Language Popularity')
ax.set_xlabel('Language')
ax.set_ylabel('Popularity Score')
plt.show()
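The example above stores the return value of ax.bar() in bars without using it. One possible use, sketched here, is to print each value above its bar with ax.bar_label(), which is available in Matplotlib 3.4 and later. The padding of 3 points is an arbitrary choice, and the scores are the same illustrative numbers as above.

```python
import matplotlib.pyplot as plt

categories = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
popularity = [45, 38, 30, 25, 20]  # illustrative scores from the example above

fig, ax = plt.subplots()
bars = ax.bar(categories, popularity)

# bar_label() requires Matplotlib 3.4+; it returns one text object per bar
labels = ax.bar_label(bars, padding=3)

ax.set_ylabel('Popularity Score')
print([label.get_text() for label in labels])
```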
Histograms
Histograms display the distribution of numerical data:
import matplotlib.pyplot as plt
import numpy as np
# Seed the generator so the histogram is reproducible
np.random.seed(42)
data = np.random.randn(1000)
fig, ax = plt.subplots()
ax.hist(data, bins=30, edgecolor='black', alpha=0.7)
ax.set_title('Normal Distribution')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
plt.show()
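If you need the numbers behind the bars, ax.hist() also returns them. The sketch below captures the per-bin counts and the bin edges; the seed value is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

fig, ax = plt.subplots()
# hist() returns the per-bin counts, the bin edges, and the drawn bars
counts, bin_edges, patches = ax.hist(data, bins=30, edgecolor='black', alpha=0.7)

# Every sample lands in exactly one bin, and 30 bins have 31 edges
print(counts.sum(), len(bin_edges))
```

The bins parameter trades detail for noise: more bins show finer structure, while fewer bins give a smoother outline.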
Working with Multiple Plots
You can create subplots to display multiple visualizations in one figure:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# Top left: sine
axes[0, 0].plot(x, np.sin(x), color='blue')
axes[0, 0].set_title('Sine')
# Top right: cosine
axes[0, 1].plot(x, np.cos(x), color='red')
axes[0, 1].set_title('Cosine')
# Bottom left: tangent
axes[1, 0].plot(x, np.tan(x), color='green')
axes[1, 0].set_title('Tangent')
# Limit the y-axis: tan(x) grows without bound near odd multiples of pi/2
axes[1, 0].set_ylim(-5, 5)
# Bottom right: exponential
axes[1, 1].plot(x, np.exp(x * 0.1), color='purple')
axes[1, 1].set_title('Exponential')
plt.tight_layout()
plt.show()
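The first two arguments to subplots() are the number of rows and columns. With a 2×2 grid, it returns a two-dimensional array of Axes that you index as axes[row, column]. The figsize parameter sets the width and height of the figure in inches.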
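When the panels follow the same pattern, a loop over the grid can replace the repeated indexing. This is a sketch under a few assumptions: the curves dictionary is a hypothetical helper, the set of functions differs slightly from the example above, and sharex=True is added so that all panels use the same x-axis range.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Hypothetical mapping from panel title to y-values
curves = {
    'Sine': np.sin(x),
    'Cosine': np.cos(x),
    'Square root': np.sqrt(x),
    'Exponential': np.exp(x * 0.1),
}

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)

# axes.flat walks the 2x2 grid row by row: top left, top right, ...
for ax, (title, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(title)

fig.tight_layout()
```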
Saving Figures
Once you’ve created a plot, you can save it in various formats:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.savefig('sine_wave.png', dpi=300, bbox_inches='tight')
plt.savefig('sine_wave.pdf', bbox_inches='tight')
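Common formats include PNG (raster), PDF (vector), and SVG (vector); Matplotlib picks the format from the file extension. Use a higher DPI (300 is a common choice) for raster images intended for print, and a vector format when the figure must scale without losing quality. If you also display the plot, call savefig() before show(), because the figure may be empty once its window has been closed.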
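The same figure can be written to several formats in a loop. In this sketch the output directory is a temporary folder created only for illustration; in practice you would pick your own location.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Hypothetical output location: a fresh temporary directory
out_dir = Path(tempfile.mkdtemp())

saved = []
for ext in ('png', 'pdf', 'svg'):
    path = out_dir / f'sine_wave.{ext}'
    # The file extension selects the output format
    fig.savefig(path, dpi=300, bbox_inches='tight')
    saved.append(path)

plt.close(fig)  # release the figure once it has been written
print([p.name for p in saved])
```

Calling savefig() on the Figure object, rather than through plt, makes it explicit which figure is written when several are open.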
Styling with Seaborn
While Matplotlib gives you complete control, the Seaborn library provides a higher-level interface with attractive default styles:
pip install seaborn
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Set Seaborn style
sns.set_style("whitegrid")
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
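Note that set_style("whitegrid") only changes the axes style: a white background with a light grid. To also apply Seaborn’s default color palette and font scaling, call sns.set_theme(style="whitegrid") instead. Either way, a single call restyles every subsequent Matplotlib plot with minimal effort.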
Conclusion
Matplotlib is an essential skill for any Python developer working with data. Start with simple line plots to understand the basics, then gradually explore scatter plots, bar charts, and subplots. The library’s extensive customization options mean you can create exactly the visualization you need, whether it’s a quick analysis chart or a publication-ready figure.
As you continue your journey, explore Matplotlib’s advanced features like animations, 3D plotting, and custom themes. Combined with pandas for data manipulation, you have a powerful toolkit for data visualization in Python.
See Also
- NumPy Array Operations — Learn NumPy fundamentals for data manipulation
- pandas DataFrames Explained — Master data analysis with pandas
- Data Cleaning with pandas — Prepare your data for visualization