statistics

Updated March 13, 2026 · Modules

The statistics module provides functions for calculating mathematical statistics of numeric data. It covers measures of central location (mean, median, mode) and measures of spread (variance, standard deviation). The module works with int, float, Decimal, and Fraction types.

This module is not a competitor to NumPy or SciPy. It is aimed at the level of graphing and scientific calculators, which makes it useful for everyday statistical calculations without adding heavy dependencies.

Syntax

import statistics

# Basic usage
statistics.mean(data)
statistics.median(data)
statistics.mode(data)

Averages and Central Location

mean()

The arithmetic mean (commonly called “average”) is the sum of data points divided by the count. It’s sensitive to outliers.

statistics.mean(data)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Numeric data to calculate mean from |

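Returns: The arithmetic mean. The result type follows the input type (float, Decimal, or Fraction). For int data, the result is an int only when the mean is a whole number; otherwise it is a float.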

Raises: StatisticsError if data is empty

import statistics

data = [1, 2, 3, 4, 4]
result = statistics.mean(data)
print(result)  # 2.8

# Works with Decimals
from decimal import Decimal
data = [Decimal("0.5"), Decimal("0.75"), Decimal("0.625")]
print(statistics.mean(data))  # 0.625

The mean gives an unbiased estimate of the population mean when working with samples. However, it’s strongly affected by outliers. For a more robust measure, consider median().
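To make the outlier sensitivity concrete, here is a minimal sketch comparing mean() and median() on the same data. The salary figures are invented for illustration only.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [42, 45, 47, 50, 52, 400]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # stays near the bulk of the data

print(mean_salary)    # 106
print(median_salary)  # 48.5
```

A single extreme value moves the mean far above every other data point, while the median barely changes.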

median()

The median is the middle value when data is sorted. It’s robust against outliers and gives a better “typical” value when data contains extreme values.

statistics.median(data)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Numeric data to find median from |

Returns: The median value. If the number of data points is even, it returns the mean of the two middle values.

Raises: StatisticsError if data is empty

import statistics

# Odd number of points - returns middle value
print(statistics.median([1, 3, 5]))  # 3

# Even number of points - returns average of middle two
print(statistics.median([1, 3, 5, 7]))  # 4.0

When you have discrete data and want the median to be an actual data point rather than interpolated, use median_low() or median_high().
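As a short illustration of the difference, the sketch below runs all three functions on the same even-length dataset.

```python
import statistics

data = [1, 3, 5, 7]

interpolated = statistics.median(data)  # mean of the two middle values
low = statistics.median_low(data)       # smaller of the two middle values
high = statistics.median_high(data)     # larger of the two middle values

print(interpolated)  # 4.0
print(low)           # 3
print(high)          # 5
```

Only median_low() and median_high() are guaranteed to return a value that actually appears in the data.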

mode()

The mode is the most frequently occurring value. It’s the only statistic in this module that also applies to nominal (non-numeric) data.

statistics.mode(data)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Discrete or nominal data |

Returns: The most common value

Raises: StatisticsError if data is empty

import statistics

# Most common number
print(statistics.mode([1, 1, 2, 3, 3, 3, 3, 4]))  # 3

# Works with strings (nominal data)
print(statistics.mode(["red", "blue", "blue", "red", "green", "red"]))  # red

If there are multiple modes with the same frequency, mode() returns the first one encountered. Use multimode() to get all modes.
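The tie-breaking behavior can be sketched as follows, assuming Python 3.8 or later (where mode() returns the first mode encountered and multimode() is available). The vote data is made up for the example.

```python
import statistics

# "red" and "blue" are tied with two occurrences each.
votes = ["red", "blue", "blue", "red", "green"]

first_mode = statistics.mode(votes)      # first of the tied values encountered
all_modes = statistics.multimode(votes)  # every tied value, in order of first appearance
no_modes = statistics.multimode([])      # empty input gives an empty list, not an error

print(first_mode)  # red
print(all_modes)   # ['red', 'blue']
print(no_modes)    # []
```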

geometric_mean()

The geometric mean uses the product of values rather than their sum. It’s appropriate for data that represents rates or ratios.

statistics.geometric_mean(data)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Numeric data (must be positive) |

Returns: float - the geometric mean

Raises: StatisticsError if data is empty, contains zero, or negative values

import statistics

# Growth rates example
rates = [1.05, 1.10, 1.08]  # 5%, 10%, 8% growth
print(statistics.geometric_mean(rates))  # 1.0764... (approximately 7.65% average growth)
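To connect the function to its definition, the sketch below computes the n-th root of the product by hand and compares it with geometric_mean(). It assumes Python 3.8 or later for math.prod().

```python
import math
import statistics

rates = [1.05, 1.10, 1.08]

# Definition: the n-th root of the product of n values.
by_hand = math.prod(rates) ** (1 / len(rates))
from_module = statistics.geometric_mean(rates)

# The two agree up to floating-point rounding.
print(math.isclose(by_hand, from_module))  # True
```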

harmonic_mean()

The harmonic mean is the reciprocal of the arithmetic mean of reciprocals. It’s appropriate for averaging rates or speeds.

statistics.harmonic_mean(data)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Real-valued numeric data |
| weights | sequence or None | None | Optional weights for each value |

Returns: float - the harmonic mean

Raises: StatisticsError if data is empty or contains negative values

import statistics

# Average speed example - car travels 10 km at 40 km/h, then 10 km at 60 km/h
speeds = [40, 60]
print(statistics.harmonic_mean(speeds))  # 48.0

# With weights - car travels 5 km at 40 km/h, then 30 km at 60 km/h
print(statistics.harmonic_mean([40, 60], weights=[5, 30]))  # 56.0

Measures of Spread

variance() and stdev()

Variance measures how far data points spread from the mean. Standard deviation is the square root of the variance, which expresses the spread in the same units as the data.

statistics.variance(data, xbar=None)
statistics.stdev(data, xbar=None)
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Numeric data |
| xbar | float or None | None | Known mean (optional, avoids recalculation) |

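Returns: The sample variance (variance()) or sample standard deviation (stdev()). The result is usually a float; Decimal and Fraction data are also supported.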

Raises: StatisticsError if data has fewer than 2 values

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance and standard deviation
print(statistics.variance(data))  # 4.571428571428571
print(statistics.stdev(data))      # 2.138...

# If you already know the mean, pass it to avoid recalculation
mean = statistics.mean(data)
print(statistics.variance(data, mean))  # Same result; skips recomputing the mean

Use variance() and stdev() when working with a sample from a larger population. Use pvariance() and pstdev() when you have the entire population.
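The difference between the two families is the divisor: the sample versions divide the sum of squared deviations by n - 1, the population versions by n. A minimal sketch on the dataset used above:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

sample_var = statistics.variance(data)       # 32 / (n - 1) = 32 / 7
population_var = statistics.pvariance(data)  # 32 / n = 32 / 8
population_sd = statistics.pstdev(data)      # square root of the population variance

print(math.isclose(sample_var, 32 / 7))  # True
print(population_var == 4)               # True
print(population_sd == 2.0)              # True
```

The sample variance is always the larger of the two, and the gap shrinks as the dataset grows.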

quantiles()

Divides data into n continuous intervals with equal probability and returns the n - 1 cut points that separate them. Useful for quartile, decile, and percentile calculations.

statistics.quantiles(data, n=4, method='exclusive')
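| Parameter | Type | Default | Description |
|---|---|---|---|
| data | sequence or iterable | required | Numeric data |
| n | int | 4 | Number of equal-probability intervals; the result has n - 1 cut points |
| method | str | 'exclusive' | 'exclusive' or 'inclusive' |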

Returns: A list of n - 1 cut points separating the intervals

import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Quartiles (n=4 gives 3 cut points separating 4 groups)
print(statistics.quantiles(data, n=4))
# [2.75, 5.5, 8.25]

# Deciles (n=10 gives 9 cut points)
print(statistics.quantiles(data, n=10))
# [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
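The method parameter changes where the cut points fall. As a rough sketch: 'exclusive' (the default) assumes the sample may not contain the most extreme population values, while 'inclusive' treats the minimum and maximum of the data as the ends of the range. The example also shows that n=100 yields percentile cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

exclusive = statistics.quantiles(data, n=4)                      # default method
inclusive = statistics.quantiles(data, n=4, method="inclusive")  # min and max bound the range

print(exclusive)  # [2.75, 5.5, 8.25]
print(inclusive)  # [3.25, 5.5, 7.75]

# n=100 gives 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100)
print(len(percentiles))  # 99
```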

Common Patterns

Handling missing data (NaN values)

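NaN values have unusual comparison semantics, so functions that sort data or count occurrences (such as median(), mode(), and quantiles()) can give surprising results when the data contains them. Strip NaN values before processing: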

import statistics
from math import isnan
from itertools import filterfalse

data = [20.7, float('nan'), 19.2, 18.3, float('nan'), 14.4]

# Clean the data
clean_data = list(filterfalse(isnan, data))
print(statistics.median(clean_data))  # 18.75

Using with different numeric types

The module preserves numeric types for most functions:

from decimal import Decimal
from fractions import Fraction
import statistics

# Decimals
data = [Decimal("1.1"), Decimal("2.2"), Decimal("3.3")]
print(statistics.mean(data))  # 2.2

# Fractions
data = [Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)]
print(statistics.mean(data))  # 3/2
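A small sketch of how the result type follows the input type. The specific values are chosen only for illustration.

```python
from decimal import Decimal
from fractions import Fraction
import statistics

int_mean = statistics.mean([1, 2, 3])                # whole-number result stays an int
float_mean = statistics.mean([1, 2, 3, 4, 4])        # non-integer result from ints becomes a float
frac_mean = statistics.mean([Fraction(1, 2), 1])     # ints mix with Fractions and give a Fraction
dec_median = statistics.median([Decimal("1.5"), Decimal("2.5")])

print(type(int_mean).__name__)    # int
print(type(float_mean).__name__)  # float
print(type(frac_mean).__name__)   # Fraction
print(type(dec_median).__name__)  # Decimal
```

Checking the result type this way is a quick sanity test when exact Decimal or Fraction arithmetic matters downstream.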

See Also