Property-Based Testing with Hypothesis
Example-based tests are everywhere. You pick a few inputs, check the outputs, call it done. But example-based testing only ever checks the cases you thought to try. Property-based testing flips this: you specify properties that should hold true for any valid input, and a library generates hundreds of examples to find the ones that break your code.
Hypothesis is the standard property-based testing library for Python. It integrates with pytest, runs your test function many times with generated inputs (100 examples per test by default, configurable with max_examples), and gives you a minimised counterexample when it finds a failure.
Install Hypothesis
pip install hypothesis pytest
Or with uv:
uv add hypothesis pytest
Your First Property Test
The classic example: testing that reversing a list twice gives you back the original list. That’s true for any list — not just the one example you might pick.
from hypothesis import given, settings
import hypothesis.strategies as st

@given(st.lists(st.integers()))
@settings(max_examples=500)
def test_reverse_twice_gives_original(input_list):
    reversed_once = list(reversed(input_list))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == input_list
Run it:
$ pytest test_example.py -v
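1 passed in 0.42s
pytest reports a single passing test and does not list the individual examples (add --hypothesis-show-statistics if you want to see how many ran). Behind that one result, Hypothesis generated 500 different lists, of varying lengths and with positive and negative integers, and verified that the property holds for all of them. If it finds a failure, Hypothesis automatically shrinks the failing example to the smallest input that still triggers the bug.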
How Hypothesis Works
When you decorate a test with @given(...), Hypothesis:
- Generates random values matching the strategy you specified
- Runs your test function with those values
- If the test fails, runs the failing example through a shrinking phase to find the smallest counterexample
- Saves the failing example in a local database so it is replayed first on future runs
The shrinking phase is what makes property-based testing practical. When your test fails with a 500-element list, Hypothesis doesn’t just hand you that massive input — it shrinks it systematically to show you something like [0, -1] that still triggers the bug.
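To make the generate-then-shrink loop concrete, here is a toy version in plain Python. It only illustrates the idea: Hypothesis's real shrinker is considerably more sophisticated, and the helper names (property_holds, find_failure, shrink) and the deliberately false property are made up for this sketch.

```python
import random


def property_holds(xs):
    # Deliberately false property: "no list contains a negative number".
    return all(x >= 0 for x in xs)


def find_failure(rng, attempts=1000):
    """Generate random lists until one breaks the property."""
    for _ in range(attempts):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
        if not property_holds(xs):
            return xs
    return None


def shrink(xs):
    """Greedily simplify a failing list for as long as it keeps failing."""
    improved = True
    while improved:
        improved = False
        # First try dropping each element in turn.
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if not property_holds(candidate):
                xs, improved = candidate, True
                break
        if improved:
            continue
        # Then try moving each element one step closer to zero.
        for i, x in enumerate(xs):
            if x != 0:
                candidate = xs[:i] + [x - 1 if x > 0 else x + 1] + xs[i + 1:]
                if not property_holds(candidate):
                    xs, improved = candidate, True
                    break
    return xs


failing = find_failure(random.Random(0))
minimal = shrink(failing)
print(len(failing), "elements shrunk to", minimal)
```

Whatever list the generator happens to find first, the greedy loop ends at [-1]: one element, as close to zero as it can get while still breaking the property. That is the kind of report you want to read when a test fails.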
Writing Good Property Tests
A property is something that should be true of your function for all inputs. Some patterns:
1. Round-trip Operations
import zlib

from hypothesis import given, settings
import hypothesis.strategies as st

@given(st.binary())
@settings(max_examples=500)
def test_compress_decompress_round_trip(data):
    compressed = zlib.compress(data)
    decompressed = zlib.decompress(compressed)
    assert decompressed == data
2. Idempotent Operations
@given(st.lists(st.integers(min_value=1, max_value=1000)))
@settings(max_examples=500)
def test_sort_idempotent(input_list):
    sorted_list = sorted(input_list)
    assert sorted_list == sorted(sorted_list)  # idempotent: sorting again changes nothing
    assert all(a <= b for a, b in zip(sorted_list, sorted_list[1:]))  # result is in order
3. Structural Properties
import string

# Lowercase letters plus whitespace, so strip() has something to remove
@given(st.text(alphabet=string.ascii_lowercase + string.whitespace))
@settings(max_examples=300)
def test_lowercase_strip(input_str):
    stripped = input_str.strip()
    # stripping twice is the same as stripping once
    assert stripped == stripped.strip()
    # stripping only removes characters from the ends
    assert stripped in input_str
    assert len(stripped) <= len(input_str)
4. Symmetry
@given(st.floats(allow_nan=False, allow_infinity=False))
@settings(max_examples=500)
def test_abs_symmetry(x):
    assert abs(x) == abs(-x)

@given(st.text(), st.text())
@settings(max_examples=300)
def test_string_concat_symmetry(a, b):
    # a + b and b + a usually differ, but always have the same length and characters
    assert len(a + b) == len(b + a)
    assert sorted(a + b) == sorted(b + a)
Strategy Composition
Strategies compose to build complex generators:
from hypothesis import given, settings
import hypothesis.strategies as st

# st.text() takes an alphabet; st.characters() restricts it to lowercase a-z
lowercase = st.characters(min_codepoint=97, max_codepoint=122)

# Emails: local part + @ + domain
@given(
    local=st.text(alphabet=lowercase, min_size=1, max_size=20),
    domain=st.text(alphabet=lowercase, min_size=3, max_size=30),
)
@settings(max_examples=500)
def test_email_format(local, domain):
    email = f"{local}@{domain}.com"
    assert "@" in email
    assert email.count("@") == 1
    assert not email.startswith("@")
Built-in strategies:
| Strategy | Generates |
|---|---|
| st.integers() | Any integer |
| st.integers(min_value=x, max_value=y) | Integers in the range x to y, inclusive |
| st.floats() | Any float, including NaN and infinities unless disabled |
| st.text() | Unicode strings |
| st.binary() | Byte strings |
| st.lists(element) | Lists of element |
| st.dictionaries(key, value) | Dicts |
| st.one_of(a, b) | Values from either strategy |
| st.fractions() | Fraction values (optionally bounded with min_value and max_value) |
| st.complex_numbers() | Complex numbers |
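These building blocks combine through helpers such as st.builds(), .map() and st.one_of(). A small sketch of how that can look; the Point class, the strategy names and the test names are invented for illustration:

```python
from dataclasses import dataclass

from hypothesis import given
import hypothesis.strategies as st


@dataclass(frozen=True)
class Point:  # hypothetical example type
    x: int
    y: int


# st.builds calls Point(x=..., y=...) with a value drawn from each strategy.
points = st.builds(
    Point,
    x=st.integers(min_value=-1000, max_value=1000),
    y=st.integers(min_value=-1000, max_value=1000),
)

# .map transforms generated values: every draw here is an even integer.
even_integers = st.integers().map(lambda n: n * 2)

# st.one_of chooses between strategies; st.dictionaries nests them.
scalars = st.one_of(st.integers(), st.text(), st.booleans())
records = st.dictionaries(keys=st.text(min_size=1), values=scalars, max_size=5)


@given(points, points)
def test_manhattan_distance_is_symmetric(p, q):
    def dist(a, b):
        return abs(a.x - b.x) + abs(a.y - b.y)

    assert dist(p, q) == dist(q, p)


@given(even_integers)
def test_mapped_values_are_even(n):
    assert n % 2 == 0


@given(records)
def test_records_respect_size_bound(record):
    assert len(record) <= 5
    assert all(key for key in record)  # keys are non-empty strings
```

Building the constraint into the strategy (an even number is "any integer times two") is usually cheaper than generating arbitrary values and rejecting the ones you don't want.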
Testing a Sorting Function
Here’s a more complete example — testing a custom sort implementation:
from hypothesis import given, settings
import hypothesis.strategies as st

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = [x for x in arr[1:] if x < pivot]
    right = [x for x in arr[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

@given(st.lists(st.integers()))
@settings(max_examples=1000)
def test_quicksort_properties(input_list):
    result = quicksort(input_list)
    # Property 1: output is sorted
    assert all(result[i] <= result[i+1] for i in range(len(result)-1)), "not sorted"
    # Property 2: same length as input
    assert len(result) == len(input_list)
    # Property 3: same elements as input (multiset equality)
    assert sorted(result) == sorted(input_list)

@given(st.lists(st.integers(), min_size=1))
@settings(max_examples=500)
def test_quicksort_min_is_first(input_list):
    sorted_list = quicksort(input_list)
    assert sorted_list[0] == min(input_list)

@given(st.lists(st.integers(), min_size=1))
@settings(max_examples=500)
def test_quicksort_max_is_last(input_list):
    sorted_list = quicksort(input_list)
    assert sorted_list[-1] == max(input_list)
Combining with Example-Based Tests
Property-based tests are complementary to example-based tests. Use example-based tests to document expected behaviour for specific cases, and property-based tests to catch edge cases you didn’t think of:
from hypothesis import given, settings
import hypothesis.strategies as st

# `sort` is the function under test; import it from your own module.

# Example-based: document known cases
class TestSort:
    def test_empty_list(self):
        assert sort([]) == []

    def test_single_element(self):
        assert sort([1]) == [1]

    def test_already_sorted(self):
        assert sort([1, 2, 3]) == [1, 2, 3]

    def test_reverse_sorted(self):
        assert sort([3, 2, 1]) == [1, 2, 3]

# Property-based: catch everything else
@given(st.lists(st.integers()))
@settings(max_examples=1000)
def test_sort_properties(input_list):
    result = sort(input_list)
    assert len(result) == len(input_list)
    assert result == sorted(result)
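The two styles can also live in a single test. Hypothesis provides an @example decorator that always runs the inputs you list in addition to the generated ones, which is a handy way to pin a known edge case or a past regression. In this sketch, sort is a stand-in built on sorted() so the snippet runs on its own; substitute the function you are actually testing.

```python
from hypothesis import example, given
import hypothesis.strategies as st


def sort(items):
    # Stand-in for the function under test (an assumption for this sketch).
    return sorted(items)


@given(st.lists(st.integers()))
@example([])         # always check the empty list
@example([3, 2, 1])  # always check a reverse-sorted list
@example([1, 1, 1])  # always check duplicates
def test_sort_properties_with_pinned_examples(input_list):
    result = sort(input_list)
    assert len(result) == len(input_list)
    assert all(a <= b for a, b in zip(result, result[1:]))
```

The pinned inputs run on every invocation regardless of what the random generation produces, so they document the cases you care about while the generated inputs cover the rest.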
Debugging Failing Property Tests
When a property test fails, Hypothesis saves the failing example to a database. The next time you run the test, it starts with that example first — useful for reproducing the bug.
Here is a test with a deliberate bug. Setting print_blob=True asks Hypothesis to include a @reproduce_failure snippet in the failure report:
from hypothesis import given, settings
import hypothesis.strategies as st

# Small non-negative integers, so the only failure is the intended one
@given(st.lists(st.integers(min_value=0, max_value=1000), min_size=1))
@settings(max_examples=100, print_blob=True)
def test_divide_by_length(items):
    # Bug: division by zero when the list has exactly one element
    result = sum(items) / (len(items) - 1)
    assert result >= 0
Output (abridged; the exact layout depends on your Hypothesis version):
Falsifying example: test_divide_by_length(
    items=[0],
)
  File "...", line 15, in test_divide_by_length
    result = sum(items) / (len(items) - 1)
ZeroDivisionError: division by zero
Hypothesis shrank the failure down to [0], the minimal failing case, from whatever list it first found. You now have a clear bug: when items has one element, len(items) - 1 == 0.
Conditional Skipping with assume()
Use assume() to filter out inputs that don’t apply to your test:
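from hypothesis import assume, given, settings
import hypothesis.strategies as st

# Bounded positive integers, so the float mean stays exact enough to compare
@given(st.lists(st.integers(min_value=1, max_value=10**6)))
@settings(max_examples=500)
def test_division_by_n(nonzero_list):
    assume(len(nonzero_list) > 0)  # discard empty lists
    result = sum(nonzero_list) / len(nonzero_list)
    assert result >= min(nonzero_list)
    assert result <= max(nonzero_list)
assume() works like a precondition: when the condition is false, Hypothesis abandons that example and generates another, so invalid inputs never show up as test failures. The cost is wasted generation, and if too many examples are rejected Hypothesis fails a health check. For a simple constraint such as a minimum length, expressing it in the strategy (min_size=1) is the better choice.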
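As a rough guide to choosing between the options, here is the same kind of constraint expressed three ways: in the strategy's own parameters, with .filter(), and with assume(). The test names and the bounded strategy are illustrative.

```python
from hypothesis import assume, given
import hypothesis.strategies as st

bounded = st.integers(min_value=-1000, max_value=1000)


# Constraint built into the strategy: no examples are thrown away.
@given(st.lists(bounded, min_size=1))
def test_mean_within_bounds(values):
    mean = sum(values) / len(values)
    assert min(values) <= mean <= max(values)


# .filter() rejects draws at the strategy level; fine for cheap conditions
# that most values satisfy.
@given(bounded.filter(lambda n: n != 0))
def test_reciprocal_sign(n):
    assert (1 / n > 0) == (n > 0)


# assume() suits conditions that relate several arguments to each other.
@given(bounded, bounded)
def test_range_length(lo, hi):
    assume(lo < hi)
    assert len(range(lo, hi)) == hi - lo
```

The first form is the cheapest because nothing is generated and then discarded. The other two are worth reaching for when the condition cannot be stated through the strategy's parameters.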
Statistical Testing
Property tests can also check statistical properties, provided the tolerance is loose enough that random noise does not make the test flaky:
from hypothesis import given, settings
import hypothesis.strategies as st
import random

# n starts at 1000 so that sampling noise stays well inside the tolerance
@given(st.integers(min_value=1000, max_value=10000))
@settings(max_examples=100)
def test_random_choice_distribution(n):
    counts = {}
    for _ in range(n):
        choice = random.choice(["a", "b", "c"])
        counts[choice] = counts.get(choice, 0) + 1
    # Each option should appear roughly 1/3 of the time
    # Allow an absolute tolerance of 0.10 either side for statistical noise
    for option in ["a", "b", "c"]:
        proportion = counts.get(option, 0) / n
        assert 0.23 <= proportion <= 0.43, f"{option}: {proportion}"
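Whether a tolerance like this is safe depends on the sample size. As a back-of-the-envelope check in plain Python (no Hypothesis involved), the standard deviation of an observed proportion over n independent draws with true probability p is sqrt(p(1 - p) / n), so you can express the 0.10 band in standard deviations for a few values of n. The helper name proportion_std is made up for this sketch.

```python
import math


def proportion_std(p, n):
    """Standard deviation of an observed proportion over n independent draws."""
    return math.sqrt(p * (1 - p) / n)


p = 1 / 3
tolerance = 0.10
for n in (100, 1000, 10000):
    sigma = proportion_std(p, n)
    print(f"n={n:>5}  sigma={sigma:.4f}  tolerance={tolerance / sigma:.1f} sigma")
```

At n = 100 the band is only about two standard deviations wide, so occasional spurious failures are to be expected. From n = 1000 upwards it is more than six standard deviations, which makes a false alarm very unlikely.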
Performance Tips
- Use @settings(max_examples=N) to balance coverage vs. speed; 100-500 is typical for CI
- Narrow your strategies with min_value, max_value, min_size, max_size to avoid trivial failures
- Use assume() to filter before the property check, not after; discarded examples don't count towards max_examples
- Use @settings(database=None) to skip the example database on CI if disk I/O matters
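Rather than hard-coding one max_examples value for every environment, you can register settings profiles once (typically in conftest.py) and choose one at run time. A minimal sketch: the profile names and the HYPOTHESIS_PROFILE environment variable are choices made for this example, not something Hypothesis mandates.

```python
import os

from hypothesis import settings

# Quick feedback locally, a more thorough run on CI without the example database.
settings.register_profile("dev", max_examples=50)
settings.register_profile("ci", max_examples=500, database=None)

# Pick the profile from the environment, defaulting to the quick one.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```

A profile only supplies defaults: a test that passes max_examples explicitly to @settings keeps its own value. The pytest plugin also accepts --hypothesis-profile=NAME on the command line.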
See Also
- /guides/pytest-basics/ — pytest fundamentals before moving to property-based testing
- /tutorials/intermediate-python/debugging-with-pdb/ — debugging test failures with the Python debugger
- /guides/mocking-with-pytest/ — structuring tests for maintainability