Property-Based Testing with Hypothesis

April 30, 2026 · 6 min read · Updated May 20, 2026 · intermediate

python · testing · hypothesis · property-based-testing · pytest · debugging · ci

Example-based tests are everywhere. You pick a few inputs, check the outputs, call it done. But example-based testing only ever checks the cases you thought to try. Property-based testing flips this: you specify properties that should hold true for any valid input, and a library generates hundreds of examples to find the ones that break your code.

Hypothesis is the standard property-based testing library for Python. It integrates with pytest, runs your test function against many generated inputs (100 per test by default, configurable with max_examples), and gives you a minimised counterexample when it finds a failure.

Install Hypothesis

pip install hypothesis pytest

Or with uv:

uv add hypothesis pytest

Your First Property Test

The classic example: testing that reversing a list twice gives you back the original list. That’s true for any list — not just the one example you might pick.

from hypothesis import given, settings
import hypothesis.strategies as st

@given(st.lists(st.integers()))
@settings(max_examples=500)
def test_reverse_twice_gives_original(input_list):
    reversed_once = list(reversed(input_list))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == input_list

Run it:

$ pytest test_example.py -v
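test_example.py::test_reverse_twice_gives_original PASSED
1 passed in 0.42s

pytest reports a single passing test. Behind it, Hypothesis ran up to 500 generated lists (of varying lengths, with positive and negative integers) and verified the property for each of them. Add --hypothesis-show-statistics to the pytest command to see how many examples actually ran. If it finds a failure, Hypothesis automatically shrinks the failing example to the smallest input that still triggers the bug.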

How Hypothesis Works

When you decorate a test with @given(...), Hypothesis:

Generates random values matching the strategy you specified
Runs your test function with those values
If the test fails, runs the failing example through a shrinking phase to find the smallest counterexample
Saves the failing example in a local database so it is replayed first on future runs

The shrinking phase is what makes property-based testing practical. When your test fails with a 500-element list, Hypothesis doesn’t just hand you that massive input — it shrinks it systematically to show you something like [0, -1] that still triggers the bug.
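To make the generate, run, shrink cycle above concrete, here is a toy sketch in plain Python. It is not how Hypothesis is implemented (its real shrinker is far more sophisticated); the helper names find_counterexample and shrink, and the deliberately false property, are made up for illustration.

```python
import random

def property_holds(xs):
    # Deliberately false property: "no list contains a negative number"
    return all(x >= 0 for x in xs)

def shrink(xs, holds):
    """Greedily simplify a failing list while it keeps failing."""
    improved = True
    while improved:
        improved = False
        # First try deleting each element in turn
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if not holds(candidate):
                xs, improved = candidate, True
                break
        else:
            # No deletion worked: try moving each element one step closer to zero
            for i, x in enumerate(xs):
                if x == 0:
                    continue
                smaller = x - 1 if x > 0 else x + 1
                candidate = xs[:i] + [smaller] + xs[i + 1:]
                if not holds(candidate):
                    xs, improved = candidate, True
                    break
    return xs

def find_counterexample(holds, attempts=1000, seed=0):
    """Generate random lists until one breaks the property, then shrink it."""
    rng = random.Random(seed)
    for _ in range(attempts):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if not holds(xs):
            return shrink(xs, holds)
    return None

minimal = find_counterexample(property_holds)
print(minimal)  # [-1]
```

Whatever list first breaks the property, the greedy loop deletes elements until one negative number is left and then walks it up to -1, the smallest input that still fails.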

Writing Good Property Tests

A property is something that should be true of your function for all inputs. Some patterns:

1. Round-trip Operations

import zlib

from hypothesis import given, settings
import hypothesis.strategies as st

@given(st.binary())
@settings(max_examples=500)
def test_compress_decompress_round_trip(data):
    compressed = zlib.compress(data)
    decompressed = zlib.decompress(compressed)
    assert decompressed == data

2. Idempotent Operations

@given(st.lists(st.integers(min_value=1, max_value=1000)))
@settings(max_examples=500)
def test_sort_is_idempotent(input_list):
    sorted_list = sorted(input_list)
    # sorting an already sorted list changes nothing
    assert sorted(sorted_list) == sorted_list

3. Structural Properties

@given(st.text())  # arbitrary Unicode text, including whitespace
@settings(max_examples=300)
def test_strip_properties(input_str):
    stripped = input_str.strip()
    # stripping twice is the same as stripping once
    assert stripped == stripped.strip()
    # stripping never adds characters: the result is a substring of the input
    assert stripped in input_str

4. Symmetry

@given(st.floats(allow_nan=False, allow_infinity=False))
@settings(max_examples=500)
def test_abs_symmetry(x):
    assert abs(x) == abs(-x)

@given(st.text(), st.text())
@settings(max_examples=300)
def test_string_concat_length_symmetry(a, b):
    # Concatenation itself is not commutative, but the combined length is
    assert len(a + b) == len(b + a)

Strategy Composition

Strategies compose to build complex generators:

from hypothesis import given, settings
import hypothesis.strategies as st

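# Lowercase ASCII letters a-z; st.text() takes its character set via alphabet=
lowercase = st.characters(min_codepoint=97, max_codepoint=122)

# Emails: local part + @ + domain
@given(
    local=st.text(alphabet=lowercase, min_size=1, max_size=20),
    domain=st.text(alphabet=lowercase, min_size=3, max_size=30),
)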
@settings(max_examples=500)
def test_email_format(local, domain):
    email = f"{local}@{domain}.com"
    assert "@" in email
    assert email.count("@") == 1
    assert not email.startswith("@")

Built-in strategies:

Strategy	Generates
`st.integers()`	Any integer
`st.integers(min_value=x, max_value=y)`	Integers in range
`st.floats()`	Any float, including NaN and infinities unless disabled
`st.text()`	Unicode strings
`st.binary()`	Byte strings
`st.lists(element)`	Lists of element
`st.dictionaries(key, value)`	Dicts
`st.one_of(a, b)`	Choice between strategies
`st.fractions()`	Fraction values (optionally bounded with min_value and max_value)
`st.complex_numbers()`	Complex numbers
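Beyond passing several strategies to @given, strategies can be combined with each other. The sketch below uses .map(), .filter(), and st.builds(); the User dataclass, the example.com domain, and the even-age rule are invented purely for illustration.

```python
from dataclasses import dataclass

from hypothesis import given, settings
import hypothesis.strategies as st

@dataclass
class User:  # hypothetical type, used only for this example
    name: str
    age: int
    email: str

lowercase = st.characters(min_codepoint=97, max_codepoint=122)
names = st.text(alphabet=lowercase, min_size=1, max_size=20)

# .map() transforms each generated value
emails = names.map(lambda local: f"{local}@example.com")

# .filter() discards generated values that fail a predicate
even_ages = st.integers(min_value=0, max_value=120).filter(lambda n: n % 2 == 0)

# st.builds() calls a constructor with arguments drawn from strategies
users = st.builds(User, name=names, age=even_ages, email=emails)

@given(users)
@settings(max_examples=200)
def test_generated_users_are_well_formed(user):
    assert user.name.islower()
    assert user.age % 2 == 0
    assert user.email.endswith("@example.com")
```

Prefer .map() over .filter() when either would work: a mapped strategy never throws examples away, while a filter that rejects most values slows generation down.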

Testing a Sorting Function

Here’s a more complete example — testing a custom sort implementation:

from hypothesis import given, settings, assume
import hypothesis.strategies as st

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = [x for x in arr[1:] if x < pivot]
    right = [x for x in arr[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

@given(st.lists(st.integers()))
@settings(max_examples=1000)
def test_quicksort_properties(input_list):
    result = quicksort(input_list)
    
    # Property 1: output is sorted
    assert all(result[i] <= result[i+1] for i in range(len(result)-1)), "not sorted"
    
    # Property 2: same length as input
    assert len(result) == len(input_list)
    
    # Property 3: same elements as input (multiset equality)
    assert sorted(result) == sorted(input_list)

@given(st.lists(st.integers(), min_size=1))
@settings(max_examples=500)
def test_quicksort_min_is_first(input_list):
    sorted_list = quicksort(input_list)
    assert sorted_list[0] == min(input_list)

@given(st.lists(st.integers(), min_size=1))
@settings(max_examples=500)
def test_quicksort_max_is_last(input_list):
    sorted_list = quicksort(input_list)
    assert sorted_list[-1] == max(input_list)
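Because Python already ships a trusted sort, the properties above can also be collapsed into one comparison against sorted() as a reference implementation (sometimes called a test oracle). A minimal sketch, repeating quicksort so the block runs on its own; the max_size bound and the extra no-mutation check are additions, not part of the tests above.

```python
from hypothesis import given, settings
import hypothesis.strategies as st

def quicksort(arr):
    # Same implementation as above, copied so this block is self-contained
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = [x for x in arr[1:] if x < pivot]
    right = [x for x in arr[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

@given(st.lists(st.integers(), max_size=100))
@settings(max_examples=500)
def test_quicksort_matches_builtin(input_list):
    original = list(input_list)  # copy, to check the input is not mutated
    assert quicksort(input_list) == sorted(input_list)
    assert input_list == original
```

An oracle test is the strongest check when a reference exists, but the individual properties remain useful when it does not, and they tend to give more readable failures.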

Combining with Example-Based Tests

Property-based tests are complementary to example-based tests. Use example-based tests to document expected behaviour for specific cases, and property-based tests to catch edge cases you didn’t think of:

from hypothesis import given, settings
import hypothesis.strategies as st

# Stand-in for the function under test; swap in your own implementation
sort = sorted

# Example-based: document known cases
class TestSort:
    def test_empty_list(self):
        assert sort([]) == []
    
    def test_single_element(self):
        assert sort([1]) == [1]
    
    def test_already_sorted(self):
        assert sort([1, 2, 3]) == [1, 2, 3]
    
    def test_reverse_sorted(self):
        assert sort([3, 2, 1]) == [1, 2, 3]

# Property-based: catch everything else
@given(st.lists(st.integers()))
@settings(max_examples=1000)
def test_sort_properties(input_list):
    result = sort(input_list)
    assert len(result) == len(input_list)
    assert result == sorted(result)

Debugging Failing Property Tests

When a property test fails, Hypothesis saves the failing example to a database. The next time you run the test, it replays that example first, which makes the bug easy to reproduce.

To see the failure report:

from hypothesis import given, settings
import hypothesis.strategies as st

@given(st.lists(st.integers(), min_size=1))
@settings(max_examples=100, print_blob=True)
def test_divide_by_length(items):
    # Bug: division by zero when list has one element
    result = sum(items) / (len(items) - 1)
    assert result >= 0

Output:

  File "...", line 15, in test_divide_by_length
    result = sum(items) / (len(items) - 1)
ZeroDivisionError: division by zero
Falsifying example: test_divide_by_length(
    items=[0],
)

Hypothesis shrank whatever failing list it first found down to [0], the minimal failing case. You now have a clear bug: when items has one element, len(items) - 1 == 0.
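Once the bug is understood, the falsifying input can be pinned as a permanent regression case with the @example decorator, so it runs on every execution regardless of what is in the database. The sketch below assumes the intended fix is to return 0.0 for single-element lists; the helper name average_gap and the non-negative integer range are hypothetical.

```python
from hypothesis import example, given, settings
import hypothesis.strategies as st

def average_gap(items):
    """Hypothetical fixed version: total divided by the number of gaps between items."""
    gaps = len(items) - 1
    if gaps == 0:
        return 0.0  # assumed behaviour for a single-element list
    return sum(items) / gaps

@given(items=st.lists(st.integers(min_value=0, max_value=1000), min_size=1))
@example(items=[0])  # the shrunk counterexample, now checked on every run
@settings(max_examples=100)
def test_average_gap_is_non_negative(items):
    assert average_gap(items) >= 0
```

Explicit examples run before any generated ones, so a regression on the pinned case fails immediately.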

Conditional Skipping with assume()

Use assume() to filter out inputs that don’t apply to your test:

from hypothesis import assume, given, settings
import hypothesis.strategies as st

# Positive integers, bounded so float rounding cannot push the mean outside [min, max]
@given(st.lists(st.integers(min_value=1, max_value=1_000_000)))
@settings(max_examples=500)
def test_mean_is_between_min_and_max(values):
    assume(len(values) > 0)  # discard empty lists, which have no mean

    mean = sum(values) / len(values)
    assert min(values) <= mean <= max(values)

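assume() works like a precondition: when the condition fails, Hypothesis discards that example and generates another, so invalid inputs never show up as test failures. Discarded examples still cost generation time, and if too many are rejected Hypothesis reports a failed health check instead of silently testing very little.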
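Where the precondition can be expressed in the strategy itself, that is usually cheaper than assume(), because no generated examples are thrown away. A sketch of a mean check that uses min_size=1 so empty lists are never produced; the upper bound on the integers is an added assumption that keeps float rounding out of the comparison.

```python
from hypothesis import given, settings
import hypothesis.strategies as st

# min_size=1 means empty lists are never generated, so nothing needs discarding
non_empty_lists = st.lists(
    st.integers(min_value=1, max_value=1_000_000),
    min_size=1,
)

@given(non_empty_lists)
@settings(max_examples=500)
def test_mean_within_bounds(values):
    mean = sum(values) / len(values)
    assert min(values) <= mean <= max(values)
```

Keep assume() for conditions that relate several arguments to each other or are otherwise awkward to encode in a strategy.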

Statistical Testing

Property tests are also useful for checking statistical properties:

from hypothesis import given, settings
import hypothesis.strategies as st
import random

@given(st.integers(min_value=1000, max_value=10000))  # large samples keep sampling noise well inside the tolerance
@settings(max_examples=100)
def test_random_choice_distribution(n):
    counts = {}
    for _ in range(n):
        choice = random.choice(["a", "b", "c"])
        counts[choice] = counts.get(choice, 0) + 1
    
    # Each option should appear roughly 1/3 of the time
    # Allow an absolute tolerance of 0.10 around 1/3 for statistical noise
    for option in ["a", "b", "c"]:
        proportion = counts.get(option, 0) / n
        assert 0.23 <= proportion <= 0.43, f"{option}: {proportion}"
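How safe a fixed tolerance is depends on the sample size. As a rough check, assuming the usual normal approximation for a sample proportion, the snippet below computes how many standard deviations fit inside a tolerance of 0.10 around 1/3; the helper name sigmas_covered is made up for this sketch.

```python
import math

def sigmas_covered(n, p=1/3, tolerance=0.10):
    """How many standard deviations of the sample proportion fit inside the tolerance."""
    std_dev = math.sqrt(p * (1 - p) / n)
    return tolerance / std_dev

for n in (100, 1000, 10000):
    print(n, round(sigmas_covered(n), 1))
# 100 2.1
# 1000 6.7
# 10000 21.2
```

At n = 100 the band is only about two standard deviations wide, so a fair generator would still fall outside it now and then; from n = 1000 upward a violation almost certainly points to a real bias.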

Performance Tips

Use @settings(max_examples=N) to balance coverage vs. speed — 100-500 is typical for CI
Narrow your strategies with min_value, max_value, min_size, max_size so examples are not spent on inputs outside your function's domain
Prefer constraining the strategy over assume(); when you do need assume(), call it before the property check, since discarded examples do not count toward max_examples
Pass @settings(database=None) to disable the example database on CI if disk I/O matters; saved failures are then no longer replayed between runs
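Rather than repeating these settings on every test, they can be grouped into named profiles, typically in conftest.py. A minimal sketch; the profile names "dev" and "ci", their values, and the HYPOTHESIS_PROFILE environment variable are choices made for this example rather than requirements.

```python
import os

from hypothesis import settings

# Fast profile for local runs, thorough profile without a database for CI
settings.register_profile("dev", max_examples=50)
settings.register_profile("ci", max_examples=500, database=None)

# Pick a profile from the environment, defaulting to the fast one
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```

With the pytest plugin, a registered profile can also be selected on the command line with --hypothesis-profile=ci. Tests that set their own @settings values still override the profile for those fields.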