Code Coverage and Mutation Testing
Code coverage tells you which lines of code ran during your tests. High coverage sounds great, but it’s easy to get a false sense of security. Mutation testing goes further — it deliberately breaks your code to check whether your tests catch the changes. This tutorial covers both: measuring coverage with Coverage.py, and using mutation testing to validate that your tests actually verify behaviour.
Install the tools
pip install coverage mutmut pytest
Measuring coverage with coverage.py
Coverage.py tracks which lines execute when you run your test suite. Run your tests through coverage instead of invoking pytest directly:
coverage run -m pytest
coverage report -m
The -m flag shows line numbers that were never reached:
Name            Stmts   Miss  Cover   Missing
---------------------------------------------
my_module.py       40      8    80%   14, 22-27, 38
---------------------------------------------
TOTAL              40      8    80%
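To make "tracks which lines execute" concrete, here is a toy line tracer built on the standard library's sys.settrace. It is only a sketch of the idea, not how Coverage.py is implemented; trace_lines and classify are hypothetical names made up for this illustration.

```python
import sys

def trace_lines(func, *args):
    """Run func(*args) and return the line offsets (relative to the
    `def` line) that executed inside func."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Record only "line" events that belong to the function under test.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return executed

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

body_offsets = {1, 2, 3}  # the three body lines of classify
missed = sorted(body_offsets - trace_lines(classify, 5))
print("missed offsets:", missed)  # prints: missed offsets: [2]
```

Offset 2 is the `return "negative"` line, which a call with a positive number never reaches. That is the same kind of information the Missing column reports.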
Where to direct coverage
By default, Coverage.py measures the code your tests execute outside the standard library, which can include files you don’t care about. Narrow it to your project source with --source:
coverage run --source=my_module -m pytest
Exclude test files from coverage measurement if you only want to track application code:
coverage run --source=my_module --omit='*/tests/*' -m pytest
Reading the report
The coverage percentage is the ratio of executed statements to total statements. It says nothing about branches: an if with no else can show every statement as executed even though the tests only ever take one of its two paths. Add --branch to coverage run to measure branch coverage as well, and look at the Missing column to see exactly which lines were skipped.
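A minimal sketch of how statement coverage can hide an untested branch, using a hypothetical apply_discount function invented for this example. The single test executes every statement, yet the path where is_member is false is never taken.

```python
def apply_discount(price, is_member):
    # An `if` with no `else`: two paths, but only one extra statement.
    if is_member:
        price = price * 0.5
    return price

def test_member_discount():
    # Executes all three statements in apply_discount.
    assert apply_discount(100, True) == 50.0

test_member_discount()
```

Statement coverage for apply_discount would be 100% here. Running with coverage run --branch -m pytest should instead flag the if line as only partially covered, because the non-member path never ran.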
Generating HTML reports
For visual inspection, generate an HTML report:
coverage html
Then open htmlcov/index.html in your browser. You’ll see your source code highlighted by coverage — green for executed lines, red for missed ones. This makes it easy to spot functions that need more test cases.
Mutation testing with mutmut
Mutation testing introduces deliberate bugs (“mutations”) into your source code — changing > to <, replacing + with -, swapping and for or — then runs your test suite. If the tests still pass, you have a problem: your tests don’t catch that kind of change.
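The idea can be sketched in a few lines with the standard library's ast module. This is a toy, not mutmut's implementation: is_adult, FlipGtE, weak_suite and strong_suite are all hypothetical names, and the only mutation applied is turning >= into >.

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""

class FlipGtE(ast.NodeTransformer):
    """Mutation: turn every `>=` comparison into `>`."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node

def load(tree):
    """Compile a module AST and return the is_adult function it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["is_adult"]

def weak_suite(is_adult):
    # Checks values far from the boundary only.
    return is_adult(30) is True and is_adult(5) is False

def strong_suite(is_adult):
    # Adds the boundary value itself.
    return weak_suite(is_adult) and is_adult(18) is True

original = load(ast.parse(SOURCE))
mutated_tree = ast.fix_missing_locations(FlipGtE().visit(ast.parse(SOURCE)))
mutant = load(mutated_tree)

print("weak suite passes on mutant:", weak_suite(mutant))      # True: mutant survives
print("strong suite passes on mutant:", strong_suite(mutant))  # False: mutant killed
```

Both suites pass on the original function. Only the suite that tests the boundary value 18 can tell the original from the mutant.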
Run mutation testing
mutmut run
Then see which mutations survived (were not killed by tests):
mutmut results
Surviving mutations are the ones your tests failed to detect. Each surviving mutation is a gap in your test quality.
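View and apply a mutation
See exactly what mutmut changed by printing the diff for one mutant:
mutmut show 2
To write that change into your source file so you can run or debug it by hand:
mutmut apply 2
The ID comes from the mutmut results listing. Numeric IDs such as 2 are what mutmut 2.x uses; newer releases may identify mutants differently. Remember to revert the file afterwards.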
Generate an HTML report
mutmut html
Open the report to see a breakdown of mutations by file and function, with survival rates per module.
A practical example
Consider a simple validator:
# my_module.py
def validate_age(age):
    """Return True for a plausible age; raise ValueError otherwise."""
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age is unreasonably high")
    return True

# tests/test_my_module.py
from my_module import validate_age

def test_validate_age():
    assert validate_age(25) is True
The single test evaluates both if conditions, so most of the function shows as covered, but neither raise line runs and test_validate_age never checks the boundary values. Run mutmut:
mutmut run
mutmut results
Mutations such as age > 150 changed to age >= 150 will likely survive: the only test input, 25, is nowhere near either boundary, so the mutated code behaves exactly like the original and the test still passes.
Fixing the gap
Add boundary tests:
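import pytest
from my_module import validate_age

def test_validate_age_boundaries():
    # Just outside the valid range: both must raise.
    with pytest.raises(ValueError):
        validate_age(-1)
    with pytest.raises(ValueError):
        validate_age(151)
    # Exactly on the boundaries: both must be accepted.
    assert validate_age(0) is True
    assert validate_age(150) is True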
Now the boundary mutations get killed.
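To see why, here is a sketch that needs neither pytest nor mutmut. validate_age_mutant is a hand-written stand-in for one mutant (> replaced by >=), and passes, happy_path_check and boundary_check are hypothetical helpers invented for this illustration.

```python
def validate_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age is unreasonably high")
    return True

def validate_age_mutant(age):
    # Hand-written stand-in for one mutant: `>` replaced by `>=`.
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age >= 150:
        raise ValueError("Age is unreasonably high")
    return True

def passes(check, func):
    """Return True if check(func) completes without failing."""
    try:
        check(func)
    except (AssertionError, ValueError):
        return False
    return True

def happy_path_check(func):
    assert func(25) is True

def boundary_check(func):
    assert func(0) is True
    assert func(150) is True

print(passes(happy_path_check, validate_age_mutant))  # True: mutant survives
print(passes(boundary_check, validate_age_mutant))    # False: mutant killed
```

Both checks pass against the original validate_age. Only the boundary check fails against the mutant, which is what "killing" a mutation means.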
Configuring mutmut
Mutmut reads its settings from a [mutmut] section in setup.cfg (some versions also read pyproject.toml):
[mutmut]
paths_to_mutate = my_module.py
runner = pytest
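You can also control whether mutmut writes a backup copy of each file before mutating it:
[mutmut]
backup = False
The available settings differ between mutmut versions, so check mutmut run --help for the options your installation supports.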
Coverage vs mutation testing
Coverage tells you what code ran. Mutation testing tells you whether your tests would notice if that code was wrong. Both matter.
High coverage with weak tests gives you a percentage that looks good but hides real gaps. Mutation testing forces your tests to prove they can distinguish correct code from broken code.
The workflow:
- Write tests
- Run coverage and find lines that no test executes
- Improve tests until coverage is high
- Run mutation testing — find mutations your tests don’t catch
- Improve tests to kill surviving mutations
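The tool-running steps of this workflow could be scripted roughly as follows. This is a sketch under assumptions: run_workflow and WORKFLOW are hypothetical names, the commands are the ones used earlier in this tutorial, and the runner parameter exists only so the function can be exercised without the tools installed.

```python
import subprocess

# Commands taken from the earlier sections, in workflow order.
WORKFLOW = [
    ["coverage", "run", "-m", "pytest"],  # run the tests under coverage
    ["coverage", "report", "-m"],         # list lines no test executed
    ["mutmut", "run"],                    # generate and test mutants
    ["mutmut", "results"],                # list surviving mutants
]

def run_workflow(commands=WORKFLOW, runner=subprocess.run):
    """Run each command in order and return (command, exit code) pairs.

    The loop does not stop on a non-zero exit code, because a tool may
    use one to signal findings rather than a crash.
    """
    results = []
    for command in commands:
        completed = runner(command)
        results.append((" ".join(command), completed.returncode))
    return results
```

Calling run_workflow() from the project root would run the four commands in sequence. Improving the tests between runs remains a manual step.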
Common pitfalls
Using coverage as a quality gate
100% coverage doesn’t mean 100% tested. A test can execute every line while asserting little or nothing about the results, and coverage will still count those lines as covered. Mutation testing catches this.
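A small sketch of the problem, using a hypothetical average function invented for this example. The two tests produce identical coverage for average, but only one of them would notice a bug.

```python
def average(values):
    total = sum(values)
    return total / len(values)

def test_average_runs():
    # Executes every line of average() but checks nothing about the result.
    average([2, 4, 6])

def test_average_value():
    # Same coverage, but this fails if the arithmetic is wrong.
    assert average([2, 4, 6]) == 4

test_average_runs()
test_average_value()
```

If a mutation changed / to * inside average, test_average_runs would still pass and test_average_value would fail.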
Skipping mutation types
Mutmut applies many kinds of mutation, and some of them will be noise in your codebase. If you suppress too many, for example with # pragma: no mutate comments, you might hide real gaps. Start with everything enabled and suppress only the mutations that produce false positives.
Not running mutation testing in CI
Mutation testing is slow, but it catches real bugs. Even running it on a subset of mutations in CI is better than not running it at all.
See also
- /tutorials/python-testing/testing-pytest-basics/ — pytest fundamentals
- /tutorials/python-testing/testing-fixtures-parametrize/ — fixtures and parametrised tests
- /tutorials/python-testing/testing-property-based/ — property-based testing with Hypothesis