Code Coverage and Mutation Testing

Code coverage tells you which lines of code ran during your tests. High coverage sounds great, but it’s easy to get a false sense of security. Mutation testing goes further — it deliberately breaks your code to check whether your tests catch the changes. This tutorial covers both: measuring coverage with Coverage.py, and using mutation testing to validate that your tests actually verify behaviour.

Install the tools

pip install coverage mutmut pytest

Measuring coverage with coverage.py

Coverage.py records which lines execute while your test suite runs. Run your tests through coverage instead of invoking pytest directly:

coverage run -m pytest
coverage report -m

The -m flag on coverage report (short for --show-missing) adds a Missing column listing the line numbers that never ran:

Name                Stmts   Miss  Cover   Missing
-----------------------------------------------------
my_module.py           40      8    80%   14, 22-27, 38
-----------------------------------------------------
TOTAL                  40      8    80%
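The Cover column is plain arithmetic on the Stmts and Miss columns. A minimal sketch, assuming simple rounding to the nearest whole percent (Coverage.py's own rounding is a little more careful near 0% and 100%); cover_percent is a hypothetical helper, not part of Coverage.py:

```python
def cover_percent(stmts, miss):
    """Percentage of statements that ran, rounded to a whole number."""
    if stmts == 0:
        return 100  # nothing to execute: treated as fully covered (assumption)
    return round(100 * (stmts - miss) / stmts)


# Figures from the sample report: 40 statements, 8 of them never executed.
print(cover_percent(40, 8))
```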

Where to direct coverage

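Without --source, Coverage.py measures every file it executes apart from the standard library (and, in recent versions, installed third-party packages), so test helpers and other unrelated files can end up in the report. Narrow it to your project source with --source: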

coverage run --source=my_module -m pytest

Exclude test files from coverage measurement if you only want to track application code:

coverage run --source=my_module --omit='*/tests/*' -m pytest

Reading the report

The coverage percentage is the ratio of executed statements to total statements. It says nothing about branches: a function whose tests never take an early return can still show 80% coverage while exercising only half of its branches. Look at the Missing column to see exactly which lines were skipped, and pass --branch to coverage run if you also want branch coverage measured.
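To see why a statement percentage can hide an untested branch, here is a rough sketch of line tracing built on the standard library's sys.settrace. It is only an illustration of the idea, not how Coverage.py is implemented; apply_discount and executed_lines are hypothetical names invented for this example.

```python
import sys


def apply_discount(price, is_member):
    if is_member:
        price = price * 0.9
    return price


def executed_lines(func, *args):
    """Run func(*args) and return the body lines that executed.

    Lines are reported as offsets from the `def` line, so offset 1 is the
    first line of the function body.
    """
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous tracer
    return seen


member_run = executed_lines(apply_discount, 100, True)
guest_run = executed_lines(apply_discount, 100, False)
print("member run:", sorted(member_run))
print("guest run: ", sorted(guest_run))
```

A single test with is_member=True executes every statement in apply_discount, so statement coverage is complete, yet the path where the if is false is never taken. Branch measurement (coverage run --branch) is what reports that missing path.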

Generating HTML reports

For visual inspection, generate an HTML report:

coverage html

Then open htmlcov/index.html in your browser. You’ll see your source code highlighted by coverage — green for executed lines, red for missed ones. This makes it easy to spot functions that need more test cases.

Mutation testing with mutmut

Mutation testing introduces deliberate bugs (“mutations”) into your source code — changing > to <, replacing + with -, swapping and for or — then runs your test suite. If the tests still pass, you have a problem: your tests don’t catch that kind of change.

Run mutation testing

mutmut run

Then see which mutations survived (were not killed by tests):

mutmut results

Surviving mutations are the ones your tests failed to detect. Each surviving mutation is a gap in your test quality.

Apply mutations and view diff

See exactly what mutmut changed:

mutmut apply 2

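This writes mutation #2 into your source file on disk so you can inspect the change manually. Revert the file afterwards (for example with your version control tool) before running your tests or mutmut again.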

Generate an HTML report

mutmut html

Open the report to see a breakdown of mutations by file and function, with survival rates per module.

A practical example

Consider a simple validator:

# my_module.py
def validate_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age is unreasonably high")
    return True

# tests/test_my_module.py
from my_module import validate_age

def test_validate_age():
    assert validate_age(25) is True

This test evaluates both if conditions, so most of the function shows as covered, but it never reaches either raise statement and never checks the boundary conditions. Run mutmut:

mutmut run
mutmut results

Mutations like age > 150 changed to age >= 150 will likely survive, because the only input the test uses, 25, behaves identically under the original and the mutated code.

Fixing the gap

Add boundary tests:

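import pytest
from my_module import validate_age

def test_validate_age_boundaries():
    # Values just outside the valid range must be rejected.
    with pytest.raises(ValueError):
        validate_age(-1)
    with pytest.raises(ValueError):
        validate_age(151)
    # Values exactly on the limits must be accepted.
    assert validate_age(0) is True
    assert validate_age(150) is True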

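Now the boundary mutations get killed: with age >= 150 in place of age > 150, for example, validate_age(150) raises ValueError and the last assertion fails.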
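You can check this reasoning without running mutmut at all. The sketch below uses two hand-written stand-ins for the boundary mutants (mutant_upper and mutant_lower are hypothetical names, and the exact mutants mutmut generates may differ) and plain functions instead of pytest so that it runs on its own:

```python
def validate_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age is unreasonably high")
    return True


def mutant_upper(age):
    """Stand-in mutant: `age > 150` changed to `age >= 150`."""
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age >= 150:
        raise ValueError("Age is unreasonably high")
    return True


def mutant_lower(age):
    """Stand-in mutant: `age < 0` changed to `age <= 0`."""
    if age <= 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age is unreasonably high")
    return True


def rejects(func, age):
    """True if func raises ValueError for this age."""
    try:
        func(age)
    except ValueError:
        return True
    return False


def single_test(func):
    # The original test: one value in the middle of the range.
    return func(25) is True


def boundary_tests(func):
    # The added tests: just outside and exactly on each limit.
    return (
        rejects(func, -1)
        and rejects(func, 151)
        and not rejects(func, 0)
        and not rejects(func, 150)
    )


for impl in (validate_age, mutant_upper, mutant_lower):
    print(
        f"{impl.__name__}: single test passes={single_test(impl)}, "
        f"boundary tests pass={boundary_tests(impl)}"
    )
```

Both mutants pass the single mid-range test and fail the boundary tests, while the original passes both.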

Configuring mutmut

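Mutmut reads its settings from a [mutmut] section in setup.cfg (in pyproject.toml the table is named [tool.mutmut]). The option names below follow mutmut 2.x and may differ in other versions:

[mutmut]
paths_to_mutate = my_module.py
runner = pytest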

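You can also control whether mutmut keeps a backup copy of each file while it mutates it. In mutmut 2.x this is the backup option; check the documentation for your installed version, since option names have changed between releases:

[mutmut]
backup = False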

Coverage vs mutation testing

Coverage tells you what code ran. Mutation testing tells you whether your tests would notice if that code was wrong. Both matter.

High coverage with weak tests gives you a percentage that looks good but hides real gaps. Mutation testing forces your tests to prove they can distinguish correct code from broken code.

The workflow:

  1. Write tests
  2. Run coverage: find lines that no test executes
  3. Improve tests until coverage is high
  4. Run mutation testing: find mutations your tests don’t catch
  5. Improve tests to kill surviving mutations
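The tool-driven steps of this workflow can be scripted. A rough sketch, assuming coverage, mutmut and pytest are installed and on the PATH; build_pipeline and run_pipeline are hypothetical helpers, and the commands are the ones used earlier in this tutorial:

```python
import subprocess


def build_pipeline(source):
    """Return the commands for one coverage pass and one mutation pass."""
    return [
        ["coverage", "run", f"--source={source}", "-m", "pytest"],
        ["coverage", "report", "-m"],
        ["mutmut", "run"],
        ["mutmut", "results"],
    ]


def run_pipeline(source):
    """Run each command in order and collect (command, exit code) pairs.

    Nothing here interprets the exit codes; deciding what counts as a
    failure is left to the caller.
    """
    outcomes = []
    for command in build_pipeline(source):
        completed = subprocess.run(command)
        outcomes.append((command, completed.returncode))
    return outcomes


# Show the commands without running them.
for command in build_pipeline("my_module"):
    print(" ".join(command))
```

Calling run_pipeline("my_module") from the project root would execute the four commands in sequence; writing and improving the tests between runs remains manual work.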

Common pitfalls

Using coverage as a quality gate

100% coverage doesn’t mean 100% tested. A test can execute every line while asserting almost nothing about the results, and the coverage report will look exactly the same. Mutation testing catches this.

Skipping mutation types

Mutmut supports many mutation types. If you disable too many to reduce noise, you might miss real gaps. Start with all mutations enabled and disable only the ones that keep producing mutants with no observable change in behaviour in your codebase.

Not running mutation testing in CI

Mutation testing is slow, but it catches real bugs. Even running it on a subset of mutations in CI is better than not running it at all.
