The Python code below1 has a fatal flaw.
# main.py
def inverse_of(x: int) -> float:
    return 1 / x
But it has 100% test coverage:
# tests.py
import pytest

from main import inverse_of


@pytest.mark.parametrize(
    "value,expected",
    [
        (2, 1 / 2),
        (3, 1 / 3),
    ],
)
def test_inverse_of(value, expected):
    assert inverse_of(value) == expected
And just to show it:
$ pytest --cov=main tests.py -q
..                                                                       [100%]
---------- coverage: platform linux, python 3.10.11-final-0 ----------
Name      Stmts   Miss  Cover
-----------------------------
main.py       2      0   100%
-----------------------------
TOTAL         2      0   100%
Now let’s say that code exists in some larger system, passes code review, and gets into production. Then, weeks later, someone tries 0 as the input, it gets passed all the way down to this function, and it blows up with a ZeroDivisionError.
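To see the failure concretely (re-declaring the function here so the snippet is self-contained):

```python
# Same function as in main.py, re-declared so this snippet runs on its own
def inverse_of(x: int) -> float:
    return 1 / x

try:
    inverse_of(0)
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")  # ZeroDivisionError: division by zero
```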
Since we’ve parametrized this test, it’s not hard to add this new case.
# tests.py
import inspect

import pytest

from main import inverse_of


@pytest.mark.parametrize(
    "value,expected",
    [
        (2, 1 / 2),
        (3, 1 / 3),
        (0, ZeroDivisionError),
    ],
)
def test_inverse_of(value, expected):
    if inspect.isclass(expected) and issubclass(expected, Exception):
        with pytest.raises(expected):
            inverse_of(value)
    else:
        assert inverse_of(value) == expected
So that’s good. But for all our care in writing tests and doing code review, we missed it!
What are our options to have discovered this error before getting it into production?
For certain classes of errors, it may be enough to just add into your PR template another checklist item for code reviewers to look at. However, I prefer to let machines do the boring work like checking for obvious errors.
Compilers?
And when I think of boring work, I think of a compiler. That said, I don’t know of a language that will automatically detect a divide-by-zero error at compile time for free, with no customization. It may just handle it “nicely” instead.
Just to sort of prove this out, let’s use Rust. It’s mainstream2 and is widely touted as being safe for exactly these kinds of things. Here’s the code.3
use rstest::rstest;
use std::env;

fn inverse_of(x: i32) -> f64 {
    1.0 / x as f64
}

#[rstest]
#[case(2, 0.5)]
#[case(3, 0.3333333333333333)]
fn inverse_of_test(#[case] input: i32, #[case] expected: f64) {
    assert_eq!(expected, inverse_of(input));
}
Now let’s run the tests.
$ cargo test -q
running 2 tests
..
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
So no, it didn’t catch anything automatically in the build process or in the tests. Let’s actually use the function and see what happens.
fn main() {
    let args: Vec<String> = env::args().collect();
    if let Some(query) = args.get(1) {
        match query.parse::<i32>() {
            Ok(value) => println!("The inverse of {} is {}", query, inverse_of(value)),
            Err(_) => println!("Invalid input. Please enter a number."),
        }
    } else {
        println!("Please enter a number as an argument.");
    }
}
And let’s run it.
$ cargo run -q 0
The inverse of 0 is inf
So it’s not a panic, which depending on your point of view is better or worse than the Python version.
It’s better in that the application doesn’t crash completely. But it’s worse in that infinity as a value would likely cause problems in other parts of a larger system.
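To illustrate why infinity is risky downstream, here is a quick Python sketch of how an IEEE-754 infinity behaves once it enters ordinary arithmetic:

```python
import math

# float("inf") is the value the Rust version hands back for input 0
inf = float("inf")

print(inf + 1.0)    # inf  -- infinity propagates through ordinary arithmetic
print(inf - inf)    # nan  -- and can silently degrade into NaN
print(inf > 1e308)  # True -- it also wins every comparison, skewing min/max logic
```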
We probably want some error message returned instead, and we’d rather discover this error now instead of out in production. There must be a better way!
Property-Based Testing FTW!
Let’s return to Python and consider our options.
Since this is a unit test, what we need is some way to explore the space of values to automatically discover this divide by 0 issue for us. Fortunately, we don’t have to invent anything new. This kind of testing has a name: property-based testing.
In Python there is a library to do this called Hypothesis.4 So let’s go back to the prior example and make a new file for it and try it out:
# tests_hypothesis.py
from hypothesis import given
import hypothesis.strategies as st

from main import inverse_of


@given(st.integers())
def test_inverse_of_hypothesis(value):
    inverse_of(value)
What this code does is try a wide range of integers and pass them into the function during tests. Basically, it’s doing exactly what we needed: exploring the space of values.
If we run tests now, we’ll get a VERY loud error that there’s a failing example. And that’s great to know! We can fix this now so that we hand back something useful to a user.
Also, we can keep this test permanently, and it’ll function as a regression test.
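For reference, here is one possible fix, as a sketch; raising a ValueError with this particular message is my own choice, not something prescribed above:

```python
# main.py -- one possible fix (the ValueError and its message are illustrative)
def inverse_of(x: int) -> float:
    if x == 0:
        raise ValueError("cannot take the inverse of 0")
    return 1 / x
```

Note that with a fix like this, the tests need updating too: the parametrized case would expect ValueError rather than ZeroDivisionError, and the Hypothesis test would need to handle the zero case explicitly.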
Panacea for Testing?
When introduced to something as useful as this, people try to use it everywhere.
So will this fix everything for you? No. This is a specific tool for testing a wide range of values; it’s sort of the unit-test variant of fuzzing. You can’t easily use it on a large system from the outside, since it tries a very large number of values, and for the same reason you shouldn’t just use it everywhere.
As with most things in software engineering, there is no perfect solution. No magic bullet. It’s all tradeoffs and options.
Property-based testing was originally popularized by Haskell’s QuickCheck, and many other languages have implementations of it, so this isn’t a Python-only thing.
Hi!
The problem with your Hypothesis version is that you no longer assert that the result is valid.
@given(st.integers())
def test_inverse_of_hypothesis(value):
    inverse_of(value)
This only triggers the error; it no longer checks the result.
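One way to keep an assertion while still exploring values might look like this (a sketch: bounding the integers to avoid float overflow/underflow and skipping zero with `assume` are my choices, and `inverse_of` is re-declared here so the snippet is self-contained):

```python
# A sketch asserting the defining property: x * inverse_of(x) == 1
import pytest
from hypothesis import assume, given
import hypothesis.strategies as st


def inverse_of(x: int) -> float:  # in the article this lives in main.py
    return 1 / x


@given(st.integers(min_value=-10**6, max_value=10**6))
def test_inverse_of_property(value):
    assume(value != 0)  # the zero case deserves its own dedicated test
    assert value * inverse_of(value) == pytest.approx(1.0)
```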
Thanks for your article.