Advanced Unit Tests: 5 Pitfalls and How To Avoid Them

Not all unit tests are equal; some might be harmful

Cartoon adapted from geek-and-poke.com, Licence CC-BY-3.0

Unit tests exist to give you confidence that your software is working as expected, even as the software changes over time.

I have written a lot of tests, and I have read many more. Most of them helped me discover bugs early, provided documentation, and prevented regressions. But I also found some tests that failed to do any of that. They were so complex that you could not figure out what they were testing, they failed at random, or they never failed at all.

This article presents five pitfalls that lead to ineffective unit tests, and how you can fix them.

1. Write One Unit Test per Function

It seems very straightforward. Let’s say you have a small function that does one thing, called calculate_average. It is one small unit, which is exactly what unit testing best practices want you to test. So you write a test for it, test_calculate_average.

What is wrong with this? It tests a single unit of code, but it should test a single behavior of that unit. Often this is also phrased as having a single assertion per test. A much better test would be test_calculate_average_return_0_for_empty_list. Once you have a couple of them, they give you detailed documentation for free.
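A minimal sketch of what behavior-named tests could look like. The body of calculate_average and every test name except test_calculate_average_return_0_for_empty_list are assumptions made for illustration:

```python
def calculate_average(values):
    """Return the arithmetic mean of `values`, or 0 for an empty list (assumed behavior)."""
    if not values:
        return 0
    return sum(values) / len(values)


# Each test names exactly one behavior, so a failure tells you what broke.
def test_calculate_average_return_0_for_empty_list():
    assert calculate_average([]) == 0


def test_calculate_average_returns_element_for_single_value():
    assert calculate_average([4]) == 4


def test_calculate_average_returns_mean_for_several_values():
    assert calculate_average([1, 2, 3, 6]) == 3


def test_calculate_average_handles_negative_values():
    assert calculate_average([-2, 2]) == 0
```

Read top to bottom, the test names alone describe how the function is expected to behave.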

It also changes your mindset on how to write tests. You have to think about the different behaviors that you expect from a function. Before you know it, you are thinking about the edge cases, and even writing tests for them.

“Write one unit test per unit of functionality, not unit of code.”

I once helped a colleague debug a problem: Our scrubbing logger was not properly scrubbing data. They suspected it wouldn’t scrub additional key-value pairs properly. Because I had written that code some time ago, I had no idea whether that theory was correct. But I knew I had written ample tests, and I quickly found one documenting the behavior in question: scrubbing_logger_scrubs_extra_key_values! We could quickly discard our initial assumption and save some valuable time.

2. Write Tests Just for Code Coverage

Tracking test coverage is generally a good idea. Nowadays, many testing frameworks support this, and platforms like codecov or coveralls make it easy to track it over time. So why is it not a good idea to obsess over it?

Code coverage is just a proxy measurement. 100% code coverage does not mean that you have covered all edge cases; it just means that every line was executed at least once. Consider a function that averages a list: a single test with a non-empty list can reach 100% coverage, yet it tells you nothing about what happens when you pass in an empty list.
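A minimal sketch of such a counterexample, assuming a naive implementation of calculate_average that does not guard against empty input:

```python
def calculate_average(values):
    # Naive implementation (an assumption for this sketch): no guard for empty input.
    return sum(values) / len(values)


def test_calculate_average():
    # This single test executes every line of calculate_average,
    # so a line-coverage tool would report the function as fully covered.
    assert calculate_average([1, 2, 3]) == 2
```

The test passes and the function is fully covered, yet calculate_average([]) raises a ZeroDivisionError that no coverage report will point out.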

The fundamental problem with code coverage is that it only measures how many lines of your program are covered. But all programs are state machines; for complete coverage, you would have to cover all states. This is not feasible.

Striving for complete, or at least very high, coverage also leads to a lot of tests, and not all of those are useful. This is especially true for “glue code.” I have seen tests that mock half of a web framework (Flask), just to test that registering a function for an endpoint works. This is a lot of effort to test a tiny bit of functionality. If you get it wrong, it will be obvious. Once you get it right, it is unlikely to change in the future.

Instead of striving to cover every line of code, I recommend following Martin Fowler's advice: focus your tests on the risky code. That is code you wrote yourself, rather than framework code, and code that is likely to be refactored. Knowing what is risky is difficult, but it comes with experience.

“ You should concentrate [your testing efforts] on where the risk is.” — Martin Fowler, Refactoring

3. Heavily Rely on Mocks

Using mocks and stubs is indispensable for unit testing. Most of the time your code under test interacts with other modules, and for the duration of the test, you want to control their behavior. But you can also overdo mocking.

When you have to write 50 or 100 lines of mocks to test a single ten-line function, then what are you testing? Are you testing your function, or are you testing the mock that you wrote to test the function?

Lots of mocks are also a red flag for code layout. When you need several, very involved mocks to test a single function, chances are this single function does several things. So you might want to refactor it into several functions that do less and can be tested in isolation.

I have seen some pretty convoluted mocks. This is a recreation of one example:

In case you are lost in it, we want to test that we successfully modify a response object in the middleware. In the process, we create an entire app, including a mocked endpoint and a test client. Instead, you can create mocks much closer around the code under test, such as the following:
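A minimal sketch of this idea, using only unittest.mock from the standard library. The middleware add_request_id_header, its get_response argument, and the X-Request-ID header are hypothetical stand-ins, not the code from the original project:

```python
from unittest.mock import Mock


def add_request_id_header(get_response, request_id="abc123"):
    """Hypothetical middleware: wrap a handler and tag the response it returns."""
    def middleware(request):
        response = get_response(request)
        response.headers["X-Request-ID"] = request_id
        return response
    return middleware


def test_middleware_adds_request_id_header_to_response():
    # Mock only what the middleware touches: the wrapped handler and its response.
    response = Mock()
    response.headers = {}
    get_response = Mock(return_value=response)
    request = Mock()

    result = add_request_id_header(get_response)(request)

    get_response.assert_called_once_with(request)
    assert result.headers["X-Request-ID"] == "abc123"
```

No app, endpoint, or test client is needed, because the middleware is called directly with the two objects it depends on.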

Both examples test the same thing, but the latter is much shorter and requires less setup and custom mocking. Yes, we still use mocks, but they are a lot less obtrusive.

4. Write a Test That Never Fails

Detecting regressions is one reason for unit tests. You write the code, you write passing tests, and profit. In case someone breaks the functionality of your code, the tests will pick it up. Or will they? If you are not careful, your test might never fail and you miss the regression.

But how do you end up with a test that never fails? Here is an example:

Now ask yourself: what changes would make this test fail? The most obvious one is changing the mock response. But that does not count, because you are not changing the code under test. Even worse, I initially forgot about json.dumps. This bug would not be caught by the test. The only way to make this fail is by messing with the code in line four. Given the sophistication of the test, that is surprisingly little actual coverage.

You can think of it in terms of false and true positives. A lot of changes that you want the tests to catch are not detected. Passing an invalid parameter into get_film is not caught. Forgetting json.dumps is not caught. An error in the Query is not caught. In other words, you get a lot of false positives: the test reports success even though the code is broken.

To prevent this, think about what makes your test fail. Even better, start with a failing test, and then write the code until it passes. Before you know it, you are doing Test Driven Development.
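As a rough illustration of a test that can fail, here is a minimal sketch. It assumes that get_film receives a session object with a post method, and the QUERY string, the URL, and the response shape are all hypothetical:

```python
import json
from unittest.mock import Mock

# Hypothetical query text and endpoint, used only for this sketch.
QUERY = "query Film($id: ID!) { film(id: $id) { title } }"
URL = "https://example.invalid/graphql"


def get_film(session, film_id):
    """Send the query as a JSON string and return the film title."""
    payload = json.dumps({"query": QUERY, "variables": {"id": film_id}})
    response = session.post(URL, data=payload)
    return response.json()["data"]["film"]["title"]


def make_session():
    session = Mock()
    session.post.return_value.json.return_value = {
        "data": {"film": {"title": "A New Hope"}}
    }
    return session


def test_get_film_returns_title_from_response():
    # Weak on its own: this passes no matter what get_film sends.
    assert get_film(make_session(), "1") == "A New Hope"


def test_get_film_sends_json_encoded_query_with_film_id():
    session = make_session()
    get_film(session, "1")
    _, kwargs = session.post.call_args
    # json.loads raises TypeError if the payload was not serialized to a string,
    # so forgetting json.dumps makes this test fail.
    sent = json.loads(kwargs["data"])
    assert sent["query"] == QUERY
    assert sent["variables"] == {"id": "1"}
```

The second test inspects what the code under test actually sent, so a wrong parameter, a changed query, or a missing json.dumps would each turn it red.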

5. Use Non-Deterministic Behavior in Tests

This is a well-known pitfall. If your tests or the code under test behave in a non-deterministic way, you will lose confidence in your tests. On every failure, you ask: Is this a real failure, or will the test pass on a rerun? Rerunning introduces friction to your workflow. Too much friction and you want to completely discard the test suite.

The downsides of non-determinism are obvious for tests, so what is still causing this?

Are you using the current time or date in your test? If yes, your tests are running with different data every day. Once you are in the business long enough, you will come across those kinds of tests. They might fail only on the last day of the month, or only when they start before midnight and finish after it. Luckily there is an easy solution: control the flow of time. Python, for example, has the freezegun library for that.
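One dependency-free way to control time is to pass the date into the function instead of reading the clock inside it. Note that this is a different technique from freezing the clock with a library, and that days_until_month_end and its today parameter are hypothetical names used only for this sketch:

```python
import datetime


def days_until_month_end(today=None):
    """Return how many days remain after `today` in its month."""
    if today is None:
        today = datetime.date.today()  # only production callers hit the real clock
    # Day 28 exists in every month; adding 4 days always lands in the next month.
    first_of_next = (today.replace(day=28) + datetime.timedelta(days=4)).replace(day=1)
    return (first_of_next - today).days - 1


def test_days_until_month_end_is_zero_on_last_day_of_month():
    assert days_until_month_end(datetime.date(2024, 1, 31)) == 0


def test_days_until_month_end_counts_leap_day_in_february():
    assert days_until_month_end(datetime.date(2024, 2, 10)) == 19
```

Because the tests pin the date, they give the same result on the last day of the month as on any other day.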

Are you using randomness to generate example data? There is a Python library called faker, which makes it easy to generate real-looking data like names, addresses, or phone numbers. It is really good for populating a demo environment or smoke tests. For unit tests not so much. It is much more reliable to use hard-coded static examples.

I have heard the argument for non-determinism in tests: over time it will cover more test cases, and potentially find more bugs. Libraries like Haskell's QuickCheck or Python’s Hypothesis incorporate this idea. But those libraries generate multiple examples for a test and provide seeds and examples on a failure. If something fails due to a newly discovered edge case, the libraries will make it obvious and easy to reproduce. Relying on other sources of non-determinism won’t. That’s why my advice is to avoid non-deterministic behavior in your tests.
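The reproducibility idea can be sketched with the standard library alone. This is not the API of Hypothesis or QuickCheck, only a hand-rolled illustration with a fixed seed, and all function names are hypothetical:

```python
import random


def calculate_average(values):
    """Assumed implementation: mean of `values`, or 0 for an empty list."""
    return sum(values) / len(values) if values else 0


def check_average_is_within_bounds(seed, rounds=100):
    # A private, seeded generator produces the same inputs on every run.
    rng = random.Random(seed)
    for _ in range(rounds):
        values = [rng.randint(-1000, 1000) for _ in range(rng.randint(1, 20))]
        average = calculate_average(values)
        # The failure message carries the seed and the input needed to reproduce it.
        assert min(values) <= average <= max(values), f"seed={seed}, values={values}"


def test_average_is_within_bounds_of_inputs():
    check_average_is_within_bounds(seed=12345)
```

The generated data still looks random, but every run sees the same examples, and a failure reports the seed and the offending input.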

Final Thoughts

There you have it, five pitfalls that prevent you from writing effective unit tests. Now that you know about them, you can avoid them by doing the following:

writing tests for every part of the functionality instead of every function
not obsessing over code coverage, but focusing on testing risky code
minimizing setup and mocking code
making sure your tests can fail
keeping non-determinism out of your tests

This will give you much more confidence that your tests test your software, and well-tested software lets you make changes and deploy quickly with confidence.

All the code examples in this article are on GitHub.