False Positive/Negative Rate: Which Is Better? Why? What About Predictive Value?

Too many questions!

It’s the age-old question of laboratory tests and analyses, “How accurate is this?” The answer to this question is always, “It depends…” What usually follows is a lengthy explanation of what is best for the person being tested. When it comes to individual medical decisions, these discussions are best had between a healthcare provider and the patient, not the patient and Google. But what about a question at the population level?

Take, for example, influenza surveillance. When I started working at a state health department, one of the first things I did was contact clinical laboratories and ask them to provide the number of rapid influenza tests and their results. This would help me inform the public and public health workers of when and where influenza was active. But I had to remember the performance of these tests, as well as the prevalence (the existing cases of a disease) of influenza in the places where the tests were being done.

The rule of thumb is: If prevalence is low, then a large share of the positive results will be false positives. If prevalence is high, then a larger share of the negative results will be false negatives. It’s all based on math, and how that math breaks down on a 2×2 table based on a test’s sensitivity and specificity. Sensitivity is the probability that the test will detect a disease when the disease is there. Specificity is the probability that the test will be negative when there is no disease.
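To see how that 2×2 table fills in, here is a minimal sketch in Python. The function name `two_by_two` and every number in it (a test that is 90% sensitive and 95% specific, 10,000 people tested, prevalence of 1% versus 30%) are made up for illustration; they do not describe any real test.

```python
def two_by_two(sensitivity, specificity, prevalence, population):
    """Expected counts in each cell of the 2x2 table."""
    diseased = prevalence * population
    healthy = population - diseased
    true_pos = sensitivity * diseased    # diseased people who test positive
    false_neg = diseased - true_pos      # diseased people who test negative
    true_neg = specificity * healthy     # healthy people who test negative
    false_pos = healthy - true_neg       # healthy people who test positive
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}


# Hypothetical test and population, used only to illustrate the rule of thumb.
for prevalence in (0.01, 0.30):
    t = two_by_two(0.90, 0.95, prevalence, 10_000)
    false_share_of_positives = t["FP"] / (t["TP"] + t["FP"])
    false_share_of_negatives = t["FN"] / (t["TN"] + t["FN"])
    print(f"prevalence {prevalence:.0%}: "
          f"{false_share_of_positives:.1%} of positives are false, "
          f"{false_share_of_negatives:.1%} of negatives are false")
```

With these made-up numbers, roughly 85% of the positive results are false at 1% prevalence, compared with about 11% at 30% prevalence. Meanwhile, the share of negative results that are false climbs from about 0.1% to about 4%.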

Let’s say a test is 99% sensitive and 99% specific. That’s pretty good, right? It will catch 99% of all true cases with a positive test, and it will rule out 99% of non-cases with a negative test. Know that there are four categories being looked at: TRUE positives, FALSE positives, TRUE negatives and FALSE negatives. As prevalence increases, the chance that a positive test is true increases. You have more true positives. The chance of a false positive decreases. Likewise, the chance of a negative result being a true negative decreases as prevalence increases.
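The effect of prevalence on a 99% sensitive, 99% specific test can be sketched with the standard formulas for positive and negative predictive value, which come from Bayes’ theorem. The function names `ppv` and `npv` and the list of prevalence values are my own choices for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Probability that a negative result is a true negative."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Illustrative prevalence values, from very rare to very common.
for prevalence in (0.001, 0.01, 0.10, 0.50, 0.90):
    print(f"prevalence {prevalence:6.1%}: "
          f"PPV = {ppv(0.99, 0.99, prevalence):.3f}, "
          f"NPV = {npv(0.99, 0.99, prevalence):.3f}")
```

At 1% prevalence, a positive result from this test is a coin flip: half of all positives are false. The positive predictive value climbs as prevalence rises, while the negative predictive value drifts down.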

So we go back to the question of what you want to achieve… If you are a physician and want to catch the most cases, then you want the patients you’re testing to be in a group with the highest prevalence. This is why healthcare providers will ask you all sorts of questions before you get tested. They want to make sure you fall into the categories for testing that will yield the highest POSITIVE PREDICTIVE VALUE.

They want that positive test to have the highest chance of being a true positive. They also want to miss the fewest cases possible by increasing the chances that a negative test is a true negative, that is, by having the highest NEGATIVE PREDICTIVE VALUE. There is a “sweet spot” when it comes to prevalence where this happens, but that’s for a whole other lecture.

Now, if you are an epidemiologist working an Ebola outbreak, you don’t want people with false negative results being sent home to infect others. You want that number low. Do you care about false positives? Well, maybe not if the therapy won’t kill someone, or maybe you do if a positive test means being put into a ward with sick people. It’s a delicate balancing act.

What about pregnancy tests to take at home? You probably don’t worry too much about false negatives (pregnant women who test negative), because those women will still be pregnant and probably take the test again if they continue to miss their period or feel other signs/symptoms of pregnancy. And you maybe care about false positives, because a positive test means a trip to the obstetrician, blood work, and (if you’re anything like me) an ensuing panic of epic proportions for the would-be dad.

If you’re me and want to keep tabs on flu activity, you don’t say the flu has arrived based on a screening test. You use a gold standard test for influenza, like a viral culture or a polymerase chain reaction test. Once the gold standard is positive, you know the virus has arrived, and the chances of positive screening (aka “rapid”) tests being true influenza cases rise to tolerable levels. Once you stop seeing positives on gold standard tests, or you see that many rapid tests were done in people without symptoms, then you stop using the rapid tests as a marker of influenza activity.

Again, it’s all a balancing act. It’s kind of like the justice system. You want the chances of an innocent person going to jail to be as low as possible, so you set up all sorts of systems. You also want the chances of a guilty person going to jail to be as high as possible to protect the population from criminals, so you set up systems for that, too. You’re still going to have innocent people go to jail and criminals go free, but it’s all about minimizing both. (Don’t get me started on how the current justice system in the United States is failing at this.)

Now you know why a test that is 99% accurate (99% sensitive and 99% specific) can still throw out a lot of false positives or false negatives: it’s about prevalence. If you’re a healthy person in the middle of the summer in the United States, and you haven’t traveled abroad or worked with pigs/chickens, then you probably will not get tested for the flu. If you were, there’s a high chance that a positive result would be a false positive. On the other hand, if you’re feeling miserable, it’s the middle of winter in the United States, and you have been around other sick people, then a positive result is very likely to be a true positive, and it is the negative result that deserves a second look.

(Of course, the COVID-19 pandemic threw a wrench in when we see influenza-like illness in the United States and other countries in temperate regions of the world.)

These are the kinds of things that one needs to consider carefully when using a screening test or device. But you also need to think about the population you’re testing in general, the individuals you’re testing in particular, how they would benefit or be hurt by the test results, and whether you should use the gold standard or diagnostic (not screening) test instead if your suspicion is high enough to warrant it.

What worries me is a researcher who sees too many false positives or too many false negatives, and gets all riled up over them without seeing the bigger picture. Maybe, in the situation you describe, too many of either is not bad. Maybe the proportion of each (i.e. the Positive/Negative Predictive Value) is what you should be worried about? Context matters when dealing with these things. And context is something epidemiologists need to remember when interpreting the results of their research, especially if they’re calling for any kind of action.

Don’t you love thinking of all the possible scenarios?

I do.


René F. Najera, MPH, DrPH, is a doctor of public health, an epidemiologist, amateur photographer, running/cycling/swimming enthusiast, husband, father, and “all-around great guy.” You can find him working as the director of a center for public health, grabbing tacos at your local taquería, teaching at a university in northern Virginia where he is an adjunct in the Department of Global and Community Health, or teaching at the best school of public health in the world where he is an associate in the Department of Epidemiology. All opinions in this blog post are those of Dr. Najera, and do not necessarily represent the opinions of his employers, friends, family, or acquaintances.