I admit the headline sounds like an oxymoron on par with military intelligence or Microsoft Works. But I promise it is actually interesting and surprising. At least it was to me.

Suppose a new deadly disease is making the rounds. There is a cure, but the cure is either very invasive or expensive, or has some huge disadvantage. Luckily, there is a test for the disease that is 99% accurate. You get the test and unfortunately it comes out positive. Should you now take the very invasive cure? Or, in other words, what is the probability that you actually have the disease?

The first intuition would be 99%, so better bite the bullet and take that leg-amputating cure, right?

Wrong. The answer is that we don’t have sufficient information to make the decision. We also need what is called the base rate, i.e., the probability that we had the disease before we took the test. The base rate is simply the (estimated) number of infected divided by the entire population. Let’s say that one in 100 is infected. That’s roughly 170.000 people in the Netherlands or 55.000 people in Denmark; I assume you can divide by 100 yourself to adapt the numbers to another country if you prefer. While one in 100 sounds like very few, 170.000 Dutchmen is actually a large number. Compare: the much-feared Ebola has infected 30.000 people worldwide, SARS 8.100, swine flu 11.000, bird flu 638, and AIDS 36.7 million. Except for AIDS, all those numbers are much smaller than 55.000, let alone 170.000. We have a base rate of 1%.

We can now compute the probability that we have the new fancy deadly disease and weigh the odds against the costs and benefits of the cure to make an informed decision. Or perhaps it’s a loved one who is identified as a zombie, and you have to decide whether to shoot them in the face before they turn and kill other loved ones. Try making a guess before reading on.

To compute the chance that we have the disease, we just compute how many have the disease and are categorized as such, and how many don’t but are still wrongly categorized as infected. Let’s assume a population of 1.000.000 to make the computations less abstract. Of these people, 1% * 1.000.000 = 10.000 are infected. That means that 1.000.000 – 10.000 = 990.000 are not infected. The test is 99% accurate, so of the 10.000 infected, 99% * 10.000 = 9.900 would be identified as such. However, the test is also only 99% accurate for the 990.000 uninfected, so it would misidentify (100% – 99%) * 990.000 = 1% * 990.000 = 9.900 healthy people as sick.

That means we belong to a pool of people consisting of 9.900 sick people correctly identified as such and 9.900 miscategorized healthy people. There’s a 50% chance we are infected despite the original (and truthful!) claim that the test is 99% accurate.
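The counting argument above fits in a few lines of Python (a minimal sketch; the function name and the default population of 1.000.000 are my own choices, not from the post):

```python
def p_sick_given_positive(base_rate, accuracy, population=1_000_000):
    """Chance of actually being sick given a positive test, by counting people."""
    sick = base_rate * population
    healthy = population - sick
    true_positives = accuracy * sick            # sick people correctly flagged
    false_positives = (1 - accuracy) * healthy  # healthy people wrongly flagged
    return true_positives / (true_positives + false_positives)

print(p_sick_given_positive(base_rate=0.01, accuracy=0.99))  # 0.5
```

Note that the population size cancels out: only the base rate and the accuracy matter.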

That’s still fairly high, so probably go with the leg amputation to cure the disease either way. But what if we tweak the values slightly? If the test is only 95% accurate – and that’s still extremely high compared to the real accuracy of such tests – we instead have 95% * 10.000 = 9.500 correctly identified infected and (100% – 95%) * 990.000 = 49.500 misidentified healthy people. Now, the chance that we have the disease plummets to 9.500 / (9.500 + 49.500) * 100% = 16.1%.

But here’s where it gets really interesting. The base rate has immense influence on the outcome. Assume that the test is still 99% accurate, but now only one in 200 people is infected. That seems like almost no change, right? Well, now we “only” have 0.5% * 1.000.000 = 5.000 sick (and 995.000 healthy). The test will identify 99% * 5.000 = 4.950 sick people and misidentify (100% – 99%) * 995.000 = 9.950 healthy people. Now, the chance that we are infected is only 4.950 / (4.950 + 9.950) * 100% = 33%.
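Both tweaks are a one-argument change with the same counting approach (again a sketch; the helper is repeated so the snippet stands alone):

```python
def p_sick_given_positive(base_rate, accuracy, population=1_000_000):
    """Chance of actually being sick given a positive test, by counting people."""
    sick = base_rate * population
    healthy = population - sick
    true_pos = accuracy * sick            # sick people correctly flagged
    false_pos = (1 - accuracy) * healthy  # healthy people wrongly flagged
    return true_pos / (true_pos + false_pos)

# 95% accurate test, 1-in-100 base rate:
print(f"{p_sick_given_positive(0.01, 0.95):.1%}")   # 16.1%
# 99% accurate test, 1-in-200 base rate:
print(f"{p_sick_given_positive(0.005, 0.99):.1%}")  # 33.2%
```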

Suppose you were on the Tokyo metro when SARS was the hip thing, and somebody near you coughed. Should you write your last will and go on an insane bender in the expectation you caught SARS? There were 8.100 cases of SARS registered in the entire world; let’s assume they were all in Tokyo, Japan. That would (wildly over-) estimate the base rate at 8.100 / 13.600.000 * 100% = 0.06%. If we assume that coughing is 100% effective for identifying SARS, all 8.100 goners would be coughing. Assume further that everybody gets a cold exactly once a year, that a cold lasts one day only, and that colds and SARS are the only reasons anybody would ever cough. Then coughing would misidentify 1/365 * 100% = 0.27% of healthy people (well, they have a cold, but that’s not super-deadly SARS, so we consider them healthy), i.e., 0.27% * 13.6 million = 37.260 people, making the chance of a coughing person having SARS 8.100 / (8.100 + 37.260) * 100% = 18%.

Here, we have used different rates for the efficiency of identifying sick people as sick and of identifying healthy people as healthy, but otherwise the computation is the same as above. The 18% is computed from a wildly overestimated base rate and a wildly underestimated inaccuracy. In fact, Japan had around 50 cases of SARS, and perfectly healthy people cough every day. If we instead assume there are 50 sick people and that coughing is 1% inaccurate, the risk of a coughing person having SARS is 50 / (50 + 136.000) * 100% &lt; 0.04%. You’re probably more likely to die from stressing over SARS than from SARS, even if you go licking every coughing person in the face to get all their delicious germs.
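With separate rates for catching the sick and for falsely flagging the healthy, the counting version generalizes directly (a sketch of my own; the numbers, including the 13.6-million Tokyo population, come from the text above):

```python
def p_sars_given_cough(n_sick, population, sensitivity, healthy_cough_rate):
    """Chance a coughing person has SARS, with separate rates for sick and healthy."""
    healthy = population - n_sick
    true_pos = sensitivity * n_sick           # sick people who cough
    false_pos = healthy_cough_rate * healthy  # healthy people who cough anyway
    return true_pos / (true_pos + false_pos)

# Wildly overestimated: all 8.100 world-wide cases in Tokyo, all coughing,
# and healthy people coughing only one day per year.
print(f"{p_sars_given_cough(8_100, 13_600_000, 1.0, 1/365):.0%}")  # 18%
# More realistic: ~50 cases, and 1% of healthy people coughing.
print(f"{p_sars_given_cough(50, 13_600_000, 1.0, 0.01):.3%}")      # 0.037%
```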

The same computation can be used in many other cases. The lesson is that to interpret the result of a test with a known (or assumed) accuracy, we need to know the base rate. In psychology, this is known as the base rate fallacy, and it was (of course, one is tempted to say) studied by Kahneman and Tversky. The fallacy often causes people to ignore base rate information. It is related to the famous Monty Hall problem: in a game show, there’s a prize (say, a car) behind one of three doors. You pick one door, and the host opens another door that he knows doesn’t hide the car. The host now asks you whether you want to switch doors. The answer to the problem is illustrated below:

https://www.youtube.com/watch?v=anYHixtZdxg
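If you’d rather convince yourself by brute force, a quick simulation (a sketch of my own, not taken from the video) shows that switching wins roughly twice as often as staying:

```python
import random

def monty_hall(trials, switch):
    """Simulate the Monty Hall game; return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a door that is neither your pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(100_000, switch=True))   # ~0.667
print(monty_hall(100_000, switch=False))  # ~0.333
```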

The base rate fallacy is also illustrated in this Veritasium video:

The You Are Not So Smart Podcast had an episode on base rates and Bayes’ Theorem, which is worth a listen:

[soundcloud url=”https://api.soundcloud.com/tracks/257925026″ params=”auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&visual=true” width=”100%” height=”450″ iframe=”true” /]

Time person of the year 2006, Nobel Peace Prize winner 2012.

What strikes me as slightly curious when encountering these masked Asians is that they always use the normal surgical-style masks made to protect the patient from the surgeon, not the types made specifically to protect the user.

So I always wonder what these people are trying to protect me from.

I’ve read that the major reason for wearing them is not to protect the wearer, but to protect everybody else. Even the best mask would provide little protection against something insanely contagious like Ebola beyond just, you know, not licking up other people’s mucus on sight.

Everybody wearing masks, however, especially in crowded places like the subway, has a preventive effect. It’s essentially a vaccination against the common cold and whatever pig/bird flu is popular at the time.

Good writeup. We’re assuming here (I think?) that the accuracy for detecting true positives and for rejecting false positives is the same, which (I think) is not common.

Normally you have a screening test designed to catch the true positives, followed up by another one designed to reject the false positives.

As for the masks, I live in Tokyo and wear them if I have a really stubborn cold, hoping they will protect me from the onslaught of new viruses on the commuter train. In spring, however, many wear them due to pollen allergies.

Thanks for the compliments. You’re right that I’ve assumed the accuracy is the same for both positives and negatives; that just makes the math simpler. The model can be improved by using two different probabilities, and detection can be improved by doing more tests (though one would have to be careful, as two tests for the same disease wouldn’t be independent, so the maths would get more complicated).

Of course, the positive effect of doing more tests would come with the negative effect that, after getting the desired result, one would often stop doing tests… I guess the entire point is that statistics is hard and dumb 🙂

I didn’t think of using masks against allergies or when already weakened. That makes total sense!