Imagine that there is a rare genetic disease that affects 1 in every 100 people at random. There is a test for this disease that has a 99% accuracy rate: for every 100 people tested, it gives the correct result for 99 of them.

**If you take the test, and the result is positive, what is the chance that you have the disease?**

If you think the answer is 99% then you are incorrect; this is because of the base rate fallacy – you have failed to take the *base rate* (of the disease) into account.

In this situation there are four possible outcomes:

| | Affected by disease | Not affected by disease |
|---|---|---|
| Test correct | Affected by disease, and test gives correct result. (DC) | Not affected by disease, and test gives correct result. (NC) |
| Test incorrect | Affected by disease, and test gives incorrect result. (DI) | Not affected by disease, and test gives incorrect result. (NI) |

This is easier to understand if we map the contents of the probability space using a tree diagram, as shown below.

In two of these cases the result of the test is positive, but in only one of them do you have the disease.
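One way to make the four outcomes concrete is to count people rather than multiply probabilities. The sketch below assumes a hypothetical cohort of 10,000 people (a size chosen only so that every count is a whole number) and assumes the test is correct for 99 in every 100 people whether or not they are affected; the variable names are illustrative.

```python
# Hypothetical cohort; the size is an assumption chosen so every count is whole.
population = 10_000

affected = population // 100        # 1 in every 100 people has the disease
unaffected = population - affected

# Assumption: the test is correct for 99 of every 100 people in each group.
dc = affected * 99 // 100           # affected, correct result (tests positive)
di = affected - dc                  # affected, incorrect result (tests negative)
nc = unaffected * 99 // 100         # not affected, correct result (tests negative)
ni = unaffected - nc                # not affected, incorrect result (tests positive)

# A positive result comes from exactly two of the four outcomes: DC and NI.
positives = dc + ni

print(f"DC={dc} DI={di} NC={nc} NI={ni}")
print(f"{positives} positive results, {dc} of them from affected people")
```

Counting this way shows how many of the positive results belong to each group before any probabilities are written down.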

P(DC) = P(Affected) × P(Test correct)

P(DC) = 0.01 × 0.99

P(DC) = 0.0099 ≈ 1 in 101

The other case that produces a positive result, when you don’t have the disease and the test is incorrect, has the same probability of roughly 1 in 101: P(NI) = 0.99 × 0.01 = 0.0099.

Of the two remaining cases, not having the disease and getting a correct negative test result takes up the vast majority of the probability space: P(NC) = 0.99 × 0.99 = 0.9801, or about 1 in 1.02. The chance of having the disease and getting an incorrect test result is extremely small: P(DI) = 0.01 × 0.01 = 0.0001, or 1 in 10,000.
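The four probabilities can be checked, and the opening question answered, with a few lines of arithmetic. This is a minimal sketch that assumes, as the calculation above does, that the test's 99% accuracy is independent of whether you are affected; it uses exact fractions to avoid floating-point rounding, and the variable names are illustrative.

```python
from fractions import Fraction

p_affected = Fraction(1, 100)    # base rate of the disease
p_correct = Fraction(99, 100)    # assumed equal for affected and unaffected people

# Multiply along each branch of the tree to get the four outcomes.
p_dc = p_affected * p_correct                # affected, test correct (positive)
p_di = p_affected * (1 - p_correct)          # affected, test incorrect (negative)
p_nc = (1 - p_affected) * p_correct          # not affected, test correct (negative)
p_ni = (1 - p_affected) * (1 - p_correct)    # not affected, test incorrect (positive)

# The four outcomes cover the whole probability space.
assert p_dc + p_di + p_nc + p_ni == 1

# Condition on a positive result: only DC and NI produce one.
p_positive = p_dc + p_ni
p_affected_given_positive = p_dc / p_positive

for name, p in [("DC", p_dc), ("DI", p_di), ("NC", p_nc), ("NI", p_ni)]:
    print(f"P({name}) = {float(p):.4f}")
print(f"P(affected | positive) = {p_affected_given_positive}")
```

Because P(DC) and P(NI) are equal, a positive result is as likely to come from an unaffected person as from an affected one, so the chance of having the disease given a positive test is 1/2, not 99%.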