The prosecutor’s fallacy and the base rate fallacy are two related fallacies that involve *conditional probabilities*. I learned about these fallacies in *Calling Bullshit* by Bergstrom and West.


## Conditional probability

A conditional probability is the chance that something is true, given that some other piece of information is true.

In writing, it’s expressed like this:

P(test|given)

where:

- **test** is the thing you want to know (e.g. innocent or guilty); and
- **given** is the thing you know (e.g. DNA match).

## Prosecutor’s fallacy

The prosecutor’s fallacy mixes up:

- the probability that someone matches, given their innocence; with
- the probability that someone is innocent, given there is a match.

You don’t care about the first probability – you already know there is a match. You care about the second probability. It is very easy to mix up these two probabilities. Bergstrom and West say that even trained scientists do it when looking at p-values and interpreting results.

**Example – prosecutor’s fallacy**

Your client’s DNA has been matched with the DNA at a crime scene. There is a 1 in 10 million chance that the match would have occurred if he hadn’t been at the scene.

But the police database has 50 million DNA records. You would expect about 5 innocent matches from chance alone (and 1 guilty match, if the guilty person is in the database).

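So the chance of your client being innocent is not 1 in 10 million. With about 5 innocent matches and 1 guilty match, the chance that he is guilty is only about 1 in 6, which leaves about a 5 in 6 chance that he is innocent.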
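The arithmetic in this example can be sketched in a few lines of Python. This is only a rough illustration, not a forensic model: it assumes the guilty person is in the database and always matches, and the variable names are made up for this sketch.

```python
from fractions import Fraction

# Figures from the example above.
random_match_prob = Fraction(1, 10_000_000)  # P(match | innocent)
database_size = 50_000_000

# Assumption: exactly one guilty person is in the database and always matches.
guilty_matches = 1

# Expected number of innocent people who match purely by chance.
expected_innocent_matches = random_match_prob * database_size

# P(guilty | match): guilty matches as a share of all expected matches.
p_guilty_given_match = guilty_matches / (guilty_matches + expected_innocent_matches)

print(f"P(match | innocent): {float(random_match_prob):.7f}")
print(f"Expected innocent matches: {float(expected_innocent_matches):.0f}")
print(f"P(guilty | match): {p_guilty_given_match}")
```

The two probabilities the fallacy confuses come out very differently: P(match|innocent) is 1 in 10 million, while P(guilty|match) is 1 in 6 under these assumptions.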

## Base rate fallacy

The base rate fallacy is basically the same mistake: it neglects the base rate when the thing you are testing for is very uncommon or very unlikely to be true. Unless your test is extremely accurate, a positive result may be more likely to come from the test’s inaccuracy than from the thing you tested for actually being true.

It’s less of an issue if the thing you’re testing for is likely to be true.

**Example – base rate fallacy**

You test a patient for Lyme disease. The test you use is claimed to be “95% accurate” because:

- the test always identifies Lyme disease when the patient has Lyme disease (i.e. false negatives are zero); and
- the test gives a false positive in only 5% of cases where the person does not have Lyme disease (i.e. false positive rate is 5%).

The test for your patient comes back positive. Does this mean there’s a 95% chance the patient has Lyme disease?

No.

What we want to know is the probability that it’s Lyme disease given a positive result – i.e. P(Lyme disease|positive result). What the 95% tells you is the probability that a patient gets a **negative** result if they **don’t** have Lyme disease – i.e. P(negative result|no Lyme disease). These aren’t the same thing.

Lyme disease is very rare, affecting about 0.1% of the population. If you tested 100,000 people you’d expect 100 people to have Lyme disease. But you would expect to get 4,995 false positives (i.e. 100,000 minus the 100 true positives, multiplied by 5%). So, even with a positive result, it’s still far more likely the positive you got was a false positive than a true positive.

The low **base rate** of Lyme disease means that, even with a positive test result, there’s only about a 2% chance that the patient has Lyme disease (100 true positives out of 5,095 positive results in total). A test that gives out 5% false positives is not that useful when the base rate is so far below 5% (its 0% false negative rate will reassure you if you get a negative result, though). If the base rate is higher, however, you can be more confident that a positive result means the person actually has the disease. Wikipedia has a table that may help you understand it.
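To see how much the base rate matters, here is a minimal sketch that applies Bayes’ theorem to the figures from the Lyme disease example. The function name `positive_predictive_value` and its parameters are made up for this post, and the 10% and 50% base rates are hypothetical values chosen only to show the trend.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Return P(disease | positive result) via Bayes' theorem."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures from the Lyme disease example: 0.1% base rate,
# no false negatives (sensitivity 1.0), 5% false positive rate.
lyme_ppv = positive_predictive_value(0.001, 1.0, 0.05)
print(f"Base rate 0.1%: P(Lyme | positive) = {lyme_ppv:.1%}")

# Hypothetical higher base rates with the same test (assumed values).
for base_rate in (0.10, 0.50):
    ppv = positive_predictive_value(base_rate, 1.0, 0.05)
    print(f"Base rate {base_rate:.0%}: P(disease | positive) = {ppv:.1%}")
```

With the same test, a positive result becomes far more informative as the base rate rises: roughly 69% at a 10% base rate and roughly 95% at a 50% base rate.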
