In our child footballer story, we described an imaginary test that was very accurate at spotting future champions. Let’s see how that accuracy might have been measured.

The people who devised the test tried it on a random sample of, say, 20,000 children. We can imagine that the result of the trial was something like this:

Of 20,000 children, 20 of them (0.1%) turned into champions. All champions had a top score (>95), i.e. the test produced no False Negatives. However, there were some False Positives: of the 19,980 children who did not make it, 999 (5%) got a top score. So, of the total of 1,019 children with a top score, 20 of them – or 1.96% – became champions, while 98.04% did not. And of the total of 18,981 children with a lower score, none became champions. Hence, the Posterior Probability of becoming a champion, given a top score, was 20/1,019 ≈ 1.96%.

In our notation, with N = 20,000: N∙BR∙TPR = 20, N∙(1−BR)∙FPR = 999, and N∙BR∙TPR + N∙(1−BR)∙FPR = 1,019. Hence, from Bayes’ Theorem, PP = 20/1,019 ≈ 1.96%.
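The arithmetic behind the frequency table and Bayes’ Theorem can be checked with a few lines of code. This is an illustrative sketch: the variable names simply mirror the notation in the text (N, BR, TPR, FPR, PP).

```python
# Reproduce the frequency table from the trial of 20,000 children.
N = 20_000    # children in the random sample
BR = 0.001    # Base Rate: fraction who become champions (20 in 20,000)
TPR = 1.0     # True Positive Rate: every champion scores >95
FPR = 0.05    # False Positive Rate: 5% of non-champions score >95

true_positives = N * BR * TPR               # 20 champions with a top score
false_positives = N * (1 - BR) * FPR        # 999 non-champions with a top score
top_scorers = true_positives + false_positives  # 1,019 children with a top score

# Posterior Probability of being a champion, given a top score (Bayes' Theorem)
PP = true_positives / top_scorers
print(f"{true_positives:.0f} / {top_scorers:.0f} = {PP:.2%}")  # 20 / 1019 = 1.96%
```

Note that the test’s perfect record on champions (no False Negatives) is swamped by the sheer number of False Positives: 999 against 20.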

A test is a form of evidence. In general, any *sign* that can be related to a hypothesis is a form of evidence about the hypothesis. Evidence can come in different shapes. A test provides *hard evidence*: it is the result of a controlled, replicable experiment, leading to the measurement of hard probabilities, grounded on empirical frequencies. But the coach might have simply said: “I am an experienced coach and I can spot if a child is a champion. In fact, I am an infallible champion spotter: give me 100 champions and I will correctly identify all of them. True, I may throw out a few False Alarms, but of 100 normal children, I will correctly identify 95 of them, and I will wrongly identify as champions only 5 of them. So overall I am very accurate. Now, I had a careful look at your child and – congratulations! – I think he is a champion. Remember: I may be wrong – but I don’t think so.”

In principle, the coach’s opinion could also constitute hard evidence, which could be measured as in the frequency table above. In fact, the results may well be similar to those in the table, with “Top score/Lower score” replaced by “Coach says Champion/No Champion”. With proper measurement, the resulting probabilities would be as hard as in the test, with a high level of accuracy translating – surprisingly, but unquestionably – into a low level of support for the hypothesis that the child is a champion. Without proper measurement, however, the observation that the coach never misses a champion overshadows the fact that False Positives are much more frequent than True Positives. As the table makes clear, what counts is not the True Positive Rate (100%) versus the False Positive Rate (5%), but the frequency of True Positives versus False Positives (20 vs. 999). Neglecting this fact plays right into the hands of the Prior Indifference Fallacy.

The confident coach who invites you to rely on his accuracy makes this mistake. While not ignoring False Positives, he compares them to True Negatives (against which they appear small) rather than to True Positives (against which they are large). When he says: “Give me 100 champions and I will correctly identify all of them; give me 100 normal children and I will correctly identify 95 of them”, he is implicitly assuming an equal number of champions and normal children – i.e. he is falling prey to the Prior Indifference Fallacy. Indeed, under prior indifference the Posterior Probability is equal to the True Positive Rate divided by the sum of the True and False Positive Rates.

Even an honest, scrupulous coach, who correctly records his track record, may not know what his experience means. Thus, if an accurate coach believes your child is a champion, you trust him: he is the *expert*.
An expert is someone who is supposed to have tested the hypothesis many times before, and has therefore been able to catalogue his *experience* in the shape of a frequency table. His correct reasoning should be: “Since champions are rare, the probability that this child is a champion, however confident I am that he is, is less than 2%”. But thinking of False Positives as a small rate rather than a large absolute number leads the coach – and you – to identify accuracy with support.

The failure to appreciate the nonlinear relationship between accuracy and support when the Base Rate differs from the 50% indifference level means that the expert’s response can be seriously misinterpreted. Even a 50% accurate expert – a useless one, worth as much as a coin toss – may be able to produce a massive shift in probability from a low Base Rate – where the probability should stay and, without the expert’s response, would stay – to a grossly overestimated indifference between accepting and rejecting the hypothesis under investigation.
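Both mistakes can be made visible by computing the Posterior Probability under the true Base Rate and under the 50% “indifference” assumption. A minimal sketch, using the coach’s numbers from the text (TPR = 100%, FPR = 5%, BR = 0.1%):

```python
def posterior(base_rate: float, tpr: float, fpr: float) -> float:
    """Bayes' Theorem: probability of the hypothesis given a positive sign."""
    return (tpr * base_rate) / (tpr * base_rate + fpr * (1 - base_rate))

# The confident coach: TPR = 100%, FPR = 5%
print(posterior(0.001, 1.0, 0.05))  # ~0.0196: the correct answer, under 2%
print(posterior(0.5,   1.0, 0.05))  # ~0.9524: the Prior Indifference Fallacy

# A useless, coin-toss expert: TPR = FPR = 50%
print(posterior(0.001, 0.5, 0.5))   # 0.001: posterior = Base Rate, no shift
print(posterior(0.5,   0.5, 0.5))   # 0.5: prior indifference inflates it to 50%
```

Note how the coin-toss expert, handled correctly, moves the probability not at all – the posterior stays at the Base Rate – while prior indifference wrongly lifts it to 50%.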