Egypt, 2200 BCE, Sixth Dynasty of the Old Kingdom.

Ouser is dead. In his will, he appointed Sebek-hotep to be the guardian of his children and gave him the right to use his goods. Or so claims Sebek-hotep. Taou, Ouser’s eldest son, says instead that his father had no such intention and that Sebek-hotep produced a forged document. A tribunal is called upon to settle the dispute. After some deliberation, the judges hand down the following ruling:

If Sebek-hotep brings forward three honourable witnesses, in whom we can have confidence, who repeat: “Your power be against Taou, O God, since the truth is that this document conforms to what Ouser said on this matter”, then the goods will remain in the house of Sebek-hotep, after he has brought forward these witnesses in whose presence these things were said, and Sebek-hotep will have the right of use over the goods. But if he cannot produce three witnesses in whose presence the words were said, in that case, the goods of Ouser will not stay with him but will be given to his son Taou.

Probability theory started in the 17th century as a study of hard evidence, resulting from controlled, replicable experiments, such as coin or dice tosses. But probability itself is a primal concept which, since the dawn of civilisation, has been inherently tied to the evaluation of soft evidence. While still based on observation, soft evidence is not the result of measurable regularities but the product of the observer’s perception. Hence its accuracy coincides with the observer’s confidence in using it as a sign for assessing the probability of a hypothesis. As such, it is ultimately a matter of trust.

As well argued in James Franklin’s The Science of Conjecture (where the Egyptian story comes from, p. 1), the Law is the original domain in which the concept of probability has been moulded through time. Law needs to reach a decision, i.e. separate true from false, guilt from innocence. To reach a verdict is, literally, verum dicere: to speak the truth. Over the centuries, different legal systems defined different rules of evidence, but relied on the same underlying relationship between beliefs and trust. So, for example, according to a medieval text of 850 CE, 3000 years after the Ouser dispute:

A bishop should not be condemned except with 72 witnesses … a cardinal priest should not be condemned except with 44 witnesses, a cardinal deacon of the city of Rome without 36 witnesses, a subdeacon, acolyte, exorcist, lector, or doorkeeper except with 7 witnesses (ibid. p. 13-14).

Hilarious to our common sense, until we realise that, now as then, our confidence rests on the judgement of reputable authorities. Indeed, the very word probability comes from the Latin probabilis, which means ‘worthy of approbation’ (see Ian Hacking’s The Emergence of Probability, Chapter 3). Approbation comes from the recognized probity of honest, honourable people, who are thus capable of influencing our beliefs by approving a hypothesis as true or false. Juries in legal trials are based on this principle, which is formalised in Condorcet’s Jury Theorem.

As we know, evidence accumulates multiplicatively:

PO = LR_1 ∙ LR_2 ∙ … ∙ LR_N ∙ BO
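As a minimal sketch of this multiplicative update, the following Python snippet multiplies the prior odds (BO) by a list of likelihood ratios to obtain the posterior odds (PO), then converts them to a posterior probability (PP). The function names and the three likelihood ratios are hypothetical, chosen only for illustration.

```python
import math

def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply the prior odds (BO) by every likelihood ratio to get PO."""
    return prior_odds * math.prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds to a probability: PP = PO / (1 + PO)."""
    return odds / (1 + odds)

# Hypothetical example: prior indifference (BO = 1) and three pieces of
# evidence, two confirmative (LR > 1) and one disconfirmative (LR < 1).
po = posterior_odds(1.0, [2.0, 3.0, 0.5])
pp = odds_to_probability(po)
print(po, pp)
```

Because the update is a product, the order in which the evidence arrives does not matter, and a single LR below 1 can undo a confirmative LR of the reciprocal size.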

Convergence to the truth requires a preponderance of confirmative or disconfirmative evidence. Assume a jury has N members, each endowed with a certain degree of accuracy A = (TPR+TNR)/2 = 0.5+(TPR-FPR)/2 = 0.5+(TNR-FNR)/2 in evaluating the hypothesis of Guilt on the basis of the available evidence. TPR is the probability that a juror states that the defendant is guilty, given that he is guilty; TNR is the probability that the juror states the defendant is innocent, given that he is innocent; FPR=1-TNR and FNR=1-TPR are the corresponding error rates. The Jury Theorem is based on the assumption that each juror’s accuracy is greater than 50%. In that case, TPR-FPR>0 hence LR=TPR/FPR>1: a juror is more likely to state that the defendant is guilty if he is guilty than if he is innocent. Likewise, TNR-FNR>0 hence TNR/FNR>1 and LRN=FNR/TNR<1: the juror is more likely to state that the defendant is innocent if he is innocent than if he is guilty.
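The relations between accuracy and the two likelihood ratios can be checked numerically. The sketch below assumes FPR = 1 - TNR and FNR = 1 - TPR, and uses a hypothetical juror with TPR = 60% and TNR = 70%; the function name is illustrative.

```python
def juror_ratios(tpr, tnr):
    """Return (accuracy, LR, LRN) for a juror with the given TPR and TNR.

    Assumes 0 < tpr < 1 and 0 < tnr < 1, so that no ratio divides by zero.
    """
    fpr = 1 - tnr                 # says "guilty" although innocent
    fnr = 1 - tpr                 # says "innocent" although guilty
    accuracy = (tpr + tnr) / 2    # A = (TPR + TNR) / 2
    lr = tpr / fpr                # weight of a "guilty" statement
    lrn = fnr / tnr               # weight of an "innocent" statement
    return accuracy, lr, lrn

# Hypothetical juror: more accurate than a coin toss (A > 50%).
acc, lr, lrn = juror_ratios(0.6, 0.7)
print(acc, lr, lrn)
```

For any juror with A > 50% the function returns LR > 1 and LRN < 1, which is the condition the Jury Theorem relies on.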

To an external observer, an accurate juror stating that the defendant is guilty is a sign of Guilt, in the same sense that Smoke is a sign of Fire: it is confirmative evidence of Guilt (LR>1). And the juror not stating that the defendant is guilty is a sign of no Guilt, in the same sense that no Smoke is a sign of no Fire: it is confirmative evidence of no Guilt (LRN<1). Likewise, an accurate juror stating that the defendant is innocent is a sign of Innocence, in the same sense that Rain is a sign of no Fire: disconfirmative evidence of Guilt (LR<1). And the juror not stating that the defendant is innocent is a sign of Guilt, in the same sense that no Rain is a sign of Fire: confirmative evidence of Guilt (LRN>1).

Unlike smoke and rain, however, verdicts are exhaustive and mutually exclusive. So, while in theory it may be argued that not guilty does not exactly mean innocent, in practice a verdict leads to a binary decision: Convict or Acquit. A non-conviction is, for all intents and purposes, an acquittal.

A jury is a tug of war between jurors who consider the defendant guilty (LR>1) and jurors who consider him innocent (LR<1). Let’s call LR the (geometric) average of jurors’ LR_i. Then we can write PO = LR^N ∙ BO and therefore:

N = ln(PO/BO) / ln(LR)

N is the number of jurors required to reach a certain verdict. N depends on:

1. The standard of proof, PO. For example, a conviction beyond reasonable doubt requires PO>9 (PP>90%) or PO>99 (PP>99%). But a preponderance of evidence (PP>50%) only requires PO>1.
2. The prior odds of Guilt, BO. Presumption of Innocence requires BO to be close, but not equal, to 0. Prior indifference means BO=1.
3. Jurors’ average accuracy, as summarized by LR. In particular, LR>1 indicates a preponderance of Guilt-oriented jurors, LR<1 a preponderance of Innocence-oriented jurors, while LR=1 indicates a hung jury.
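The dependence of N on these three inputs can be sketched in Python by solving PO = LR^N ∙ BO for N and rounding up to the next whole juror. The sketch assumes that the standard of proof is given as a posterior probability PP and the prior as a base rate BR, both converted to odds as p/(1-p); the function name is hypothetical, and the sketch only covers a Guilt-oriented jury (LR>1).

```python
import math

def required_jurors(pp, br, lr):
    """Smallest whole N such that LR**N * BO exceeds the required PO.

    pp: required posterior probability of Guilt (standard of proof)
    br: prior probability of Guilt (base rate), converted to odds BO
    lr: geometric average of the jurors' likelihood ratios (must be > 1)
    """
    if lr <= 1:
        raise ValueError("this sketch assumes a Guilt-oriented jury, LR > 1")
    po = pp / (1 - pp)
    bo = br / (1 - br)
    threshold = math.log(po / bo) / math.log(lr)   # N must exceed this
    return max(0, math.floor(threshold) + 1)

print(required_jurors(0.90, 0.01, 1.1))  # beyond reasonable doubt, BR = 1%
print(required_jurors(0.50, 0.01, 1.1))  # preponderance of evidence, BR = 1%
print(required_jurors(0.50, 0.50, 1.1))  # preponderance, prior indifference
```

A stricter standard of proof or a lower prior raises the threshold, while a larger average LR lowers it, since ln(LR) sits in the denominator.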

For example, as shown in the following figure for LR=1.1, if the base rate of Guilt is BR=1% (so that BO=BR/(1-BR)≈0.01) then PP>90% implies N>71, which is remarkably close to the number of witnesses required to convict a bishop. Whereas a preponderance of evidence (PP>50%) would only require N>48, close to the number of witnesses required for the conviction of a cardinal priest.

Alternatively, a lower requirement for the latter conviction could be seen as the result of a higher presumption of Guilt: with BR=10% a conviction beyond reasonable doubt would require about 46 jurors, while with prior indifference (BR=50%) about 23 would suffice – the same as required for a preponderance of evidence and BR=10%. Of course, just one juror would be enough for a conviction on a preponderance of evidence based on prior indifference, while 9 jurors would be required if BR=30% – more than enough for dodgy exorcists.
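These figures can be explored with a short Python sketch that traces the posterior probability of Guilt as jurors are added, assuming LR=1.1 and BO=BR/(1-BR). The function names are hypothetical. The thresholds quoted above are rounded: the first whole N at which the curve actually crosses the standard of proof can be one juror higher.

```python
def posterior_probability(n, br, lr):
    """PP after n jurors with average likelihood ratio lr and base rate br."""
    bo = br / (1 - br)      # prior odds from the base rate
    po = lr ** n * bo       # PO = LR^N * BO
    return po / (1 + po)    # back from odds to probability

def first_crossing(standard, br, lr, max_n=1000):
    """First whole n at which PP exceeds the given standard of proof."""
    return next(n for n in range(1, max_n + 1)
                if posterior_probability(n, br, lr) > standard)

# Tabulate the crossing points for the base rates discussed in the text.
for br in (0.01, 0.10, 0.30, 0.50):
    print(br, first_crossing(0.5, br, 1.1), first_crossing(0.9, br, 1.1))
```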

A heavier preponderance of Guilt-oriented jurors would increase LR and shift the graphs to the left, thus lowering the required N. On the other hand, as shown in the following figure for LR=0.9, a preponderance of Innocence-oriented jurors would lower the probability of Guilt below the prior.

In the case of Sebek-hotep, a presumption of innocence (BR=10% or BR=1%) would only require one juror. But with prior indifference (BR=50%, which is more appropriate in his case), he would need at least 23 jurors to prove his innocence (or bona fides) beyond reasonable doubt (PP<10%). For just three jurors to suffice, their average LR would have to be about 0.5.
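The last figure follows from inverting the same relation: given N, the average likelihood ratio needed is LR = (PO/BO)^(1/N). A minimal Python sketch, with a hypothetical function name and again assuming odds are computed as p/(1-p):

```python
def required_average_lr(n, pp, br):
    """Average LR at which n jurors move the prior br exactly to posterior pp."""
    po = pp / (1 - pp)
    bo = br / (1 - br)
    return (po / bo) ** (1 / n)   # solve PO = LR^N * BO for LR

# Three witnesses, prior indifference (BR = 50%), target PP = 10%.
lr_three = required_average_lr(3, 0.10, 0.50)
print(round(lr_three, 2))
```

With three witnesses the result is the cube root of 1/9, slightly below 0.5, so each witness would need to be roughly twice as likely to vouch for Sebek-hotep if he is honest than if he is not.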

Notice, however, that the medieval and the Egyptian texts refer not to jurors, who can balance different views about Guilt and Innocence, but to witnesses, who are called upon only to confirm the hypothesis. Sebek-hotep was even allowed to produce his own witnesses, leaving the hapless Taou to depend on their probity. Presumably, the medieval tribunal would summon independent witnesses, but only insofar as, being witness to the offence, they could confirm Guilt. Apparently, bishops and exorcists were not allowed to produce their own disconfirming evidence.

Right along with primal probability came its most pervasive pitfall: the Confirmation Bias.

• Well put. Amazing about the theorem giving some numbers that can be related to the medieval text.
Maybe it’s a little misleading to say that 17th century work on dice etc is about evidence resulting from replicable experiments. Dice throwing is a very bad model of anything to do with evidence, as the stochastic element means that any evidence about the outcome is washed out in the random throw. It takes a lot of effort to make work about stochastics relevant to evaluation of evidence (but the example given above about the jury theorem is a genuine start on it.)