Nothing epitomises the world’s stunned unpreparedness for the fearsome escalation of the coronavirus pandemic better than the lingering dispute about the appropriateness of mass testing.
Until recently, the main objection to mass testing had been a practical one: a scarcity of RT PCR test kits, combined with the complexity and length of the testing procedure, meant that their use needed to be rationed and supervised, with priority given to identifying as many infections as possible, starting from people who showed specific symptoms and were therefore more likely to be infected in the first place.
This was always a weak argument, and it became increasingly surreal as the cost and effort of getting more tests done, however large, paled into insignificance compared to the gargantuan social and economic costs of all the other measures enacted around the world. In any case, the point is now being superseded by the appearance of an increasing number of simpler and faster tests, which greatly extend testing capacity. With supply constraints on the way to being removed, a widespread consensus is finally developing about the need to extend testing beyond symptomatic cases, first to healthcare workers and other people more exposed to the risk of infection, then to people with milder symptoms or no symptoms at all, and ultimately to whoever wants to be tested.
There is still little focus, however, on taking advantage of virtually unconstrained testing resources to fulfil the need for randomised testing aimed at measuring and monitoring the virus Base Rate. The benefits of knowing the virus prevalence in the general population are hardly missed. But efforts have so far been concentrated on estimating it through epidemiological models – whose varying conclusions depend on a number of uncertain parameters – rather than on measuring it directly by sampling observation.
A firm empirical grip on the virus Base Rate is the necessary foundation on which evidence can be used to test the infection hypothesis (here is a video primer on the key concepts used henceforth).
A first-line source of evidence of infection is given by symptoms: fever, cough, shortness of breath, fatigue, loss of taste and smell, etc. A person with symptoms has a higher probability of being infected than a person without. We say that P(I|S), the probability of Infection, given Symptoms, is higher than P(I), the prior probability or Base Rate of infection. We call such evidence confirmative.
How much higher? This is measured by Accuracy, which depends on two variables: the True Positive Rate TPR=P(S|I) – the probability of Symptoms, given Infection – and TNR=P(no S|no I) – the probability of no Symptoms, given no Infection. In a clinical context, TPR is known as Sensitivity and TNR as Specificity. A natural measure of overall Accuracy is the average of the two: A=(TPR+TNR)/2. Perfect evidence has maximum Sensitivity (TPR=1) and maximum Specificity (TNR=1), hence maximum Accuracy A=1. Imperfect evidence has TPR<1 and/or TNR<1, hence A<1.
A key relation to notice is that TPR=1-FNR and TNR=1-FPR, where FNR=P(no S|I) is the False Negative Rate – the probability of no Symptoms, given Infection – and FPR=P(S|no I) is the False Positive Rate – the probability of Symptoms, given no Infection. Hence maximum Sensitivity has FNR=0 – no False Negatives – and maximum Specificity has FPR=0 – no False Positives. Notice A=0.5+(TPR-FPR)/2. Also, simple maths shows that evidence is confirmative if TPR/FPR>1 or, likewise, FNR/TNR<1.
Symptoms are confirmative evidence of infection, but they are quite inaccurate. Sensitivity is inherently low: FNR is well above zero, and this is indeed a key issue with the coronavirus, which produces a high number of asymptomatic infections. And, in most cases, Specificity is also low: FPR is well above zero, since a fever or a cough does not necessarily imply an infection. Admittedly, the more specific the symptoms, the lower is FPR and the higher is the probability of infection. In the limit, an accumulation of symptoms – fever and cough and cold and shortness of breath etc. – can amount to a Smoking Gun: evidence so specific as to exclude False Positives and provide conclusive evidence of infection. But remember that conclusive evidence is not the same as perfect evidence: absence of pathognomonic symptoms does not prove absence of infection. Accuracy needs high Specificity as well as high Sensitivity.
This point is often missed: it is no use evaluating evidence by its Sensitivity alone or by its Specificity alone. Think of a parrot always shouting: Infected! It would have maximum Sensitivity – no False Negatives – but zero Specificity. Likewise, a parrot always shouting: Healthy! would have maximum Specificity – no False Positives – but zero Sensitivity. More sensibly, think of an airport hand luggage scanner that always beeps, or of an equally useless one that never does.
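As a minimal sketch of these definitions in Python: the function names below are hypothetical, and the (TPR, FPR) pair given for a symptom is an illustrative assumption, not a measured rate. It shows that both parrots score A=0.5 and are not confirmative, however extreme their Sensitivity or Specificity.

```python
def accuracy(tpr: float, fpr: float) -> float:
    """Average of Sensitivity (TPR) and Specificity (TNR = 1 - FPR)."""
    tnr = 1.0 - fpr
    return (tpr + tnr) / 2.0


def is_confirmative(tpr: float, fpr: float) -> bool:
    """Evidence is confirmative when TPR/FPR > 1, i.e. when TPR > FPR."""
    return tpr > fpr


# (TPR, FPR) pairs. The symptom values are assumed for illustration only.
examples = {
    "perfect evidence": (1.0, 0.0),
    "parrot shouting 'Infected!'": (1.0, 1.0),
    "parrot shouting 'Healthy!'": (0.0, 0.0),
    "hypothetical symptom": (0.6, 0.2),
}

for name, (tpr, fpr) in examples.items():
    print(f"{name}: A={accuracy(tpr, fpr):.2f}, "
          f"confirmative={is_confirmative(tpr, fpr)}")
```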
Symptoms are usually not accurate enough to prove or disprove the infection hypothesis. That’s why we need tests. Tests are not perfect either: they all produce some False Negatives and/or False Positives. But these can be properly measured. Like a good hand luggage scanner, a good test minimises both and optimises their trade-off.
A good scanner needs to have high, ideally maximum Sensitivity, so as to avoid False Negatives: it cannot let a gun go through. A perfect scanner would also have maximum Specificity – no False Positives: it would only pick up the bad stuff and never give false alarms. Failing that, however, we obviously prefer Sensitivity to Specificity – we want to make sure that every explosive device is picked up, even if most suspect objects turn out to be innocuous. We tolerate less Specificity to ensure maximum Sensitivity. At the same time, however, we want Specificity to be as high as possible – inspecting every piece of luggage that gives a false alarm would result in massive chaos and missed flights.
Likewise, a good virus test needs to spot every infection, even if that means scaring some people with a false alarm. Such was the test in our story: FNR=0% and FPR=5% – no False Negatives and a small percentage of False Positives. There we saw that the probability of infection, given a positive test result, depends on the Base Rate: despite high accuracy, a low Base Rate implies a low probability – that is why, by the way, we are not flustered when we hear an airport scanner beep: we know it is likely to be a false alarm. And we saw that with a low Base Rate there is a simple way to deal with alarms: repeat the test. One positive result is no reason for concern, two positives draw our attention, three positives are bad news. On the other hand, we have seen that a negative test result at any stage gives us complete peace of mind: maximum Sensitivity means that the probability of infection, given a negative result, is zero, irrespective of the Base Rate.
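The repeated-test reasoning can be sketched in Python. The 0.1% Base Rate below is an assumption chosen for illustration, as is the premise that repeated tests err independently of one another; the function name is hypothetical.

```python
def posterior_after_positive(prior: float, tpr: float, fpr: float) -> float:
    """Bayes' theorem: P(I|+) from the prior P(I), TPR=P(+|I) and FPR=P(+|no I)."""
    return tpr * prior / (tpr * prior + fpr * (1.0 - prior))


TPR = 1.00         # maximum Sensitivity: FNR = 0%
FPR = 0.05         # 5% False Positives
BASE_RATE = 0.001  # assumed low Base Rate (0.1%), for illustration only

probability = BASE_RATE
posteriors = []
for _ in range(3):
    # Each positive result turns the previous posterior into the new prior.
    # This assumes test errors are independent across repetitions.
    probability = posterior_after_positive(probability, TPR, FPR)
    posteriors.append(probability)

for n, p in enumerate(posteriors, start=1):
    print(f"after {n} positive(s): P(I|+) = {p:.1%}")
# With these inputs the three values come out near 2%, 29% and 89%.
```

A negative result needs no computation here: with FNR=0 the numerator of P(I|-) is zero whatever the prior.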
How good is the standard RT PCR test in detecting the coronavirus? To my surprise, its accuracy does not seem to be a well-known, well-established and agreed-upon number. Worse, it is hardly ever a point of discussion – as if the test were just assumed to be perfect. Well, it isn’t. According to some measures, its Sensitivity – the most important side of accuracy – may be as low as 70% or lower. (A horrific story has it that Dr Li Wenliang, the ophthalmologist who first warned about the Wuhan outbreak in January, tested negative several times before dying from the infection a few weeks later.) On the other hand, the test seems to be highly specific: a positive result implies an almost certain infection.
Let’s then assume that’s the case and say FNR=30% and FPR=0% – some False Negatives and no False Positives. This is the mirror image of the maximum Sensitivity test in our story. With maximum Specificity, the probability of infection, given a positive test result, is 100%, irrespective of the Base Rate. On the other hand, with Sensitivity at 70% the probability of infection, given a negative test result, is not zero, but depends on the Base Rate. Namely, if the Base Rate is low, say 0.1%, the probability is practically zero. But if the Base Rate is higher, it is well above zero. Let’s say for instance that the Base Rate is 50% – a reasonable assumption for the prior probability of infection in a symptomatic person. Then the probability of infection following a negative result is FNR·BR/(FNR·BR+TNR·(1-BR))=0.15/0.65=23%. This is well below the prior probability – the test is confirmative – but is certainly not low enough to exclude infection. To do so, a second test is needed, which would prove infection in case of a positive result, and would lower the probability of infection to 8% in case of a negative result. Hence, for peace of mind we would need a third test, which again would prove infection if positive, and, if negative, would lower the probability of infection to a comfortable 2.6%.
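The sequence of negative results can be checked with a short Python sketch. It uses the accuracy assumed above (FNR=30%, FPR=0%) and further assumes that repeated tests err independently; the function name is hypothetical.

```python
def posterior_after_negative(prior: float, fnr: float, fpr: float) -> float:
    """Bayes' theorem: P(I|-) from the prior P(I), FNR=P(-|I) and TNR=1-FPR."""
    tnr = 1.0 - fpr
    return fnr * prior / (fnr * prior + tnr * (1.0 - prior))


FNR = 0.30  # assumed Sensitivity of 70%
FPR = 0.00  # assumed maximum Specificity

for base_rate in (0.001, 0.5):
    probability = base_rate
    print(f"Base Rate {base_rate:.1%}")
    for n in range(1, 4):
        # Each negative result turns the previous posterior into the new prior.
        probability = posterior_after_negative(probability, FNR, FPR)
        print(f"  after {n} negative(s): P(I|-) = {probability:.2%}")
# For a 50% Base Rate this gives roughly 23%, 8% and 2.6%.
```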
At this level of accuracy, therefore, the RT PCR test is like an enhanced version of an accumulation of specific symptoms: a Smoking Gun that proves infection whenever it gives a positive result, but does not prove absence of infection when it gives a negative one, unless repeated several times. It follows that, if the hallmark of a good test is to let no infection go undetected – zero False Negatives – a maximum Specificity test is not as good as a maximum Sensitivity test.
This makes little difference if the Base Rate of infection is low. With a negative result, a maximum Sensitivity test guarantees a zero probability of infection whatever the Base Rate, but a maximum Specificity test is almost as good: one negative result is sufficient to reduce the already low Base Rate to almost zero. This is still not good enough if our aim is to avoid a bomb on a plane. But we can live with it if, despite media hype, we accept that a few undetected infections are not as dangerous.
It makes a big difference, however, if the Base Rate is high. In this case, a negative result in a maximum Sensitivity test still guarantees a zero probability of infection, but in a maximum Specificity test it only reduces the probability to what might still be an uncomfortably high level, which could only be lowered by repeating the test several times.
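To put rough numbers on "several times", here is a sketch that counts how many consecutive negative results each type of test needs before the probability of infection falls to a chosen level. The 1% comfort threshold, the list of Base Rates and the function name are all assumptions made for illustration, and test errors are taken to be independent across repetitions.

```python
from typing import Optional


def negatives_needed(prior: float, fnr: float, fpr: float,
                     threshold: float, max_tests: int = 50) -> Optional[int]:
    """Count consecutive negative results needed to push P(I) down to threshold.

    Assumes 0 < prior < 1 and independent test errors.
    Returns None if the threshold is not reached within max_tests.
    """
    tnr = 1.0 - fpr
    probability = prior
    for n in range(max_tests + 1):
        if probability <= threshold:
            return n
        # One more negative result: Bayes update of P(I|-).
        probability = fnr * probability / (
            fnr * probability + tnr * (1.0 - probability)
        )
    return None


THRESHOLD = 0.01  # hypothetical comfort level: at most 1% probability of infection

tests = {
    "max Sensitivity (FNR=0%, FPR=5%)": (0.0, 0.05),
    "max Specificity (FNR=30%, FPR=0%)": (0.30, 0.0),
}
for name, (fnr, fpr) in tests.items():
    for base_rate in (0.001, 0.5, 0.9):
        n = negatives_needed(base_rate, fnr, fpr, THRESHOLD)
        print(f"{name}, Base Rate {base_rate:.1%}: {n} negative test(s)")
```

Under these assumptions the maximum Sensitivity test never needs more than one negative result, while the maximum Specificity test needs more of them as the Base Rate rises.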
Yet, since the start of the epidemic, RT PCR tests have been targeted at symptomatic cases – people for whom the prior probability of infection was already high before the test. There was a good reason for it: the priority in the early stages was to confirm suspected infections, and isolate and treat the infected. But how many infected people have been ‘cleared’ after one negative test result, and have gone on to infect others?
RT PCR tests have been used on the wrong targets. They are more appropriate for asymptomatic cases, where the prior probability of infection is low, than for symptomatic cases, where the probability is high. The more specific the symptoms, the higher is the probability of infection. What is the point, then, of testing a symptomatic case just to prove for certain what is already quite likely, while running a high risk of missing a large number of infections through False Negatives?
The most appropriate test for a symptomatic case is not a Smoking Gun, where a positive result proves that the infection hypothesis is true. It is a Barking Dog, where a negative result proves that the hypothesis is false.
Little is known about the degree and type of accuracy of the numerous tests currently being evaluated under the EUA protocol. Ideally, we would like to see both maximum Sensitivity and maximum Specificity tests. Used in conjunction, they would yield a certain answer to the infection hypothesis, irrespective of the Base Rate of infection. Failing that, however, estimating the Base Rate of infection in the general population is a crucial step for a correct interpretation of the test results.
Once we know the test accuracy, as defined by TPR and FPR, the Base Rate BR can be easily derived from
P(+)=TPR·BR+FPR·(1-BR)
where P(+) is the probability of a positive test result. Hence:
BR=(P(+)-FPR)/(TPR-FPR)
For instance, let’s say we test 10,000 people and 595 of them test positive, hence P(+)=5.95%. If the test accuracy is TPR=100% and FPR=5%, as in the maximum Sensitivity test in our story, then BR=1%. Similarly, if accuracy is TPR=70% and FPR=0%, as in our assumed maximum Specificity RT PCR test, and 70 people test positive, then P(+)=0.70% and again BR=1%.
Notice by the way that this is a general result, valid for any level of accuracy, provided that TPR differs from FPR (otherwise the test carries no information and P(+) says nothing about BR). Say for instance we only have a horribly inaccurate, disconfirmative test, with TPR=30% and FPR=60%. Nevertheless, if we observe that 5,970 people out of 10,000 test positive, then P(+)=59.7% and again we can conclude that the Base Rate of infection is 1%.
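The three examples can be reproduced with a few lines of Python. This is a sketch that inverts the total-probability relation P(+)=TPR·BR+FPR·(1-BR), which all three examples satisfy; the function name is hypothetical.

```python
def base_rate(positive_rate: float, tpr: float, fpr: float) -> float:
    """Invert P(+) = TPR*BR + FPR*(1-BR) to get BR = (P(+) - FPR) / (TPR - FPR)."""
    if tpr == fpr:
        raise ValueError("TPR equals FPR: the test carries no information about BR")
    return (positive_rate - fpr) / (tpr - fpr)


TESTED = 10_000
# (number of positives, TPR, FPR) for the three examples in the text
cases = [
    (595, 1.00, 0.05),   # maximum Sensitivity test
    (70, 0.70, 0.00),    # assumed maximum Specificity RT PCR test
    (5970, 0.30, 0.60),  # disconfirmative test
]
for positives, tpr, fpr in cases:
    br = base_rate(positives / TESTED, tpr, fpr)
    print(f"positives={positives}, TPR={tpr:.0%}, FPR={fpr:.0%} -> BR={br:.2%}")
```

In a real sample the observed share of positives is itself an estimate, so the derived Base Rate carries sampling error and can even fall outside the 0 to 1 range when P(+) is close to FPR.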
A test with a known level of accuracy is all we need to derive the Base Rate of infection. Crucially, however, this will be the Base Rate of the tested population. Hence, if tests are only performed on symptomatic cases, there will be many more positive results, and the derived BR will be much higher – in fact equal to P(+)/0.7, i.e. 43% higher than the percentage of positive cases, under the assumed accuracy of the RT PCR test. As we saw in the previous post, taking such a number as an estimate of the prevalence of infection in the general population would therefore be a gross miscalculation. It would be as if in 2016 Brexit support had been estimated by polling UKIP voters, or Trump support by polling NRA members.
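The size of the miscalculation can be illustrated with a small Python sketch. Every input below is an assumption made for the sake of the example: a true Base Rate of 1%, a 60% chance of symptoms when infected, a 5% chance when not, and the RT PCR accuracy assumed earlier (TPR=70%, FPR=0%).

```python
TRUE_BASE_RATE = 0.01       # assumed prevalence in the general population
P_SYMPTOMS_INFECTED = 0.60  # assumed P(S|I), illustrative only
P_SYMPTOMS_HEALTHY = 0.05   # assumed P(S|no I), illustrative only
TPR, FPR = 0.70, 0.00       # assumed RT PCR accuracy

# Base Rate among symptomatic people, P(I|S), by Bayes' theorem.
symptomatic_base_rate = (P_SYMPTOMS_INFECTED * TRUE_BASE_RATE) / (
    P_SYMPTOMS_INFECTED * TRUE_BASE_RATE
    + P_SYMPTOMS_HEALTHY * (1.0 - TRUE_BASE_RATE)
)

# Share of positive results if only symptomatic people are tested.
positive_rate = TPR * symptomatic_base_rate + FPR * (1.0 - symptomatic_base_rate)

# Base Rate derived from that share: with FPR=0 this is P(+)/0.7.
derived_base_rate = (positive_rate - FPR) / (TPR - FPR)

print(f"P(+) among symptomatic: {positive_rate:.1%}")
print(f"derived Base Rate:      {derived_base_rate:.1%}")
print(f"true Base Rate:         {TRUE_BASE_RATE:.1%}")
```

With these made-up inputs the derived figure is the Base Rate of the symptomatic group, more than ten times the assumed prevalence in the general population.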
A correct estimate of the true Base Rate of infection can only be obtained by testing a randomly selected, representative cross section of the general population of interest.
The message is finally getting across.