With Italy in lockdown and London about to follow, let’s see what we can say in our framework about the coronavirus pandemic.

Funnily enough, the Blinded By Evidence paper starts with a virus. You hear about it on TV and worry you might have it. So you take a test that will tell you with 100% certainty that you have the virus if you actually have it – False Negative Rate (FNR)=0% – and with 95% certainty that you don’t have the virus if you actually don’t have it – False Positive Rate (FPR)=5%. The test comes back positive and you panic, until you are shown that the probability that you have the virus, given that you tested positive, is not near 100%, as you feared, but less than 2%. The reason is that the Base Rate of the virus – its frequency in the population, giving you the probability that you had the virus before you took the test – is 0.1%. And the reason why you were so off the mark is what in our framework we call the Prior Indifference Fallacy: blinded by the test result, you ignored the Base Rate, until reminded of its importance for a correct interpretation of the evidence.
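
For readers who want to check the arithmetic, here is a minimal sketch of the calculation in the story, using Bayes' theorem and the three numbers given above (Base Rate 0.1%, FNR 0%, FPR 5%). The function name is only illustrative.

```python
def posterior_given_positive(base_rate, fnr, fpr):
    """Probability of being infected, given one positive test (Bayes' theorem)."""
    true_positive = (1.0 - fnr) * base_rate      # infected and testing positive
    false_positive = fpr * (1.0 - base_rate)     # not infected but testing positive
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(base_rate=0.001, fnr=0.0, fpr=0.05)
print(f"{p:.2%}")  # 1.96%
```

Almost all positives come from the 99.9% of people who are not infected, which is why the result stays below 2%.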

So what’s happening with the coronavirus?

A major difference between our neat stylised story and the messy reality of coronavirus is in the Base Rate. The Base Rate in the story is a known given number – one in a thousand. But what is the Base Rate of the coronavirus? Nobody knows. All we know is that the virus is highly contagious and is spreading. But how many infected people are out there at any point in time? How are they distributed? How can we spot them? We just don’t know. We only know how many have been spotted, as a number of suspect cases – people exhibiting specific symptoms – have been tested and some of them have come out positive. But what about the others – the infected people who have not been tested because they haven’t shown any symptoms, don’t even know they are carrying the virus and go happily about infecting other people? We have no idea. We can only infer that there must be a positive relationship between spotted and unspotted cases – the good old cockroach theory – but what is the multiple? How many unspotted cases are there for each spotted case? We don’t know.

But that’s what we would like to know. As sorry as we are for the known number of spotted cases, and relieved that they are being identified, isolated and treated, it is the unspotted cases that we worry about. How many are they? How fast are they growing? What is the probability that we will get infected by one of them and join their number? What is the Base Rate of the coronavirus?

Such basic questions, but no answers. And, worse, little interest in finding out. Unlike in our story, the coronavirus Base Rate is unknown. But, just like in our story, we fail to recognise its importance for the purpose of finding a correct answer to our questions.

The reason is the same: we are blinded by evidence.

In the story, our question is: what is the probability that we are infected, given that we tested positive? Blinded by the test result, we neglect to account for the small Base Rate and end up with a gross overestimation of the posterior probability.

With the coronavirus, we would also like to be tested. But we can’t, since the RT-PCR test that is being used to detect the virus has been confined to suspect cases and is not available to the general public. Unable to take the test ourselves, our question becomes: what is the probability that we are infected, given that a number of other people tested positive? As in our story, without a test we are naturally drawn to looking for the virus frequency: how many infected people are there as a percentage of the population we interact with? What is the probability that one of them will infect us? Is it small, like the one in a thousand in our story? Or is it “at least 50%”, as my friend Enzo warned me yesterday it is in Milan, begging me not to go there?

No one tells us. So we try ourselves. We look at the data, and what do we see? One horrible figure: the total number of spotted cases, ominously growing day by day. From there, we infer that the number of unspotted cases must be growing at the same pace if not faster, and that it is an unnervingly unknown but surely large multiple of the spotted cases. And, like the character in our story, we panic. We are blinded by evidence. In the story, the panic is caused by Base Rate neglect. With the coronavirus, it is caused by Base Rate inflation.

Let’s see why. The number of spotted cases is the number of people who tested positive out of the number of people who got tested. Clearly, the more people get tested, the larger the number of spotted cases. So we look at their ratio. This would be a good estimate of the Base Rate if, and only if, the tested people were a random sample of the population of interest. But they aren’t. The tested sample is mainly composed of suspect cases – people who are tested because they show specific symptoms or because they have been in contact with spotted cases. As such, it is far from being random: the prior probability that a suspect case is infected is much higher than that of a person picked at random. Hence the ratio of the number of positives over the number of tests is a gross overestimation of the true Base Rate.
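
To get a feel for the size of the distortion, here is a rough sketch with purely hypothetical numbers: a true Base Rate of 1%, half of the infected showing symptoms, and 2% of the non-infected showing similar symptoms for other reasons. It also assumes that everyone with symptoms, and nobody else, gets tested, and that the test is perfect. None of these figures comes from real data.

```python
def positivity_among_suspects(base_rate, p_symptoms_if_infected, p_symptoms_if_healthy):
    """Share of positives when only symptomatic people are tested (perfect test assumed)."""
    infected_suspects = base_rate * p_symptoms_if_infected
    healthy_suspects = (1.0 - base_rate) * p_symptoms_if_healthy
    return infected_suspects / (infected_suspects + healthy_suspects)


# All three inputs are hypothetical, chosen only for illustration.
ratio = positivity_among_suspects(
    base_rate=0.01,
    p_symptoms_if_infected=0.5,
    p_symptoms_if_healthy=0.02,
)
print(f"{ratio:.1%}")  # 20.2%
```

Under these assumptions about one in five tests comes back positive, twenty times the true Base Rate, simply because of who gets tested.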

Let’s take for example the latest daily Bulletin from the Italian Health ministry:

And let’s look at Lombardy, where the early cases showed up in February and where almost 50% of cumulative total cases (Casi Totali, in orange) are still concentrated. Total cases in Lombardy amount to 19,884, out of 52,244 tested people (Tamponi, in grey). Their ratio, 38%, is the percentage of tested people who turned out positive. Does it mean that almost four in ten of 10 million Lombards are infected? Obviously not. Likewise, the true Base Rate of infections is not 8% in Veneto or 22% in the whole of Italy.

What is it then? We don’t know. In principle, however, estimating the coronavirus Base Rate would be quite simple. Take an unbiased, well stratified, random sample of the population of interest – a routine statistical technique commonly used in opinion polls and market research – and test them. Provided the test is sufficiently accurate, the percentage of positives is a good estimate of the Base Rate.
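
As a sketch of what such an estimate could look like, suppose a random sample of 2,000 people yields 20 positives. Both numbers are made up for illustration, and the test is assumed accurate enough to ignore its own errors. The Wilson score interval below is one standard way to put a 95% range around a small proportion; the function name is hypothetical.

```python
import math


def wilson_interval(positives, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% confidence)."""
    p_hat = positives / sample_size
    denom = 1.0 + z ** 2 / sample_size
    centre = (p_hat + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / sample_size + z ** 2 / (4 * sample_size ** 2))
        / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical random sample: 20 positives out of 2,000 people tested.
low, high = wilson_interval(positives=20, sample_size=2000)
print(f"estimated Base Rate: {20 / 2000:.1%}, plausible range: {low:.2%} to {high:.2%}")
```

With these made-up figures the range runs from a bit over half a percent to about one and a half percent: not precise, but far more informative than the ratio of positives among suspect cases.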

Crucially, the tested sample would have to be a fair representation of the general population, and therefore include symptomatic *as well as* asymptomatic people. This is in contrast with the current practice of confining tests to suspect cases – a reasonable procedure when priority must be given to identifying and securing as many infected people as possible, but an erroneous one, as we have seen, when the goal is to estimate the extent of the virus spread.

The advantage of having a detailed, localised and regularly updated map of coronavirus Base Rates should be obvious. It would give us a basic idea of the frequency of infection in different places and its evolution over time, thus helping us – at an individual level as well as at a public policy level – to modulate our response, focusing it more in areas where the Base Rate is higher and growing, and less in areas where it is lower and stable.

At an individual level, it would ease our apprehension to know that the Base Rate in our area is, say, 1%, rather than the imaginary multiple perceived by mask-wearing people. Before you say 1% is too low, think that it would mean 100,000 infections in Lombardy – about five times the current number of spotted cases – and more than 600,000 in Italy – about fifteen times the spotted cases. If it is higher we worry a bit more, if it is lower we worry a bit less. But it would benefit our health to know what it is, and that it is far lower than the hyperbolic figures implied by Base Rate inflation.

At the policy level, the benefits of a differentiated approach versus the blanket lockdowns being imposed in Italy and other countries should also be evident, in terms of increased focus where focus is mostly needed and a reduction of the huge social and economic costs currently imposed on everyone.

So the question is: why is it not done?

One answer is that the standard RT-PCR test requires a complicated and lengthy procedure and does not lend itself to mass testing – hence the priority set on testing suspect cases. But then the South Korean experience has shown us that mass testing is possible, and that it can be very useful. Similar evidence has come from a small town in Veneto. In addition, several companies, including Roche and the Italian Diasorin, have recently developed cheaper and faster tests.

Another objection is that random testing would produce volatile results, as e.g. one negative case today may turn positive tomorrow. But that is in the very nature of all testing, where variability is dealt with by averaging results on properly sized randomised samples, which do not have to be very large to represent much larger populations with a small margin of error. It is just like any poll, say a Leave/Remain Brexit poll (remember?). In fact, making sense of that variability is the very reason why polls are taken and retaken over time.
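
How large is "properly sized"? A back-of-the-envelope sketch, using the usual normal approximation for a simple random sample at 95% confidence. The target margins and the 1% expected rate are illustrative assumptions, not recommendations.

```python
import math


def sample_size(expected_rate, margin, z=1.96):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    return math.ceil(z ** 2 * expected_rate * (1.0 - expected_rate) / margin ** 2)


# A typical opinion poll: worst case 50/50 split, plus or minus 3 points.
poll_n = sample_size(expected_rate=0.5, margin=0.03)

# A Base Rate survey: assumed rate around 1%, plus or minus half a point.
virus_n = sample_size(expected_rate=0.01, margin=0.005)

print(poll_n, virus_n)  # 1068 1522
```

Note that the size of the population does not appear in the formula: under this approximation, a sample of one or two thousand serves a region of ten million as well as a town of a hundred thousand.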

A third objection is the one in our story: if the Base Rate is small, even very accurate tests can produce a large number of False Positives and False Negatives. But we know the answer to that: repeat the tests – one positive is unreliable, two positives is dependable, three positives almost certain.
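
Here is a small sketch of how repeated positives add up, using the test from the story (FNR 0%, FPR 5%). It assumes, and this is a strong assumption, that repeated tests on the same person err independently of each other. Function names are illustrative.

```python
def update_on_positive(prior, fnr, fpr):
    """One step of Bayes' theorem: probability of infection after a positive result."""
    hit = (1.0 - fnr) * prior
    false_alarm = fpr * (1.0 - prior)
    return hit / (hit + false_alarm)


def after_repeated_positives(base_rate, n_tests, fnr=0.0, fpr=0.05):
    """Probabilities of infection after 1, 2, ..., n_tests consecutive positives.

    Assumes each test result is independent of the others given the true status.
    """
    probability = base_rate
    path = []
    for _ in range(n_tests):
        probability = update_on_positive(probability, fnr, fpr)
        path.append(probability)
    return path


for base_rate in (0.01, 0.001):
    path = after_repeated_positives(base_rate, n_tests=4)
    print(f"Base Rate {base_rate:.1%}:", [f"{p:.1%}" for p in path])
```

Under these assumptions, starting from a 1% Base Rate the probability goes to roughly 17%, 80% and 99% after one, two and three positives. Starting from the story's 0.1%, it takes one more round: about 2%, 29%, 89% and 99%. How many repeats are enough depends on the Base Rate and on the test's error rates.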

So my answer is: Base Rate testing should be done, and I echo the WHO Director-General’s ‘simple message for all countries: test, test, test’.

But randomise.