A cynic’s definition of a value investor: someone who seeks to buy at 40 cents a business that is worth a dollar and to invest in a business that is able to charge a dollar for what is worth 40 cents.
What does a value investor do when, walking around the aisles of a Tesco supermarket, he is faced with the choice of buying a 2-litre bottle of Coca Cola at £1.59 or, right next to it, a 2-litre bottle of Tesco Cola for £0.50?
I recently faced the same question when my friend Sherri and I were looking to buy the ingredients for a proper ‘full English’ breakfast for Sunday morning – sausages, bacon, eggs, mushrooms, hash browns and baked beans. When it came to the latter, we had to choose between buying a can of Heinz baked beans for £0.85 or an otherwise identical can of Tesco baked beans for £0.30.
As I value-investingly fetched the Tesco can and put it in the basket, Sherri looked at me with an air of amused disapproval: “Come on, I think we can afford a can of Heinz – it’s the weekend!” “What do you mean, it’s the same stuff, isn’t it?” – I replied, playing on and launching into a tale of my visits to La Doria – the Italian firm where the Tesco beans probably came from – which somehow failed to raise her interest. “I bet you can’t tell the difference”. “Of course I can – said Sherri – there is a reason why Heinz beans cost more: they are better quality and they taste better.” “Okay, we’ll see” – I said, as I put both cans in the basket and started to savour the opportunity to run my own version of the most famous experiment in the history of statistics.
Back home, I asked Sherri to bear with me and wait in the living room while I prepared the experiment in the kitchen. I opened both cans and distributed some of their content onto 8 small plates, 4 with Heinz beans and 4 with Tesco beans, and displayed them in two rows.
Then I took Sherri, blindfolded her to eliminate the chance that she would spot a visual difference – the beans looked the same to me but you never know – and asked her to sit in front of the plates.
“There are 4 pairs of plates in front of you. For each pair, one plate contains Heinz beans and the other has Tesco beans. Each plate has its own spoon to avoid contamination. I would like to ask you to taste some beans from each plate, and for each of the 4 pairs tell me which one is Heinz and which one is Tesco”. “Okay” – said Sherri, anticipating a quick dash to victory. She tasted the first pair, and after a few seconds, over which I could see her realise that the task was not as easy as she had thought, she indicated which was which. She was right. “Okay, onto the second pair” – I said, with the acquired taste of a proud scientist enjoying the chance of being proven wrong. After sipping some water to clear her palate, Sherri proceeded and, after a few more seconds of hesitation, made her choice. “Wrong” – I said, with as soft a tone as I could muster to avoid hurting her feelings. “Okay okay, that can happen” – she retorted. “Sure it can” – I said, as I placed the third pair in front of her, this time actually hoping to be proven wrong. Alas, she was wrong again. And so she was in the fourth and final choice.
“All right all right, mister, you’ve proven your point” – she said with a smile, taking the blindfold off her beautiful eyes. “Yep” – I said, myself surprised by the embarrassing abundance of confirmatory evidence in favour of my hypothesis of interest. 2-2 would have been a better result – still proving my point but leaving her with some sense of dignified achievement.
“Okay, all done? Off we go” – said Sherri, as I cleared the table and she started to prepare her succulent ‘full English’. I transferred the beans into two bowls, one for Heinz and one for Tesco, and added the rest of the cans to each. One can was enough for the breakfast. So I asked Sherri which bowl she wanted me to put in the microwave – the Heinz, which she had just chosen as the better tasting 1 out of 4 times, or the Tesco, which she had chosen 3 out of 4 times?
“Heinz please” – she answered with a wry smile, aware of the inconsistency of her choice but happy to go along with her habitual preferences. I obliged and sat down, in quiet contemplation of the sheer power of established franchises and the value of brand moats that make Kraft Heinz a $42 billion company.
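For the statistically curious, here is a rough sketch of how likely each score in the blind tasting would be if Sherri were purely guessing. It assumes (my assumption, not something the experiment can verify) that each of the 4 pairs is an independent 50/50 call; the function name `prob_correct` is just an illustrative label.

```python
from math import comb

def prob_correct(k: int, n: int = 4, p: float = 0.5) -> float:
    """Probability of exactly k correct calls out of n pairs under pure guessing."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Distribution of scores for 4 pairs when every call is a coin flip (assumption).
for k in range(5):
    print(f"{k} correct out of 4: {prob_correct(k):.4f}")

# Chance of doing as badly as, or worse than, the observed 1 out of 4.
p_at_most_one = prob_correct(0) + prob_correct(1)
print(f"1 or fewer correct: {p_at_most_one:.4f}")
```

With only 4 pairs the numbers are coarse, which is worth keeping in mind before reading too much into any single score.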
Lauretta – why people like online anonymity never ceases to boggle me (sorry!) – referred to an article of mine on MOI Global, resulting from the combination of two posts here and here. She was struck by a line in the initial dialogue between the layman and the professionals, where the layman cuts his conversation with the indexer saying he should be ashamed of himself. She asked the Forum:
‘Now, shame is a very powerful negative emotion, at least in my culture (I am Italian and so is the author of that article, and I think he uses that word intentionally, to signal that it’s something very bad, something one should feel bad about).
What is the response you would give to his objections, so that you can feel good about indexing (and about people like Bogle who promoted it?)’
Wow. Talk about striking. Did you freeze there, Lauretta, or finish the article? I assure you there is a lot more to it than a passing tongue-in-cheek jibe. I also invite you to take a look at this presentation, in particular slide 14.
I must say, however, to Lauretta’s credit, that at least she wondered – Thaumazein is a powerful force. But what about some of the answers of the Bogleheads that came to her rescue?
Runner 3081: ‘I view and quickly discard what I just read.’
Way to go, Runner. Can’t wait to read your other 3410 pearls of wisdom.
adamthesmythe: ‘Must one respond? (I don’t think so).’
Kenkat: ‘I think it is a passionate attempt to appeal to emotion in order to separate you from your money’.
Nice to meet you too, Kenkat.
MathIsMyWayr: ‘If you take a big spoonful of salty water from a bowl, does the salt concentration of water remaining in the bowl change?’
More to the point:
Steve321: ‘I looked at the blog you refer to; he is managing or co-managing a fund apparently, called Atomo Made in Italy. I looked it up, the ISIN is LU1391064661. Current fees are 4.53% and they are underperforming. So they he (sic) is probably feeling ashamed too.’
Awright Steve. Underperforming what? The Made in Italy Fund has recently had its four-year anniversary. Its return, after all expenses, is 19.2%, versus 1.1% for its most comparable ETF – the Lyxor FTSE Italian Mid Cap. Volatility is 14.8% versus 17.6%. Have a look here – prove me wrong and I’ll buy you another pint.
whereskyle: ‘I would retort and send the shame right back at him (I also have Italian ancestry): Anyone who charges money for advice about individual stock picking should be ashamed of themselves. Virtually all academics subscribe to the idea that the most reliable strategy in picking stocks is to buy the entire market because most market moves are determined by nothing more than a 50/50 coin flip (See Malkiel, A Random Walk Down Wall Street.) The slightest suggestion that one should pay someone to buy their stocks for them according to “a system” that creates market-beating value for the actual investor is highly spurious, almost certainly wrong, and harmful to most investors who might hear it.’
May I ask you: Where-have you been in the last thirty years-skyle? As a refresher, I invite you to read this (or here and here). The presentation mentioned above would also do.
This is the best:
zarci: ‘The fact that the author is an active investment fund manager, and uses a term that is super sensitive in Italian would imply, at least to me, that I would not let this person babysit my children.’
Surprisingly, however, zarci does have something intelligent to say:
First, a market in which the majority of investors are indexers cannot exist in reality. There are too many institutional investors, bankers and other players for that reality to manisfest (sic). The world bank releases figures that show the distribution of investors, might try to find that…
Secondly, there is an incredible overgrowth of different types of ETFs ranging from factors, to small cap, to tech stocks. The list goes on. Even amongst people that invest in a very broadly diversified slice of the market, there is still an incredible amount of bias that provides movement in asset prices.
Just looking at the Bogleheads board topics during the plunge earlier this year would suggest that even the members that subscribe to this board could drive some interesting price action.
Ok, skip the first point. But the second is important. typical.investor nails it:
‘Every single indexer makes active choices that contributes to price discovery when we choose how much to allocate to 1) risky vs. safe assets 2) US vs Dev Intl vs Em 3) and market cap size (large/mid only or also small).
Many indexers go beyond that and make active choices to purse 1) various factors such as value, momentum, quality etc. and 2) certain industries such as REITS, health care, tech, energy etc.
Indexers also make active decisions on whether to rebalance and the timing of it.’
Right. My point is that passive funds are free riders and contribute nothing to price discovery. But this is not the same as saying that investors who allocate among them do not influence asset prices. They do, obviously and evidently. What is as obvious and as evident to me, however, is the inconsistency of saying, on the one hand, that there is no point discussing whether Apple, Tesla or any other stocks are over or under valued – just buy their index weight – while, on the other hand, arguing over the relative valuation of US vs. Emerging Markets stocks, or Value vs. Momentum, Large Caps vs. Small Caps, Technology vs. Healthcare etc., and allocating accordingly through passive funds.
Why should market efficiency hold across stocks but not across ‘asset classes’?
This is a logical, not an ethical argument. The ethical aspect – if one wants to look at it – is to ignore the inconsistency or quietly brush it aside, thus belittling stock pickers while validating asset allocators.
And, yes, I believe that owning a stock just because it is part of an index is ethically absurd. But you know, Italians…
Before getting the result, I had called the Ipsos MORI helpline to see if they could give me more information about the test and its accuracy. I must not have been the first one to enquire about accuracy, because the helpful operator had a prompt answer: ‘If you’re positive, you definitely have the virus; if you’re negative, you most probably don’t have it, but you can’t be certain’. He was not as clued-up about the test manufacturer, but he came back to me after checking with his supervisor: ‘I believe it is called Wuxi’.
So apparently I have taken a maximum Specificity Smoking Gun test: a positive result would have been conclusive proof of infection, irrespective of the Base Rate. But I came out negative – as I almost surely expected, given that, without symptoms, I could safely assume my prior probability of infection to be as low as the ONS Base Rate estimate. In the meantime, this had gone up to 0.09% (with a 95% confidence interval of 0.04% – 0.19%), or 1 in 1100 – curiously almost identical to the assumption in my original virus story.
(Strangely enough, given the media penchant for alarming but meaningless statistics, this ‘50% increase’ in the infection rate from a week earlier remained unreported).
However small my priors, seeing them further reduced to near zero in the light of a negative test result was a good feeling. Me being me, however, I called the helpline a second time after the results and asked the same questions. Lo and behold, I got… different answers. This time the operator – a different person – while reassuring me that the test was ‘quite accurate’, would not commit to giving ‘percentages’. And the reported manufacturer was different – ‘either Wondfo or Orientgene’.
Oh well. None of the three Chinese manufacturers report any accuracy information on their websites. But as long as their tests are ‘quite accurate’ – i.e. somewhat confirmative – a negative result from a low Base Rate gives me, and people around me, virtual certainty that I am not infected.
But what if the result had turned out to be positive? In that case, whether the first operator was right would have mattered a great deal. A positive result from a maximum Specificity test means certain infection. But with a low Base Rate of infection even a small deviation from 100% Specificity means that a positive result is very likely to be a False Positive.
Say for example that, as in the Table below, Specificity is not 100% but 95% – still very high. And say that Sensitivity is 70%. With the current ONS Base Rate of 0.09%, 9 out of 10,000 people have the virus. Of these, 6 will test positive and 3 will test negative. Whereas of the 9,991 people who do not have the virus, 500 will test positive and 9,491 will test negative. It follows that PP, the probability of infection given a positive test result, is as low as 6/506=1.25% (allow for rounding). Whereas NP, the probability of infection after a negative test result, is 3/9,494=0.03%.
In other words, of the 506 people who test positive, only 6 are True Positives – 1 out of 80 – and 500 are False Positives. Whereas of the 9,494 people who test negative, 9,491 are True Negatives and only 3 – 1 out of 3,516 – are False Negatives.
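The head-count above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `confusion_counts` and the 10,000 population are illustrative choices, and the 70% Sensitivity and 95% Specificity are the example's assumptions, not measured values.

```python
def confusion_counts(population, base_rate, sensitivity, specificity):
    """Split a tested population into True/False Positives and Negatives."""
    infected = population * base_rate
    healthy = population - infected
    tp = infected * sensitivity      # infected and test positive
    fn = infected - tp               # infected but test negative
    tn = healthy * specificity       # healthy and test negative
    fp = healthy - tn                # healthy but test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Example assumptions: BR = 0.09%, Sensitivity = 70%, Specificity = 95%.
c = confusion_counts(10_000, 0.0009, 0.70, 0.95)

pp = c["TP"] / (c["TP"] + c["FP"])   # probability of infection given a positive
np_ = c["FN"] / (c["FN"] + c["TN"])  # probability of infection given a negative

print({k: round(v, 2) for k, v in c.items()})
print(f"PP = {pp:.2%} (about 1 in {1 / pp:.0f})")
print(f"NP = {np_:.2%} (about 1 in {1 / np_:.0f})")
```

The unrounded counts are 6.3 True Positives and 499.55 False Positives, which is why 6/506 needs the 'allow for rounding' caveat.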
You can play with the blue numbers on this spreadsheet. You will see that even with a 99% Specificity PP remains small at less than 6% – 1 out of 17. Whereas NP is still approximately one third of the Base Rate – 1 out of 3,664.
Only with maximum 100% Specificity will PP jump all the way to 100% – no False Positives – whereas NP is even smaller at 1 out of 3,701.
You can also see that results are not very sensitive to Base Rate variations. 0.09% is the average infection rate in England, but the ONS estimates that it is currently higher (56% higher!) in London, at 0.14% (with a 95% confidence interval of 0.04% – 0.32%).
Plug 0.14% or even 0.32% in the BR cell and you will see that the resulting increases in PP and NP are small. That is why, although I was pleased with the negative result, it was what I almost surely expected – just as I would almost surely expect to draw a white ball from an urn containing 1100 white balls and 1 black ball – or even 313 white balls, if I plug the upper bound of the London confidence interval. After the test, my urn contains many more white balls, but there were plenty before.
Obviously, all the numbers above rest on the ONS Base Rate estimate, which is the right prior assumption in the absence of symptoms. Raise BR to, say, 50% – which would be a reasonable assumption if I had sufficiently specific symptoms – and the numbers are entirely different: PP is 93% and, crucially, NP is 24% – a 1 in 4 chance of a False Negative.
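For readers without the spreadsheet to hand, the same what-if exercise can be sketched in Python. The function `posteriors` and the scenario labels are my own illustrative names; the inputs are simply the Base Rate, Sensitivity and Specificity values discussed above, all of them assumptions rather than measured test characteristics.

```python
def posteriors(base_rate, sensitivity, specificity):
    """Return (PP, NP): P(infected | positive) and P(infected | negative)."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    false_neg = base_rate * (1 - sensitivity)
    true_neg = (1 - base_rate) * specificity
    pp = true_pos / (true_pos + false_pos)
    np_ = false_neg / (false_neg + true_neg)
    return pp, np_

# (label, Base Rate, Sensitivity, Specificity) - all values are assumptions.
scenarios = [
    ("England, Spec 95%", 0.0009, 0.70, 0.95),
    ("England, Spec 99%", 0.0009, 0.70, 0.99),
    ("England, Spec 100%", 0.0009, 0.70, 1.00),
    ("London, Spec 95%", 0.0014, 0.70, 0.95),
    ("London upper bound, Spec 95%", 0.0032, 0.70, 0.95),
    ("Symptomatic, Spec 95%", 0.50, 0.70, 0.95),
]

for label, br, sens, spec in scenarios:
    pp, np_ = posteriors(br, sens, spec)
    print(f"{label:30s} PP = {pp:7.2%}   NP = {np_:7.3%}")
```

Changing Specificity moves PP a great deal while barely touching NP; changing the Base Rate within the ONS range moves both only slightly, until the prior is raised to something like 50%.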
This raises the question: what is the accuracy of the tests used in the ONS study? The answer is in Paragraph 10 of their methodology guide: “we think the sensitivity of the test that the pilot study uses is plausibly between 85% and 95% (with around 95% probability) and the specificity of the test above 95%”. There is no information about the test manufacturers but, assuming they are the same as or similar to the ones used by Ipsos MORI, then the first operator was wrong: the test I took is not a Smoking Gun. Based on BR=0.09%, a test with, say, 90% Sensitivity and 97% Specificity further reduces NP to 0.01% – 1 out of 10,769 – which pleases me even more. But PP is not 100%: it is 2.6%.
Think about it: 10,000 people are tested and 308 unlucky ones come out positive. But most of them – all but 8 – are False Positives. The ONS can account for test inaccuracy and cut the 3.08% positive rate down to arrive at the 0.09% Base Rate. But what do they tell the positives? What are they counted as? The same is true for Ipsos MORI and whoever is testing asymptomatic people in a low Base Rate population. How many of the reported cases we hear about every day are False Positives wrongly counted as True Positives?
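How might a raw 3.08% positive rate be cut down to a 0.09% Base Rate? I do not know the ONS's actual procedure, so what follows is only a sketch of the standard algebraic inversion (often called the Rogan-Gladen estimator), using the assumed 90% Sensitivity and 97% Specificity; the function names are illustrative.

```python
def apparent_positive_rate(base_rate, sensitivity, specificity):
    """Share of all tests that come out positive, true and false together."""
    return base_rate * sensitivity + (1 - base_rate) * (1 - specificity)

def corrected_base_rate(positive_rate, sensitivity, specificity):
    """Invert the relation above to recover the Base Rate from the raw rate."""
    fpr = 1 - specificity
    return (positive_rate - fpr) / (sensitivity - fpr)

# Assumed test characteristics: 90% Sensitivity, 97% Specificity.
raw = apparent_positive_rate(0.0009, 0.90, 0.97)
print(f"Raw positive rate: {raw:.2%}")            # about 308 in 10,000
print(f"Recovered Base Rate: {corrected_base_rate(raw, 0.90, 0.97):.2%}")
```

The correction works at the population level only: it says how many of the positives are likely to be false, not which ones.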
Anyway, I am a happy negative. Yes, I might still be the 1 in 10,000 unlucky False Negative (or 3 in 10,000 if BR=0.32%). And let’s add to it the chance that, despite dutifully following precise instructions, I might have bungled my self-test – a tricky affair: I was wary about the nose poking, but nudging my tonsils and the nasty gagging reflex that came with it was worse.
But overall it’s a tiny risk, much smaller than other risks I happily live with every day.
Obviously, not being infected today does not mean that I cannot get infected tomorrow. So I will continue my social distancing and hand washing. But I will again run the risk I took in questioning the rationale of blanket lockdowns. Call me a Palm Beach crackpot – what’s wrong with the place? – but now that I know I am not an asymptomatic carrier merrily going about infecting other people, I won’t wear a mask if I don’t have to.
Leaving effectiveness aside, there is no point bearing the cost of reducing a risk that is small enough already.
A positive effect of the coronavirus pandemonium has been the bare exposure of the naïve view of science as the repository of certainty. As humdrum media kept informing the public about ‘what science says’ and governments stuck to the mantra of ‘being driven by science’, scientists themselves staged a dazzling display of varying views, findings, recommendations and guidance.
The treacherous misconception according to which science knows the truth and scientists impart it received a mighty blow – I’d dare to say final, but I won’t. People know that economists disagree, and are used to it – whence the vacuous question: Is economics a science? Even more so for finance experts, where their different views and opinions are the very essence of financial markets – in the face of a standard academic theory still based on the hyperuranian assumption of common knowledge. But when it comes to ‘real’ sciences, people expect experts to reach incontrovertible conclusions firmly grounded on objective evidence – the opposite of what they got from virologists, epidemiologists and assorted scientists around the world in the last few months.
Scientific disagreement should be no surprise: far from being the preserve of certainty, science is the realm of uncertainty. Scientists pursue certainty by asking appropriate questions, but are entirely comfortable with the uncertainty of provisional answers. It is not up to them to decide what to do with their findings.
What is surprising, however, is that most scientists at work on the pandemic anywhere in the world have failed not only to answer but to even ask a most basic question: how many people are infected? ‘A lot’ may have been an understandably quick answer in the initial stage of the tsunami, when all frantic efforts were focused on identifying and treating as many infections as possible. But when, by the beginning of March, the time came to take vital decisions on how best to contain the virus spread, hardly anyone pointed out that a more precise answer was necessary. John Ioannidis did it in mid-March; Giorgio Alleva et al. did it a little later, also providing an outstanding description of the operational framework required to overcome ‘convenience sampling’. A few others did, but no one heard. Instead, starting from Italy on 9 March, one country after another decided to impose blanket lockdowns, varying to some degree in intensity and scope, but all uniformly applied across the entire national territory, irrespective of what would have surely emerged as wide geographical variations in the Base Rate of infections.
Yinon Weiss’ trilogy spares me the task of expounding on what happened next – I agree with virtually everything he wrote. I add two observations. First, there is a stark parallel with the 2008 Great Financial Crisis, where fear of dread drove attention to the gloomiest scenarios of the most hyperbolic doomsayers. This had the disastrous effect of swaying many investors into locking in heavy losses and missing the 2009 turnaround. In the coronavirus panic, the direst predictions persuaded people to willingly acquiesce to unprecedented living conditions for the greater good of saving lives, while being largely oblivious to any consideration of future costs. Second, I hardly need to specify that questioning the appropriateness of lockdown measures has nothing to do with the foolish nonsense of virus deniers and assorted lunatics, no matter how they may attempt to hijack the arguments. Discussing the lockdowns does not mean rejecting their effectiveness in stemming the virus spread, let alone doubting their necessity in specific circumstances. It means assessing their impact vis à vis a full evaluation of their costs and alternative courses of action.
In this regard, as infections have started to recede, a major question currently being asked is what could have been done better with the benefit of hindsight. Unsurprisingly, the common answer is more of the same: earlier and stricter lockdowns. One notable exception, however, came from the UK Chief Medical Officer Chris Whitty, who recently admitted his regret for failing to increase testing capacity earlier on. “Many of the problems we had came because we were unable to work out exactly where we were, and we were trying to see our way through the fog.”
Indeed. Only at the end of April did the Office for National Statistics start to produce the Coronavirus Infection Survey Pilot, reporting an estimate of the number of people infected with coronavirus in the community population of England, excluding infections reported in hospitals, care homes and other institutional settings. The Base Rate, finally! The first reported number was 148,000 infections, equal to 0.27% of the population – 1 in 370. Since then the number has been trending down, and according to the latest report of 12 June is 33,000, equal to 0.06% of the population – 1 in 1667.
Curiously, on the same day I was invited to take part in a COVID-19 testing research study (Wave 2) conducted by Imperial College London and Ipsos MORI. ‘The study will help the Government work out how many people may have COVID-19 in different areas of the country. The test may indicate whether you currently have the COVID-19 virus. We have chosen your name at random, and participation is completely voluntary’.
Better late than never, I guess. But the question remains: Why did it take so long? Why wade through the fog for five months only guided by rickety models full of crude assumptions? Why guess the virus spread through a highly abstract number rather than actually measure it on the ground?
We will never know what the infection rate was back in January and February – in the UK or anywhere else – and how it varied through time, across different areas, age groups, sex, and other cohorts – the kind of data that Ipsos MORI and other statistical research agencies routinely inundate us with, ahead of elections and in myriad other circumstances. Sure, a viral test is not as easy to carry out as a telephone interview. And, despite earlier warnings at the highest levels, testing capacity back then was widely insufficient. But the mystery is that random testing was nowhere even considered as an option, including – as far as I can tell – in biostatistics and statistical epidemiology departments. The only option on the table was blanket lockdowns, with national governments left to decide their intensity and people left to dread their worst nightmares and bear all costs, in the name of a comforting but misleading precautionary principle.
It is entirely possible that, despite showing cross-sectional and temporal variation, Base Rate data would have been judged too high to leave any alternative to the adopted lockdown policies. But the point is: what is too high? Is the current infection rate in England too high? Presumably not, given that lockdown measures are being relaxed. As the rate has been coming down since late April, it is reasonable to presume that it was higher earlier on. But how high? Was it 1%? 5%? 10%? We’ll never know. And, crucially, whatever it was, it was an average number, higher in certain areas and lower in others, higher for certain cohorts and lower for others, and varying through time. Such critical information would have been of great help in modulating restriction policies, intensifying them where needed and diminishing or even excluding them elsewhere.
Oh well, too late. But the point seems to be finally coming across. Hopefully, there won’t be a Wave 2. But, just in case, random testing will provide more visibility to navigate through its containment.
I am looking forward to taking my test. Thanks to the ONS Base Rate estimate, and not having any symptoms, I am almost sure I will come out negative. The letter does not specify the test’s accuracy – it just says in the Additional Information overleaf that ‘test results are not 100% accurate’. As we have seen, Base Rate estimation does not require high accuracy: as long as its accuracy level is known, any test would do (the same point is made here). But of course accuracy is important at the individual level. So what will happen in the unlikely event that I test positive? It depends. It would be bad news if the test has maximum Specificity – a Smoking Gun: FPR=0%. If not, however, a positive result will very likely be a False Positive. Hence it would be wrong to interpret it as proving that I am infected. Before reaching that conclusion, I would want to repeat the test and, if I am positive again, repeat it a third time.
I hope that this point will be well clarified to the unlucky positives and that they will not be rushed into isolation.
Nothing epitomises the world’s stunned unpreparedness for the fearsome escalation of the coronavirus pandemic better than the lingering dispute about the appropriateness of mass testing.
Until recently, the main objection to mass testing had been a practical one: a scarcity of RT PCR test kits, combined with the complexity and length of the testing procedure, meant that their use needed to be rationed and supervised, with priority given to identifying as many infections as possible, starting from people who showed specific symptoms and were therefore more likely to be infected in the first place.
This was always a weak argument, and it became increasingly surreal as any amount of costs and efforts of getting more tests done paled into insignificance compared to the gargantuan social and economic costs of all other measures enacted around the world. In any case, the point is now being superseded by the appearance of an increasing number of simpler and faster tests, which greatly extend testing capacity. With supply constraints on the way to being removed, a widespread consensus is finally developing about the need to extend testing beyond symptomatic cases, first to healthcare workers and other people more exposed to the risk of infection, then to people with milder symptoms or no symptoms at all, and ultimately to whoever wants to be tested.
There is still little focus, however, on taking advantage of virtually unconstrained testing resources to fulfil the need for randomised testing aimed at measuring and monitoring the virus Base Rate. The benefits of knowing the virus prevalence in the general population are hardly missed. But efforts have so far been concentrated on estimating it through epidemiological models – whose varying conclusions depend on a number of uncertain parameters – rather than on measuring it directly by sampling observation.
A firm empirical grip on the virus Base Rate is the necessary foundation on which evidence can be used to test the infection hypothesis (here is a video primer on the key concepts used henceforth).
A first-line source of evidence of infection is given by symptoms: fever, cough, shortness of breath, fatigue, loss of taste and smell, etc. A person with symptoms has a higher probability of being infected than a person without. We say that P(I|S), the probability of Infection, given Symptoms, is higher than P(I), the prior probability or Base Rate of infection. We call such evidence confirmative.
How much higher? This is measured by Accuracy, which depends on two variables: the True Positive Rate TPR=P(S|I) – the probability of Symptoms, given Infection – and TNR=P(no S|no I) – the probability of no Symptoms, given no Infection. In a clinical context, TPR is known as Sensitivity and TNR as Specificity. A natural measure of overall Accuracy is the average of the two: A=(TPR+TNR)/2. Perfect evidence has maximum Sensitivity (TPR=1) and maximum Specificity (TNR=1), hence maximum Accuracy A=1. Imperfect evidence has TPR<1 and/or TNR<1, hence A<1.
A key relation to notice is that TPR=1-FNR and TNR=1-FPR, where FNR=P(no S|I) is the False Negative Rate – the probability of no Symptoms, given Infection – and FPR=P(S|no I) is the False Positive Rate – the probability of Symptoms, given no Infection. Hence maximum Sensitivity has FNR=0 – no False Negatives – and maximum Specificity has FPR=0 – no False Positives. Notice A=0.5+(TPR-FPR)/2. Also, simple maths shows that evidence is confirmative if TPR/FPR>1 or, likewise, FNR/TNR<1.
Symptoms are confirmative evidence of infection, but they are quite inaccurate. Sensitivity is inherently low: FNR>0 – this is indeed a key issue with the coronavirus: there is a high number of asymptomatic infections. And, in most cases, Specificity is also low: FPR>0 – a fever or a cough does not necessarily imply an infection. Admittedly, the more specific the symptoms, the lower is FPR and the higher is the probability of infection. In the limit, an accumulation of symptoms – fever and cough and cold and shortness of breath etc. – can amount to a Smoking Gun: evidence so specific as to exclude False Positives and provide conclusive evidence of infection. But remember that conclusive evidence is not the same as perfect evidence: absence of pathognomonic symptoms does not prove absence of infection. Accuracy needs high Specificity as well as high Sensitivity.
This point is often missed: it is no use evaluating evidence by its Sensitivity alone or by its Specificity alone. Think of a parrot always shouting: Infected! It would have maximum Sensitivity – no False Negatives – but zero Specificity. Likewise, a parrot always shouting: Healthy! would have maximum Specificity – no False Positives – but zero Sensitivity. More sensibly, think of an airport hand luggage scanner that always beeps, or of an equally useless one that never does.
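The definitions of Accuracy and confirmative evidence given above, and the two parrots, can be checked with a small Python sketch. The function names are mine, and the 70%/95% pair at the end is just an illustrative test, not a claim about any real one.

```python
def accuracy(tpr, tnr):
    """Overall Accuracy A = (TPR + TNR) / 2, the average of Sensitivity and Specificity."""
    return (tpr + tnr) / 2

def is_confirmative(tpr, tnr):
    """Evidence is confirmative if TPR/FPR > 1, written as TPR > FPR to avoid dividing by zero."""
    fpr = 1 - tnr
    return tpr > fpr

# The 'Infected!' parrot: maximum Sensitivity, zero Specificity.
print(accuracy(1.0, 0.0), is_confirmative(1.0, 0.0))   # 0.5 False

# The 'Healthy!' parrot: zero Sensitivity, maximum Specificity.
print(accuracy(0.0, 1.0), is_confirmative(0.0, 1.0))   # 0.5 False

# An illustrative imperfect test (assumed values): 70% Sensitivity, 95% Specificity.
print(accuracy(0.70, 0.95), is_confirmative(0.70, 0.95))
```

Both parrots score A=0.5, the same as tossing a coin, and neither moves the probability of infection away from the Base Rate.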
Symptoms are usually not accurate enough to prove or disprove the infection hypothesis. That’s why we need tests. Tests are not perfect either: they all produce some False Negatives and/or False Positives. But these can be properly measured. Like a good hand luggage scanner, a good test minimises both and optimises their trade-off.
A good scanner needs to have high, ideally maximum Sensitivity, so as to avoid False Negatives: it cannot let a gun go through. A perfect scanner would also have maximum Specificity – no False Positives: it would only pick up the bad stuff and never give false alarms. Failing that, however, we obviously prefer Sensitivity to Specificity – we want to make sure that every explosive device is picked up, even if most suspect objects turn out to be innocuous. We tolerate less Specificity to ensure maximum Sensitivity. At the same time, however, we want Specificity to be as high as possible – inspecting every piece of luggage that gives a false alarm would result in massive chaos and missed flights.
Likewise, a good virus test needs to spot every infection, even if that means scaring some people with a false alarm. Such was the test in our story: FNR=0% and FPR=5% – no False Negatives and a small percentage of False Positives. There we saw that the probability of infection, given a positive test result, depends on the Base Rate: despite high accuracy, a low Base Rate implies a low probability – that is why, by the way, we are not flustered when we hear an airport scanner beep: we know it is likely to be a false alarm. And we saw that with a low Base Rate there is a simple way to deal with alarms: repeat the test. One positive result is no reason for concern, two positives draw our attention, three positives are bad news. On the other hand, we have seen that a negative test result at any stage gives us complete peace of mind: maximum Sensitivity means that the probability of infection, given a negative result, is zero, irrespective of the Base Rate.
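The 'repeat the test' logic can be sketched as a chain of Bayesian updates. Two assumptions are mine and should be read as such: a Base Rate of 0.1% (roughly 1 in 1000), and repeated test results that are independent of each other given the person's true infection status. The function name `update` is illustrative.

```python
def update(prior, tpr, fpr, positive):
    """One Bayesian update of the probability of infection after a test result."""
    if positive:
        num = prior * tpr
        den = num + (1 - prior) * fpr
    else:
        num = prior * (1 - tpr)
        den = num + (1 - prior) * (1 - fpr)
    return num / den

TPR, FPR = 1.0, 0.05      # maximum Sensitivity test: FNR = 0%, FPR = 5%
BASE_RATE = 0.001         # assumed low Base Rate, about 1 in 1000

# Three positives in a row (assumes independent repeats).
p = BASE_RATE
after_positives = []
for _ in range(3):
    p = update(p, TPR, FPR, positive=True)
    after_positives.append(p)
print([f"{x:.1%}" for x in after_positives])

# A negative at any stage sends the probability to zero, whatever came before.
print(update(after_positives[1], TPR, FPR, positive=False))
```

Under these assumptions the probability goes from about 2% after one positive to about 29% after two and about 89% after three, while a single negative brings it back to zero.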
How good is the standard RT PCR test in detecting the coronavirus? To my surprise, its accuracy does not seem to be a well-known, well established and agreed-upon number. Worse, it is hardly ever a point of discussion – as if the test were just assumed to be perfect. Well, it isn’t. According to some measures, its Sensitivity – the most important side of accuracy – may be as low as 70% or lower. (A horrific story has it that Dr Li Wenliang, the ophthalmologist who first warned about the Wuhan outbreak in January, tested negative several times before dying from the infection a few weeks later). On the other hand, the test seems to be highly specific: a positive result implies an almost certain infection.
Let’s then assume that’s the case and say FNR=30% and FPR=0% – some False Negatives and no False Positives. This is the mirror image of the maximum Sensitivity test in our story. With maximum Specificity, the probability of infection, given a positive test result, is 100%, irrespective of the Base Rate. On the other hand, with Sensitivity at 70% the probability of infection, given a negative test result, is not zero, but depends on the Base Rate. Namely, if the Base Rate is low, say 0.1%, the probability is practically zero. But if the Base Rate is higher, it is well above zero. Let’s say for instance that the Base Rate is 50% – a reasonable assumption for the prior probability of infection in a symptomatic person. Then the probability of infection following a negative result is 23%. This is well below the prior probability – the test is confirmative – but is certainly not low enough to exclude infection. To do so, a second test is needed, which would prove infection in case of a positive result, and would lower the probability of infection to 8% in case of a negative result. Hence, for peace of mind we would need a third test, which again would prove infection if positive, and, if negative, would lower the probability of infection to a comfortable 2.6%.
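The sequence 23%, 8%, 2.6% can be reproduced with a short Python sketch. It assumes, as a simplification, that repeated tests are independent given the true infection status; the function name is my own.

```python
def prob_infected_after_negatives(base_rate, fnr, fpr, n_negatives):
    """Probability of infection after n consecutive negative results.

    Assumption: test results are independent given the true status.
    """
    p = base_rate
    for _ in range(n_negatives):
        # P(infected | negative) = FNR*p / (FNR*p + (1 - FPR)*(1 - p))
        p = fnr * p / (fnr * p + (1 - fpr) * (1 - p))
    return p

# Assumed maximum Specificity RT PCR test: FNR=30%, FPR=0%.
FNR, FPR = 0.30, 0.0

# Symptomatic person, Base Rate 50%.
after_negatives = [prob_infected_after_negatives(0.5, FNR, FPR, n) for n in (1, 2, 3)]
for n, p in enumerate(after_negatives, start=1):
    print(f"Base Rate 50%, {n} negative(s): {p:.1%}")

# Low Base Rate of 0.1%: one negative is enough to get practically to zero.
low = prob_infected_after_negatives(0.001, FNR, FPR, 1)
print(f"Base Rate 0.1%, 1 negative: {low:.3%}")
```

With a 50% prior the three values come out at about 23.1%, 8.3% and 2.6%; with a 0.1% prior a single negative already leaves about 0.03%.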
At this level of accuracy, therefore, the RT PCR test is like an enhanced version of an accumulation of specific symptoms: a Smoking Gun whose positive result proves an infection, but whose negative result does not prove its absence unless the test is repeated several times. It follows that, if the hallmark of a good test is to let no infection go undetected – zero False Negatives – a maximum Specificity test is not as good as a maximum Sensitivity test.
This makes little difference if the Base Rate of infection is low. With a negative result, a maximum Sensitivity test guarantees a zero probability of infection whatever the Base Rate, but a maximum Specificity test is almost as good: one negative result is sufficient to reduce the already low Base Rate to almost zero. This is still not good enough if our aim is to avoid a bomb on a plane. But we can live with it if, despite media hype, we accept that a few undetected infections are not as dangerous.
It makes a big difference, however, if the Base Rate is high. In this case, a negative result in a maximum Sensitivity test still guarantees a zero probability of infection, but in a maximum Specificity test it only reduces the probability to what might still be an uncomfortably high level, which could only be lowered by repeating the test several times.
Yet, since the start of the epidemic, RT PCR tests have been targeted on symptomatic cases – people for whom the prior probability of infection was already high before the test. There was a good reason for it: the priority in the early stages was to confirm suspect infections, and isolate and treat the infected. But how many infected people have been ‘cleared’ after one negative test result, and gone on to infect others?
RT PCR tests have been used on the wrong targets. They are more appropriate for asymptomatic cases, where the prior probability of infection is low, than for symptomatic cases, where the probability is high. The more specific the symptoms, the higher is the probability of infection. What is the point, then, of testing a symptomatic case just to prove for certain what is already quite likely, while running a high risk of producing a large number of False Negatives?
The most appropriate test for a symptomatic case is not a Smoking Gun, where a positive result proves that the infection hypothesis is true. It is a Barking Dog, where a negative result proves that the hypothesis is false.
Little is known about the degree and type of accuracy of the numerous tests currently being evaluated under the EUA protocol. Ideally, we would like to see both maximum Sensitivity and maximum Specificity tests. Used in conjunction, they would yield a certain answer to the infection hypothesis, irrespective of the Base Rate of infection. Failing that, however, estimating the Base Rate of infection in the general population is a crucial step for a correct interpretation of the test results.
Once we know the test accuracy, as defined by TPR and FPR, the Base Rate BR can be easily derived from
$$P(+) = TPR \cdot BR + FPR \cdot (1 - BR)$$
where P(+) is the probability of a positive test result. Hence:
$$BR = \frac{P(+) - FPR}{TPR - FPR}$$
For instance, let’s say we test 10,000 people and 595 of them test positive, hence P(+)=5.95%. If the test accuracy is TPR=100% and FPR=5%, as in the maximum Sensitivity test in our story, then BR=1%. Similarly, if accuracy is TPR=70% and FPR=0%, as in our assumed maximum Specificity RT PCR test, and 70 people test positive, then P(+)=0.70% and again BR=1%.
Notice by the way that this is a general result, valid for any level of accuracy. Say for instance we only have a horribly inaccurate, disconfirmative test, with TPR=30% and FPR=60%. Nevertheless, if we observe that 5,970 people test positive, then P(+)=59.7% and again we can conclude that the Base Rate of infection is 1%.
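As a check on the three examples, here is a minimal Python sketch. It assumes the relationship P(+) = TPR·BR + FPR·(1 − BR), which is consistent with all the figures quoted above, and solves it for BR; the function name `base_rate` is mine.

```python
def base_rate(p_positive, tpr, fpr):
    """Solve P(+) = TPR*BR + FPR*(1 - BR) for the Base Rate BR."""
    if tpr == fpr:
        # A test with TPR equal to FPR carries no information about BR.
        raise ValueError("TPR equals FPR: the Base Rate cannot be derived")
    return (p_positive - fpr) / (tpr - fpr)

tested = 10_000
examples = [
    # (number of positives, TPR, FPR)
    (595, 1.00, 0.05),    # maximum Sensitivity test from the story
    (70, 0.70, 0.00),     # assumed maximum Specificity RT PCR test
    (5_970, 0.30, 0.60),  # horribly inaccurate, disconfirmative test
]
results = [base_rate(pos / tested, tpr, fpr) for pos, tpr, fpr in examples]
for (pos, tpr, fpr), br in zip(examples, results):
    print(f"{pos} positives, TPR={tpr:.0%}, FPR={fpr:.0%}: BR={br:.2%}")
```

All three cases return a Base Rate of 1%, up to floating-point rounding. In practice P(+) is itself an estimate from a finite sample, so the derived BR inherits its sampling error, which this sketch ignores.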
A test with a known level of accuracy is all we need to derive the Base Rate of infection. Crucially, however, this will be the Base Rate of the tested population. Hence, if tests are only performed on symptomatic cases, there will be many more positive results, and the derived BR will be much higher – in fact equal to P(+)/0.7, i.e. 43% higher than the percentage of positive cases, under the assumed accuracy of the RT PCR test. As we saw in the previous post, taking such a number as an estimate of the prevalence of infection in the general population would therefore be a gross miscalculation. It would be as if in 2016 Brexit support had been estimated by polling UKIP voters, or Trump support by polling NRA members.
A correct estimate of the true Base Rate of infection can only be obtained by testing a randomly selected, representative cross section of the general population of interest.
With Italy in lockdown and London about to follow, let’s see what we can say in our framework about the coronavirus pandemic.
Funnily enough, the Blinded By Evidence paper starts with a virus. You hear about it on TV and worry you might have it. So you take a test that will tell you with 100% certainty that you have the virus if you actually have it – False Negative Rate (FNR)=0% – and with 95% certainty that you don’t have the virus if you actually don’t have it – False Positive Rate (FPR)=5%. The test comes back positive and you panic, until you are shown that the probability that you have the virus, given that you tested positive, is not near 100%, as you feared, but less than 2%. The reason is that the Base Rate of the virus – its frequency in the population, giving you the probability that you had the virus before you took the test – is 0.1%. And the reason why you were so off the mark is what in our framework we call the Prior Indifference Fallacy: blinded by the test result, you ignored the Base Rate, until reminded of its importance for a correct interpretation of the evidence.
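For readers who want to see where 'less than 2%' comes from, here is the calculation written out with Bayes' theorem. The symbols V (having the virus) and + (testing positive) are my own shorthand; the numbers are those of the story, with TPR = 1 − FNR = 100%.

```latex
P(V \mid +)
  = \frac{TPR \cdot BR}{TPR \cdot BR + FPR \cdot (1 - BR)}
  = \frac{1 \times 0.001}{1 \times 0.001 + 0.05 \times 0.999}
  = \frac{0.001}{0.05095}
  \approx 1.96\%
```

The denominator is dominated by the False Positives among the 99.9% of people who do not have the virus, which is why the posterior probability stays so low despite the test's high accuracy.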
So what’s happening with the coronavirus?
A major difference between our neat stylised story and the messy reality of coronavirus is in the Base Rate. The Base Rate in the story is a known given number – one in a thousand. But what is the Base Rate of the coronavirus? Nobody knows. All we know is that the virus is highly contagious and is spreading. But how many infected people are out there at any point in time? How are they distributed? How can we spot them? We just don’t know. We only know how many have been spotted, as a number of suspect cases – people exhibiting specific symptoms – have been tested and some of them have come out positive. But what about the others – the infected people who have not been tested because they haven’t shown any symptoms, don’t even know they are carrying the virus and go happily about infecting other people? We have no idea. We can only infer that there must be a positive relationship between spotted and unspotted cases – the good old cockroach theory – but what is the multiple? How many unspotted cases are there for each spotted case? We don’t know.
But that’s what we would like to know. As sorry as we are for the known number of spotted cases, and relieved that they are being identified, isolated and treated, it is the unspotted cases that we worry about. How many are they? How fast are they growing? What is the probability that we will get infected by one of them and join their number? What is the Base Rate of the coronavirus?
Such basic questions, but no answers. And, worse, little interest in finding out. Unlike in our story, the coronavirus Base Rate is unknown. But, just like in our story, we fail to recognise its importance for the purpose of finding a correct answer to our questions.
The reason is the same: we are blinded by evidence.
In the story, our question is: what is the probability that we are infected, given that we tested positive? Blinded by the test result, we neglect to account for the small Base Rate and end up with a gross overestimation of the posterior probability.
With the coronavirus, we would also like to be tested. But we can’t, since the RT PCR test that is being used to detect the virus has been confined to suspect cases and is not available to the general public. Unable to take the test on ourselves, our question becomes: what is the probability that we are infected, given that a number of other people tested positive? As in our story, without a test we are naturally drawn to looking for the virus frequency: how many infected people are there as a percentage of the population we interact with? What is the probability that one of them will infect us? Is it small, like the one in a thousand in our story? Or is it “at least 50%”, as yesterday my friend Enzo warned me it is in Milan, begging me not to go there?
No one tells us. So we try ourselves. We look at the data, and what do we see? One horrible figure: the total number of spotted cases, ominously growing day by day. From there, we infer that the number of unspotted cases must be growing at the same pace if not faster, and that it is an unnervingly unknown but surely large multiple of the spotted cases. And, like the character in our story, we panic. We are blinded by evidence. In the story, the panic is caused by Base Rate neglect. With the coronavirus, it is caused by Base Rate inflation.
Let’s see why. The number of spotted cases is the number of people who tested positive out of the number of people who got tested. Clearly, the more people get tested, the larger is the number of spotted cases. So we look at their ratio. This would be a good estimate of the Base Rate if, and only if, the tested people were a random sample of the population of interest. But they aren’t. The tested sample is mainly composed of suspect cases – people who are tested because they show specific symptoms or because they have been in contact with spotted cases. As such, it is far from being random: the prior probability that a suspect case is infected is much higher than if he was picked at random. Hence the ratio of the number of positives over the number of tests is a gross overestimation of the true Base Rate.
Let’s take for example the latest daily Bulletin from the Italian Health ministry:
And let’s look at Lombardy, where the early cases showed up in February and where almost 50% of cumulative total cases (Casi Totali, in orange) are still concentrated. Total cases in Lombardy amount to 19,884, out of 52,244 tested people (Tamponi, in grey). Their ratio, 38%, is the percentage of tested people who turned out positive. Does it mean that almost four in ten of the 10 million Lombards are infected? Obviously not. Likewise, the true Base Rate of infections is not 8% in Veneto or 22% in the whole of Italy.
What is it then? We don’t know. In principle, however, estimating the coronavirus Base Rate would be quite simple. Take an unbiased, well stratified, random sample of the population of interest – a routine statistical technique commonly used in opinion polls and market research – and test them. Provided the test is sufficiently accurate, the percentage of positives is a good estimate of the Base Rate.
Crucially, the tested sample would have to be a fair representation of the general population, and therefore include symptomatic as well as asymptomatic people. This is in contrast with the current practice of confining tests to suspect cases – a reasonable procedure when priority must be given to identifying and securing as many infected people as possible, but an erroneous one, as we have seen, when the goal is to estimate the extent of the virus spread.
The advantage of having a detailed, localised and regularly updated map of coronavirus Base Rates should be obvious. It would give us a basic idea of the frequency of infection in different places and its evolution over time, thus helping us – at an individual level as well as at a public policy level – to modulate our response, focusing it more in areas where the Base Rate is higher and growing, and less in areas where it is lower and stable.
At an individual level, it would help our apprehension to know that the Base Rate in our area is, say, 1%, rather than the imaginary multiple perceived by mask-wearing people. Before you say 1% is too low, think that it would mean 100,000 infections in Lombardy – about five times the current number of spotted cases – and more than 600,000 in Italy – about fifteen times the spotted cases. If it is higher we worry a bit more, if it is lower we worry a bit less. But it would benefit our health to know what it is, and that it is far lower than the hyperbolic figures implied by Base Rate inflation.
At the policy level, the benefits of a differentiated approach versus the blanket lockdowns being imposed in Italy and other countries should also be evident, in terms of increased focus where focus is mostly needed and a reduction of the huge social and economic costs currently imposed on everyone.
So the question is: why is it not done?
One answer is that the standard RT PCR test requires a complicated and lengthy procedure and does not lend itself to mass testing – hence the priority set on testing suspect cases. But then the South Korean experience has shown us that mass testing is possible, and that it can be very useful. Similar evidence has come from a small town in Veneto. In addition, several companies, including Roche and the Italian Diasorin, have recently developed cheaper and faster tests.
Another objection is that random testing would produce volatile results, as e.g. one negative case today may turn positive tomorrow. But that is in the very nature of all testing, where variability is dealt with by averaging results on properly sized randomised samples, which do not have to be very large to represent much larger populations with a small margin of error. It is just like any poll, say a Leave/Remain Brexit poll (remember?). In fact, making sense of that variability is the very reason why polls are taken and retaken over time.
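The point about sample size can be illustrated with the textbook margin of error for a proportion. This is a rough sketch, not a survey design: it assumes a simple random sample, a perfectly accurate test, a population much larger than the sample, and the normal approximation; the 1% prevalence and the sample sizes are hypothetical figures chosen for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical example: true prevalence of 1%, different sample sizes.
# Under these assumptions the population size does not enter the formula.
p = 0.01
for n in (1_000, 5_000, 10_000):
    moe = margin_of_error(p, n)
    print(f"n={n}: estimate {p:.1%} plus or minus {moe:.2%}")
```

Under these assumptions a sample of 10,000 people pins a 1% Base Rate down to within about 0.2 percentage points, whether the population is one million or sixty million. An imperfect test and a stratified design would change the numbers, but not the basic point.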
A third objection is the one in our story: if the Base Rate is small, even very accurate tests can produce a large number of False Positives and False Negatives. But we know the answer to that: repeat the tests – one positive is unreliable, two positives is dependable, three positives almost certain.
So my answer is: Base Rate testing should be done, and I echo WHO Director-General’s ‘simple message for all countries: test, test, test’.
By the time I started writing my DPhil thesis, I had pretty much come to the conclusion that academic life was not for me. So I decided to try and see what it was like to work in the City, and got a summer job at James Capel. Subsequently bought by HSBC, James Capel was then a prominent UK stockbroker and one of the few to pioneer European equity research. So it was that, overnight, I became their ‘Italian Equity Strategist’.
I wanted to dip a toe in the water – I got a breath-taking full-body plunge into the wide ocean. In no time I was talking to all sorts of ‘clients’ about all things Italy – a true life shaping experience. I still remember – or was it a nightmare? – being in front of a big shot from the ‘Danish Pension Fund’, trying to answer as best as I could his full cartridge of very detailed questions.
It didn’t last long. First, being at work at 7am was definitely not my thing. Besides that, I soon realised I wanted to be on the other side – the buy side, not the sell side. A fund manager, not a stockbroker. So when my friend Bruno got me an interview at JP Morgan Investment Management, where he was working as a company analyst – ‘I’m there at 9am and I can manage my time quite flexibly, as long as I get the job done’ – I was all for it.
But before leaving James Capel I wrote my final piece for their European Equity Strategy publication. It resurfaced recently in a house move. Reading it again after such a long time (yes, the London phone code was 01) made me laugh out loud:
The Italian stock market has gone up 9% by the end of July since the beginning of 1988. This relatively poor performance can essentially be ascribed to fundamental market uncertainty on the critical issues of political stability and fiscal policy, which constitute both the primary target and the key test for the new coalition government headed by the Christian Democratic leader Mr de Mita.
A global reform of the institutional and administrative apparatus of the Italian state is another major concern of the de Mita government. The aim is to make legislation a less lengthy and cumbersome process and to increase the efficiency of the Public Administration.
Political uncertainty – which has kept foreign investors out of Italy for two years – is certainly among the key factors which explain the poor relative growth of the Italian market and the low level of current valuations relative to the performance and prospects of the Italian quoted companies.
As Bruce Hornsby had been singing a couple of years earlier, ‘That’s just the way it is – Some things will never change’.
Since the launch of the Made in Italy Fund, now more than three years ago, I have been banging on this point. Viewed from a top-down, macro perspective, Italy has always looked like an unattractive place to invest. Unstable governments, inefficient public services, bulky debt, higher bond yields and, before the euro, a chronically weak currency. Add for good measure a few evergreens, such as corruption, the South’s backwardness and organised crime. And, from a stock market point of view, a limited number of quoted companies – currently about 350, against more than 800 each in France and Germany – mainly concentrated in banking and finance, utilities, oils and a few consumers. The whole lot worth about 600 billion euro – less than Apple. Who would want to invest there?
So common is this ‘country’ way of thinking that it takes some unlearning to realise how fundamentally wrong it is.
Investors do not buy countries. They buy companies – companies that happen to be based in a certain country and are therefore, in most cases, quoted on that country’s Stock Exchange.
But what does that mean? Is Microsoft a US company? Is Nestlé a Swiss company? Yes, that’s where they are headquartered and quoted. But no, not in the sense that their performance is related in any meaningful way to the performance and vicissitudes of their country of origin. What is the relationship between LVMH and the growth of the French economy? Or Ferrari and the stability of the Italian government?
The national dimension of equity investing is largely a remnant of a long-gone past, when most businesses were predominantly domestic. This is clearly not the case today, and not only for the big global corporations, but also, and increasingly so, for smaller firms selling their products and services around the world. To think that there is any direct link between these companies and the economic conditions of their country of origin is lazy at best.
There are still of course many companies whose business is mainly domestic. For these, the linkage to the state of the national economy may be stronger – but it is far from being linear, stable or reliable. Indeed, for some companies a weak economy may create opportunities to gain market share from competitors or to introduce new products and services.
So it is never as simple as economy=stock market. This is so in general, but it’s especially true for Italy, where the sector composition of the market bears no resemblance to the country’s economic reality.
Then what’s the point of the Made in Italy Fund? Isn’t its very name meant to evoke the same national dimension that I am saying makes no sense?
No. The Fund does not invest in Italy as a country. It invests in Italian companies with a market capitalisation of less than one billion euro, quoted on the Milan Stock Exchange.
Why only those and why only there? Two reasons:
It is a good place for finding pearls – companies with high growth prospects, strong and sustainable profitability and attractive valuations. Many of them are smaller companies, leaders in specific market niches, where good management and Italian flair allow them to build and maintain a solid competitive advantage in Italy and abroad. Of course, there are many good companies elsewhere. But in Italy they tend to be cheaper. Why? Precisely because investors snub Italy as a country! This is clearly true for many foreign investors, indolently clinging to their ‘country’ way of thinking. But in the last few decades it has been increasingly true also for domestic investors, who in a post-euro, pan-European world have been shedding a sane home-country bias in favour of a snobbish xenophilia.
Soon after I left James Capel and joined JPMIM, I started managing the Italian slice of their international equity and balanced portfolios. This was – hard to believe – thirty years ago. Since then I have done many other things, but my involvement with the Italian stock market has hardly ever stopped. I am – I fear to say – a veteran. As such, I like to believe that my experience, together with my ‘Italianness’ – in language, culture and mores – makes me especially suited to spotting Italian pearls and, as importantly, avoiding Italian pebbles and duds.
Italy is my country. Like most Italians, I have a complex love-hate relationship with it. Di Maio or de Mita, its politics has always been messy, its public finances rickety, its international credibility regularly in the balance. In my thirty years as an Italian fund manager, I have never been able to build a credible top-down investment case for Italy as a country (incidentally, can one do so for France or Germany or any other developed nation?). But when I flip it around and look bottom-up at Italian companies, especially the smaller ones that form the backbone of the Italian economy, I have no hesitation. In a universe of around 280 companies with less than one billion market cap – now steadily increasing through a sustained flow of new IPOs – I have no trouble selecting thirty or so to include in the Made in Italy Fund. If anything, the problem is to keep track of all the opportunities.
So my attitude to chronic Italian bears is, with Bruce Hornsby: ‘Ah, but don’t you believe them’. Country allocation should not be about countries. It should be about finding pots of value around the globe, and focused managers able to extract them.
P.S. I invite subscribers who haven’t yet done so to also subscribe to the Bayes Investments website, where they will find information and updates on the Made in Italy Fund.
Wootton does a marvellous job explaining mankind’s transition from a worldview based on authority to one based on evidence.
As reprised in Steven Pinker’s latest book (p. 9), a typical well-educated Englishman in 1600 believed in demons, witches, werewolves, magicians, alchemy, astrology and other nonsense (p. 6). But a mere century and a quarter later his whole perspective had changed:
Between 1600 and 1733 (or so – the process was more advanced in England than elsewhere) the intellectual world of the educated elite changed more rapidly than at any other time in previous history, and perhaps than at any time before the twentieth century. Magic was replaced by science, myth by fact, the philosophy and science of ancient Greece by something that is still recognizably our philosophy and our science, with the result that my account of an imaginary person in 1600 is automatically couched in terms of ‘belief’, while I speak of such a person in 1733 in terms of ‘knowledge’. (p. 11-12).
Commonly referred to as the ‘Scientific Revolution’, this transition is not easy to understand. The images we have in mind are of sinister cardinals persecuting Galileo and of barmy philosophers refusing to look into his telescope. In the same vein, Wootton quotes Joseph Glanvill, an early advocate of the revolution, who derided the view that telescopes and microscopes were
all deceitful and fallacious. Which Answer minds me of the good Woman, who when her Husband urged in an occasion of difference, I saw it, and shall I not believe my own Eyes? replied briskly, Will you believe your own Eyes, before your own dear Wife? (p. 74, Italics and bold in the original).
(I find this particularly funny, wondering about how essentially the same joke found its way down to Richard Pryor, through Groucho Marx’s Duck Soup. An equivalent joke my friend Peter told me many years ago is that of the English aristocrat, which I used here).
Obviously, such hilarious caricatures leave much to explain. Educated people in 1600 and earlier were no dimwits. So why did they hold what to our eyes seem such outrageously weird beliefs? This is a focal theme in the Bayes blog. Hence I was intrigued to find out that Wootton’s book is centred on the same key concepts.
Following Aristotle, a seventeenth century educated person was taught to think deductively: draw necessary conclusions from undisputable premises. It would be a mistake, however, to imply that he ignored evidence. As we have seen, there is no such thing as a priori knowledge, independent of evidence. Knowledge cannot but be based on some form of evidence – empirical, as it is plain to our eyes; or axiomatic, as it was common before the Scientific Revolution, all the way back to ancient Greece and beyond. Episteme was absolute, irrefutable, self-evident knowledge. And even the wackiest myths and legends of primordial peoples were not haphazard fantasies but elaborations of authoritative evidence, perhaps in the form of dreams by elderly sages and wise men, who interpreted them as divine revelations they were called upon to proclaim and propagate.
Aristotelian principles were self-evident truths. Such as: All bodies move towards their natural place. Therefore, as stars rotate around it and every object falls towards its core, the earth must be the centre of the universe. Or: Heavier objects fall faster than lighter ones. Therefore, a two-kilo bag of sugar falls faster than a one-kilo bag (Wootton, p. 70). Or: Hard substances are denser and heavier than soft substances. Therefore, ice is heavier than water (p. 71).
These are what we call extreme priors: beliefs that are seen as so obviously self-evident that it is considered pointless to test them through menial experimentation (p. 319). As obviously, however, they are – they cannot but be – the product of evidence. I see stars rotate around the earth and objects fall towards its core: therefore, I infer that all bodies move towards their natural place. I see that a two-kilo bag of sugar falls faster than a one-kilo bag: therefore, I infer that heavier objects fall faster than lighter ones. I see that ice is heavier than water: therefore, I infer that hard substances are heavier than soft ones. The evidence is all wrong, hence the inferences are wrong. But how do I know that? Remember: the closer our priors are to the extreme boundaries of Faith, the stronger must be the evidence required to change them. And, as with Glanvill’s husband, it matters little if the evidence is right in front of our eyes. It is plain to see, for instance, that ice floats on water and, as Archimedes – whose writings had been translated into Latin since the twelfth century – had found out in 250 BCE, this is only possible if ice is lighter than the water it displaces. But hey, who is a mere mathematician compared to the supreme father of natural philosophy? Aristotle had figured out that hard substances are heavier. So there must be another reason why ice floats. Well, it is because of its shape: flat objects cannot penetrate water and therefore remain on the surface. Galileo would patiently prove this was nonsense (p. 315), but philosophers remained unimpressed.
In the same vein, when Galileo asked his philosopher friend and colleague Cremonini to look at the mountains on the moon through his telescope, Cremonini refused, not because he was a blockhead – far from it: he was a highly respected professor of natural philosophy for sixty years and earned twice as much as Galileo – but because he did not trust the evidence: he did not regard it as strong enough to dent his Aristotelian belief that the moon was a perfect, unblemished sphere.
The idea that Aristotle had it all figured out and that all ‘natural philosophy’ logically descended from his principles was at the core of the seventeenth century’s worldview. As Wootton puts it (reprising Borges), Shakespeare had no real sense of progress. He treated his characters in the Roman plays as if they were his contemporaries. ‘History did not exist for him’ (p. 5). The governing assumption was that, as in Ecclesiastes (1:9), there was ‘nothing new under the sun’ (p. 63). The event that triggered a seismic change in this view and initiated the Scientific Revolution was the discovery of America at the end of the fifteenth century. That’s where Wootton places what he expressively calls ‘the discovery of discovery’ (Chapter 3). There is arguably no better way to convey this concept than through Hamlet’s immortal words to Horatio, which Wootton does not quote, probably because they are so well-known and overused – although he hints at them in the title of Part One. So I will do it for him: ‘There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy’ (Act I, Scene V).
The discovery of the New World showed mankind that in fact there was plenty new under the sun (including black swans, although for those we had to wait until the end of the seventeenth century) and gave rise to an explosive search for new evidence, which continues unabated, in fact accelerating, to this day. Over the following two centuries, curiosity – which theologians, reigning supreme above philosophers in the hierarchy of medieval science, regarded as a sin – became the mighty fuel of progress that it still is.
From their perspective, theologians were right: as long as knowledge is anchored to the two extreme boundaries of Faith, it remains impervious to evidence. Episteme above Doxa, truth above opinion, knowledge above experience, demonstration above persuasion. The discovery of discovery changed all that: it instilled in the minds of educated people ‘the idea that experience isn’t simply useful because it can teach you things that other people already know: experience can actually teach you that what other people know is wrong. It is experience in this sense – experience as the path to discovery – that was scarcely recognized before the discovery of America’ (p. 81).
This is the true sense of experience: exposure to the peril of being wrong. As curiosity compelled people to leave the secure shores of Aristotelian self-evidence, it encouraged them to embrace Cromwell’s rule, which we might as well rename Glanvill’s rule: Believe Your Own Eyes. This was no blanket surrender to evidence at face value. People remained wary – as we are – that evidence can be deceitful. But they opened their minds to the possibility that, in the right amount and shape, it might be capable of changing and even overturning their prior beliefs. Like Cremonini, they still suspected – and rightly so – that eyes can lie. But, unlike him, they gave them a chance: they were ready to answer Popper’s question.
This was the task that natural philosophers – as they were commonly known until the nineteenth century, when William Whewell coined the term ‘scientist’ (p. 28) – set out to accomplish: accumulate enough evidence to prove hypotheses true or false. They did so through carefully crafted experiments, which – precisely because they were well aware of the fallibility of evidence – they persistently reproduced, shared and challenged, provando e riprovando (p. 300), with the ultimate goal of devising the experimentum crucis (p. 381) which, by yielding conclusive evidence (p. 194), could allow them to proclaim a consensual winner of the evidential tug of war. Thus Truth, until then the preserve of infallible self-evident axioms, became a destination, to be travelled to through fallible empirical evidence. Prior Faith became posterior Certainty.
Reverend Thomas Bayes was born in the midst of this journey and lived a quiet and secluded life through it. He was by no means a protagonist of the Scientific Revolution – so much so that he doesn’t even earn a mention in Wootton’s book. Yet he was very much a man of his time, and his theorem encapsulates so well the ethos of the revolution that we can surely call the journey’s destination ‘Bayesland’.
(Wootton does mention Laplace’s dictum, attributing it to The Logic of Port-Royal, which ‘had acknowledged that the more unlikely an event the stronger the evidence in favour of it would have to be in order to ensure that it was more unlikely that the evidence should be false than that the event should not have occurred’ (p. 465)).
Bayesland is where we live and where we have always lived – Archimedes and Aristotle, Galileo and Cremonini, Shakespeare and Groucho Marx, you and I and all living creatures. We learn by experience, updating our beliefs through a multiplicative accumulation of evidence. We all are and have always been Bayesian.
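For readers who like to see the arithmetic, here is a minimal sketch of what ‘multiplicative accumulation’ can look like in practice. It assumes the odds form of Bayes’ theorem, in which posterior odds equal prior odds times the product of the likelihood ratios of each piece of evidence, and it assumes that the pieces of evidence are independent of one another. The function name and the numbers are mine, chosen purely for illustration.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with several pieces of evidence.

    Illustrative sketch only. Assumes the pieces of evidence are
    conditionally independent, so their likelihood ratios multiply:
    posterior odds = prior odds * LR_1 * LR_2 * ...
    """
    if not 0 <= prior <= 1:
        raise ValueError("prior must be a probability between 0 and 1")
    if prior in (0.0, 1.0):
        # A prior anchored at either extreme cannot be moved by evidence.
        return prior
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers: an open-minded prior and two pieces of evidence,
# each three times more likely if the hypothesis is true than if it is false.
open_minded = posterior_probability(0.5, [3, 3])

# An unlikely event needs more, or stronger, evidence to become credible.
one_clue = posterior_probability(0.001, [10])
three_clues = posterior_probability(0.001, [10, 10, 10])
```

With these made-up inputs, two modest clues take an even prior to nine-to-one odds, while the unlikely event stays improbable after one clue and only reaches roughly even odds after three. A prior of exactly 0 or 1 is returned unchanged, whatever the evidence.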
This has been the Scientific Revolution’s greatest achievement: to show mankind that the way we have always learnt in practice was also valid in theory. Progress started when we stopped wasting time thinking we were doing something else. The effect of such a seemingly simple conceptual clarification has been breathtaking.
Of course, it was far from simple – as Wootton brilliantly shows. His book is a pleasure to read from beginning to end, including his thick jungle of notes. I warmly recommend it.
We have seen where the word Science comes from: scire means to cut, split (as in scissors), separate, decide true from false. We, like other living creatures, do so on the basis of evidence – what we see there is. We use evidence to update our beliefs. We are all Bayesian.
Despite Kant’s grand attempt to salvage some of it, there is no such thing as a priori knowledge. What may appear to us as transcendent knowledge, emanating from pure reason independent of evidence, is and can only be based on notions – concepts, principles, axioms – that we regard as self-evident.
Such notions are the subject of Metaphysics. The word came about, apparently, to denote the collection of Aristotle’s treatises that his late editors arranged to place after (meta) his Physics. Aristotle himself had not called them Metaphysics: he actually referred to them as ‘first philosophy’, dealing with concepts that came before Physics in importance and generality.
Be that as it may, we can think of metaphysics as the area we enter once we start running out of answers to our Why questions. Answers are local explanations built on our own hard evidence or, most often, on soft evidence emanating from trusted sources. We learn to accept local explanations and live with them, but every answer begets new questions, in a seemingly endless why-chain whose infinity we find impossible to accept. Explanations cannot go on forever. At some point, even the cleverest dad succumbs to the urge to end his child’s relentless barrage of whys with a resounding last answer: ‘because that’s the way it is!’
But, to the undaunted child, dad’s last answer turns into the ultimate question: What is the way it is? Once we set out to answer this question we have entered the land of metaphysics. Metaphysics is mankind’s effort to establish the absolute, unquestionable and irrefutable episteme that stands firm above Physics. Episteme is knowledge that does not need evidence because it is self-evident, certain without experiment and secure from the perils of experience.
How can we achieve such knowledge? Clearly, we can’t reach it from the side of experience, whence we can only expect an infinite regress of explanations. So it must come from the other side. But what’s on the other side? Clearly, we know nothing about it – if we did, we would have already gone past the answer we are looking for. As Immanuel Kant put it, noumena are on the other side – things-in-themselves, absolutely unknowable and irremediably inaccessible to our mind. All we can know are phenomena – things as they appear to us in the light of evidence.
Metaphysics is the boundary between phenomena and noumena – a boundary that mankind would love to cross but can only push forward, unfolding and accumulating new and better explanations of phenomena. Such is the love at the root of philosophia – the ever-burning, insatiable desire for sophia, the supreme wisdom in whose full light we would finally be able to contemplate the way it is. But the light of philosophy is the same light that illuminates phenomena. Metaphysics is and can only be on the side of phenomena – the side of experience and evidence. In the words of Arthur Schopenhauer:
Metaphysics thus remains immanent, and does not become transcendent; for it never tears itself entirely from experience, but remains the mere interpretation and explanation thereof, as it never speaks of the thing-in-itself otherwise than in its relation to the phenomenon. (Will, Volume II, p. 183).
Metaphysics is not and cannot be a priori knowledge, independent of evidence. Its value does not rest on its being beyond evidence, but on being based on notions that we regard as self-evident. Like mathematics and geometry, metaphysics is an axiomatic system – true insofar as its axioms are true. An axiom is that which is thought worthy, weighty, and thus bears authority – a concept interestingly close to the original meaning of probability. Axioms are statements assumed to be self-evidently true, thus requiring no proof or demonstration. Given the axioms, the theorems built on them using truth-preserving rules of inference are demonstrably true.
As such, the validity of an axiomatic system depends on the weight of its axioms. The more precise, clear, obvious, intuitive, indubitable the axioms, the stronger the system. Take Euclid’s Elements, which, as we know, is built on five axioms (or postulates). As we have seen, one can argue about the fifth. But not about the first: A straight line can be drawn joining any two points. Or the second: A finite straight segment can be extended indefinitely into a straight line. The third: From any straight segment a circle can be drawn having the segment as radius and one endpoint as centre. And the fourth: all right angles are equal. A geometry in which any of these four axioms is untrue is hard even to imagine. They are glaringly, unquestionably self-evident.
Now let’s compare it to Spinoza’s Ethics, which he explicitly wrote along the lines of Euclid’s Elements.
Here is its first axiom: ‘Everything which exists, exists either in itself or in something else’. The second: ‘That which cannot be conceived through anything else must be conceived through itself’. And the third, which we have encountered as the Principle of Sufficient Reason: ‘From a given definite cause an effect necessarily follows; and, on the other hand, if no definite cause be granted, it is impossible that an effect can follow’. And so on. One may or may not agree with any of these statements – provided that he truly understands what they mean. But it would be preposterous, to say the least, to regard them as self-evident.
And what about Definitions, which in Elements as well as in Ethics precede the Axioms? Let’s take the first three. In Elements they are: 1) ‘A point is that which has no part’. 2) ‘A line is breadthless length’. 3) ‘The ends of lines are points’. Hard to disagree. But in Ethics: 1) ‘By that which is self-caused, I mean that of which the essence involves existence, or that of which the nature is only conceivable as existent’. 2) ‘A thing is called finite after its kind, when it can be limited by another thing of the same nature; for instance, a body is called finite because we always conceive another greater body. So, also, a thought is limited by another thought, but a body is not limited by thought, nor a thought by body’. 3) (we have seen this one) ‘By substance, I mean that which is in itself, and is conceived through itself: in other words, that of which a conception can be formed independently of any other conception’.
Whaaat? Definitions and axioms can only be as clear as the terms that compose them. We all know and agree on what a point, a straight line and a circle are. But what about essence and existence, cause and substance? They are much more complex, vaguer and harder concepts to define and comprehend. It’s no wonder, then, that all the ensuing Propositions in Ethics are, let’s say, less cogent than Pythagoras’s theorem. Take, for instance, Proposition XI, Part I:
God, or substance, consisting of infinite attributes, of which each expresses eternal and infinite essentiality, necessarily exists.
Here is the proof:
If this be denied, conceive, if possible, that God does not exist: then his essence does not involve existence. But this (Prop. VII) is absurd. Therefore God necessarily exists. Q.E.D.
Uhm. And what is Proposition VII?
Existence belongs to the nature of substances.
and its proof:
Substance cannot be produced by anything external (Corollary, Prop. VI), it must, therefore, be its own cause – that is, its essence necessarily involves existence, or existence belongs to its nature. Q.E.D.
Oh well. I spare you Proposition VI and its Corollary. Spinoza was a great philosopher and an admirable man, and his Ethics is a trove of powerful thoughts and ideas. But its metaphysical value can only be as compelling as its murky foundations.
This is metaphysics’ typical pitfall. While usually conceived as the product of pure reason, standing above physics and unrestrained by experience, metaphysics can’t really be anything other than a more or less coherent inferential system which is in fact so entwined with evidence as to be entirely based on supposedly self-evident foundations.
The trouble is that self-evidence is in the eye of the beholder. And – as we have seen repeatedly throughout this blog – it is amazing what different people, from the dimmest to the supremely intelligent, come to regard as self-evident. Once one is satisfied that he has made it all the way through why-chains to answering the ultimate question, and that he finally knows the way it is, it is tempting to invert direction and reinterpret reality in the light of his newfound metaphysical principles.