A cynic’s definition of a value investor: someone who seeks to buy at 40 cents a business that is worth a dollar and to invest in a business that is able to charge a dollar for what is worth 40 cents.
What does a value investor do when, walking around the aisles of a Tesco supermarket, he is faced with the choice of buying a 2-litre bottle of Coca Cola at £1.59 or, right next to it, a 2-litre bottle of Tesco Cola for £0.50?
I recently faced the same question when my friend Sherri and I were looking to buy the ingredients for a proper ‘full English’ breakfast for Sunday morning – sausages, bacon, eggs, mushrooms, hash browns and baked beans. When it came to the latter, we had to choose between buying a can of Heinz baked beans for £0.85 or an otherwise identical can of Tesco baked beans for £0.30.
As I value-investingly fetched the Tesco can and put it in the basket, Sherri looked at me with an air of amused disapproval: “Come on, I think we can afford a can of Heinz – it’s the weekend!” “What do you mean, it’s the same stuff, isn’t it?” – I replied, playing on and launching into a tale of my visits to La Doria – the Italian firm where the Tesco beans probably came from – which somehow failed to raise her interest. “I bet you can’t tell the difference”. “Of course I can – said Sherri – there is a reason why Heinz beans cost more: they are better quality and they taste better.” “Okay, we’ll see” – I said, as I put both cans in the basket and started to savour the opportunity to run my own version of the most famous experiment in the history of statistics.
Back home, I asked Sherri to bear with me and wait in the living room while I prepared the experiment in the kitchen. I opened both cans and distributed some of their content onto 8 small plates, 4 with Heinz beans and 4 with Tesco beans, and displayed them in two rows.
Then I took Sherri, blindfolded her to eliminate the chance that she would spot a visual difference – the beans looked the same to me but you never know – and asked her to sit in front of the plates.
“There are 4 pairs of plates in front of you. For each pair, one plate contains Heinz beans and the other has Tesco beans. Each plate has its own spoon to avoid contamination. I would like to ask you to taste some beans from each plate, and for each of the 4 pairs tell me which one is Heinz and which one is Tesco”. “Okay” – said Sherri, anticipating a quick dash to victory. She tasted the first pair, and after a few seconds, over which I could see her realise that the task was not as easy as she had thought, she indicated which was which. She was right. “Okay, onto the second pair” – I said, with the acquired taste of a proud scientist enjoying the chance of being proven wrong. After sipping some water to clear her palate, Sherri proceeded and, after a few more seconds of hesitation, made her choice. “Wrong” – I said, with as soft a tone as I could muster to avoid hurting her feelings. “Okay okay, that can happen” – she retorted. “Sure it can” – I said, as I placed the third pair in front of her, this time actually hoping to be proven wrong. Alas, she was wrong again. And so she was in the fourth and final choice.
“All right all right, mister, you’ve proven your point” – she said with a smile, taking the blindfold off her beautiful eyes. “Yep” – I said, myself surprised by the embarrassing abundance of confirmatory evidence in favour of my hypothesis of interest. 2-2 would have been a better result – still proving my point but leaving her with some sense of dignified achievement.
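The experiment is, of course, a nod to Fisher’s lady-tasting-tea design. As a back-of-the-envelope check – my own sketch, not part of the original story – note that under the null hypothesis that the brands are indistinguishable, each of the 4 pairs is an independent coin flip:

```python
from math import comb

# Under the null hypothesis that Sherri cannot tell the brands apart,
# each of the 4 paired comparisons is an independent 50/50 guess.
def prob_correct(k, n=4, p=0.5):
    """Probability of exactly k correct identifications out of n pairs."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_exactly_1 = prob_correct(1)                      # Sherri's result: 1 out of 4
p_at_most_1 = sum(prob_correct(k) for k in range(2))
p_all_4 = prob_correct(4)                          # what a true discriminator should approach

print(f"P(exactly 1 of 4) = {p_exactly_1:.4f}")    # 0.2500
print(f"P(at most 1 of 4) = {p_at_most_1:.4f}")    # 0.3125
print(f"P(all 4 correct)  = {p_all_4:.4f}")        # 0.0625
```

So a result as poor as hers, or worse, happens by pure guessing almost a third of the time – far from proving she prefers Tesco, but certainly no evidence that Heinz tastes better.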
“Okay, all done? Off we go” – said Sherri, as I cleared the table and she started to prepare her succulent ‘full English’. I transferred the beans into two bowls, one for Heinz and one for Tesco, and added the rest of the cans to each. One can was enough for the breakfast. So I asked Sherri which bowl she wanted me to put in the microwave – the Heinz, which she had just chosen as the better tasting 1 out of 4 times, or the Tesco, which she had chosen 3 out of 4 times?
“Heinz please” – she answered with a wry smile, aware of the inconsistency of her choice but happy to go along with her habitual preferences. I obliged and sat down, in quiet contemplation of the sheer power of established franchises and the value of brand moats that make Kraft Heinz a $42 billion company.
Lauretta – why people like online anonymity never ceases to boggle me (sorry!) – referred to an article of mine on MOI Global, resulting from the combination of two posts here and here. She was struck by a line in the initial dialogue between the layman and the professionals, where the layman cuts short his conversation with the indexer, saying he should be ashamed of himself. She asked the Forum:
‘Now, shame is a very powerful negative emotion, at least in my culture (I am Italian and so is the author of that article, and I think he uses that word intentionally, to signal that it’s something very bad, something one should feel bad about).
What is the response you would give to his objections, so that you can feel good about indexing (and about people like Bogle who promoted it?)’
Wow. Talk about striking. Did you freeze there, Lauretta, or finish the article? I assure you there is a lot more to it than a passing tongue-in-cheek jibe. I also invite you to take a look at this presentation, in particular slide 14.
I must say, however, to Lauretta’s credit, that at least she wondered – Thaumazein is a powerful force. But what about some of the answers of the Bogleheads who came to her rescue?
Runner 3081: ‘I view and quickly discard what I just read.’
Way to go, Runner. Can’t wait to read your other 3410 pearls of wisdom.
adamthesmythe: ‘Must one respond? (I don’t think so).’
Kenkat: ‘I think it is a passionate attempt to appeal to emotion in order to separate you from your money’.
Nice to meet you too, Kenkat.
MathIsMyWayr: ‘If you take a big spoonful of salty water from a bowl, does the salt concentration of water remaining in the bowl change?’
More to the point:
Steve321: ‘I looked at the blog you refer to; he is managing or co-managing a fund apparently, called Atomo Made in Italy. I looked it up, the ISIN is LU1391064661. Current fees are 4.53% and they are underperforming. So they he (sic) is probably feeling ashamed too.’
Awright Steve. Underperforming what? The Made in Italy Fund has recently had its four-year anniversary. Its return, after all expenses, is 19.2%, versus 1.1% for its most comparable ETF – the Lyxor FTSE Italian Mid Cap. Volatility is 14.8% versus 17.6%. Have a look here – prove me wrong and I’ll buy you another pint.
whereskyle: ‘I would retort and send the shame right back at him (I also have Italian ancestry): Anyone who charges money for advice about individual stock picking should be ashamed of themselves. Virtually all academics subscribe to the idea that the most reliable strategy in picking stocks is to buy the entire market because most market moves are determined by nothing more than a 50/50 coin flip (See Malkiel, A Random Walk Down Wall Street.) The slightest suggestion that one should pay someone to buy their stocks for them according to “a system” that creates market-beating value for the actual investor is highly spurious, almost certainly wrong, and harmful to most investors who might hear it.’
May I ask you: Where-have you been in the last thirty years-skyle? As a refresher, I invite you to read this (or here and here). The presentation mentioned above would also do.
This is the best:
zarci: ‘The fact that the author is an active investment fund manager, and uses a term that is super sensitive in Italian would imply, at least to me, that I would not let this person babysit my children.’
Surprisingly, however, zarci does have something intelligent to say:
‘First, a market in which the majority of investors are indexers cannot exist in reality. There are too many institutional investors, bankers and other players for that reality to manisfest (sic). The world bank releases figures that show the distribution of investors, might try to find that…

Secondly, there is an incredible overgrowth of different types of ETFs ranging from factors, to small cap, to tech stocks. The list goes on. Even amongst people that invest in a very broadly diversified slice of the market, there is still an incredible amount of bias that provides movement in asset prices.

Just looking at the Bogleheads board topics during the plunge earlier this year would suggest that even the members that subscribe to this board could drive some interesting price action.’
Ok, skip the first point. But the second is important. typical.investor nails it:
‘Every single indexer makes active choices that contributes to price discovery when we choose how much to allocate to 1) risky vs. safe assets 2) US vs Dev Intl vs Em 3) and market cap size (large/mid only or also small).
Many indexers go beyond that and make active choices to purse 1) various factors such as value, momentum, quality etc. and 2) certain industries such as REITS, health care, tech, energy etc.
Indexers also make active decisions on whether to rebalance and the timing of it.’
Right. My point is that passive funds are free riders and contribute nothing to price discovery. But this is not the same as saying that investors who allocate among them do not influence asset prices. They do, obviously and evidently. What is just as obvious and evident to me, however, is the inconsistency of saying, on the one hand: there is no point discussing whether Apple, Tesla or any other stock is over or under valued – just buy its index weight; while, on the other hand, arguing over the relative valuation of US vs. Emerging Markets stocks, Value vs. Momentum, Large Caps vs. Small Caps, Technology vs. Healthcare etc., and allocating accordingly through passive funds.
Why should market efficiency hold across stocks but not across ‘asset classes’?
This is a logical, not an ethical argument. The ethical aspect – if one wants to look at it – is to ignore the inconsistency or quietly brush it aside, thus belittling stock pickers while validating asset allocators.
And, yes, I believe that owning a stock just because it is part of an index is ethically absurd. But you know, Italians…
Before getting the result, I had called the Ipsos MORI helpline to see if they could give me more information about the test and its accuracy. I must not have been the first one to enquire about accuracy, because the helpful operator had a prompt answer: ‘If you’re positive, you definitely have the virus; if you’re negative, you most probably don’t have it, but you can’t be certain’. He was not as clued-up about the test manufacturer, but he came back to me after checking with his supervisor: ‘I believe it is called Wuxi’.
So apparently I have taken a maximum Specificity Smoking Gun test: a positive result would have been conclusive proof of infection, irrespective of the Base Rate. But I came out negative – as I almost surely expected, given that, without symptoms, I could safely assume my prior probability of infection to be as low as the ONS Base Rate estimate. In the meantime, this had gone up to 0.09% (with a 95% confidence interval of 0.04% – 0.19%), or 1 in 1100 – curiously almost identical to the assumption in my original virus story.
(Strangely enough, given the media penchant for alarming but meaningless statistics, such ‘50% increase’ in the infection rate from a week earlier remained unreported).
However small my priors, seeing them further reduced to near zero in the light of a negative test result was a good feeling. Me being me, however, I called the helpline a second time after the results and asked the same questions. Lo and behold, I got… different answers. This time the operator – a different person – while reassuring me that the test was ‘quite accurate’, would not commit to giving ‘percentages’. And the reported manufacturer was different – ‘either Wondfo or Orientgene’.
Oh well. None of the three Chinese manufacturers report any accuracy information on their websites. But as long as their tests are ‘quite accurate’ – i.e. somewhat confirmative – a negative result from a low Base Rate gives me, and people around me, virtual certainty that I am not infected.
But what if the result had turned out to be positive? In that case, whether the first operator was right would have mattered a great deal. A positive result from a maximum Specificity test means certain infection. But with a low Base Rate of infection even a small deviation from 100% Specificity means that a positive result is very likely to be a False Positive.
Say for example that, as in the Table below, Specificity is not 100% but 95% – still very high. And say that Sensitivity is 70%. With the current ONS Base Rate of 0.09%, 9 out of 10,000 people have the virus. Of these, 6 will test positive and 3 will test negative. Whereas of the 9,991 people who do not have the virus, 500 will test positive and 9,491 will test negative. It follows that PP, the probability of infection given a positive test result, is as low as 6/506=1.25% (allow for rounding). Whereas NP, the probability of infection after a negative test result, is 3/9,494=0.03%.
In other words, of the 506 people who test positive, only 6 are True Positives – 1 out of 80 – and 500 are False Positives. Whereas of the 9,494 people who test negative, 9,491 are True Negatives and only 3 – 1 out of 3,516 – are False Negatives.
You can play with the blue numbers on this spreadsheet. You will see that even with a 99% Specificity PP remains small at less than 6% – 1 out of 17. Whereas NP is still approximately one third of the Base Rate – 1 out of 3,664.
Only with maximum 100% Specificity will PP jump all the way to 100% – no False Positives – whereas NP is even smaller at 1 out of 3,701.
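The spreadsheet’s arithmetic is a one-line application of Bayes’ rule. A minimal Python sketch, using the BR, Sensitivity and Specificity figures quoted above, reproduces PP and NP:

```python
# Bayes' rule check of the figures above. BR, Sensitivity and Specificity
# are the ones quoted in the text; everything else follows.
def posteriors(base_rate, sensitivity, specificity):
    """Return (PP, NP): probability of infection given a positive,
    and given a negative, test result."""
    fpr = 1 - specificity
    fnr = 1 - sensitivity
    pp = base_rate * sensitivity / (base_rate * sensitivity + (1 - base_rate) * fpr)
    np_ = base_rate * fnr / (base_rate * fnr + (1 - base_rate) * specificity)
    return pp, np_

br = 0.0009  # ONS Base Rate: 0.09%, about 1 in 1100

pp, np_ = posteriors(br, 0.70, 0.95)    # the Table's scenario
print(f"Spec 95%:  PP = {pp:.2%}, NP = {np_:.3%}")   # ~1.25% and ~0.03%

pp99, _ = posteriors(br, 0.70, 0.99)
print(f"Spec 99%:  PP = {pp99:.1%}")                 # still under 6%

pp100, np100 = posteriors(br, 0.70, 1.00)
print(f"Spec 100%: PP = {pp100:.0%}, NP = 1 in {1/np100:,.0f}")  # 100%, 1 in ~3,701
```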
You can also see that results are not very sensitive to Base Rate variations. 0.09% is the average infection rate in England, but the ONS estimates that it is currently higher (56% higher!) in London, at 0.14% (with a 95% confidence interval of 0.04% – 0.32%).
Plug 0.14% or even 0.32% in the BR cell and you will see that the resulting increases in PP and NP are small. That is why, although I was pleased with the negative result, it was what I almost surely expected – just as I would almost surely expect to draw a white ball from an urn containing 1100 white balls and 1 black ball – or even 313 white balls, if I plug the upper bound of the London confidence interval. After the test, my urn contains many more white balls, but there were plenty before.
Obviously, all the numbers above rest on the ONS Base Rate estimate, which is the right prior assumption in the absence of symptoms. Raise BR to, say, 50% – which would be a reasonable assumption if I had sufficiently specific symptoms – and the numbers are entirely different: PP is 93% and, crucially, NP is 24% – a 1 in 4 chance of a False Negative.
This raises the question: what is the accuracy of the tests used in the ONS study? The answer is in Paragraph 10 of their methodology guide: “we think the sensitivity of the test that the pilot study uses is plausibly between 85% and 95% (with around 95% probability) and the specificity of the test above 95%”. There is no information about the test manufacturers but, assuming they are the same or similar to the ones used by Ipsos MORI, then the first operator was wrong: the test I took is not a Smoking Gun. Based on BR=0.09%, a test with, say, 90% Sensitivity and 97% Specificity further reduces NP to 0.01% – 1 out of 10,769 – which pleases me even more. But PP is not 100%: it is 2.6%.
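Both scenarios follow from the same one-line Bayes arithmetic; here is a quick check of the quoted figures (nothing else assumed):

```python
# Two scenarios from the text, checked with one-line Bayes arithmetic.

# Symptomatic prior: BR = 50%, Sensitivity 70%, Specificity 95%
br1, sens1, spec1 = 0.50, 0.70, 0.95
pp1 = br1 * sens1 / (br1 * sens1 + (1 - br1) * (1 - spec1))
np1 = br1 * (1 - sens1) / (br1 * (1 - sens1) + (1 - br1) * spec1)
print(f"PP = {pp1:.0%}, NP = {np1:.0%}")          # 93% and 24%

# ONS-style test: BR = 0.09%, Sensitivity 90%, Specificity 97%
br2, sens2, spec2 = 0.0009, 0.90, 0.97
pp2 = br2 * sens2 / (br2 * sens2 + (1 - br2) * (1 - spec2))
np2 = br2 * (1 - sens2) / (br2 * (1 - sens2) + (1 - br2) * spec2)
print(f"PP = {pp2:.1%}, NP = 1 in {1/np2:,.0f}")  # 2.6% and 1 in ~10,769
```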
Think about it: 10,000 people are tested and 308 unlucky ones come out positive. But most of them – all but 8 – are False Positives. The ONS can account for test inaccuracy and cut the 3.08% positive rate down to arrive at the 0.09% Base Rate. But what do they tell the positives? What are they counted as? The same is true for Ipsos MORI and whoever is testing asymptomatic people in a low Base Rate population. How many of the reported cases we hear about every day are False Positives wrongly counted as True Positives?
Anyway, I am a happy negative. Yes, I might still be the 1 in 10,000 unlucky False Negative (or 3 in 10,000 if BR=0.32%). And let’s add to it the chance that, despite dutifully following precise instructions, I might have bungled my self-test – a tricky affair: I was wary about the nose poking, but nudging my tonsils and the nasty gagging reflex that came with it was worse.
But overall it’s a tiny risk, much smaller than other risks I happily live with every day.
Obviously, not being infected today does not mean that I cannot get infected tomorrow. So I will continue my social distancing and hand washing. But I will again run the risk I took in questioning the rationale of blanket lockdowns. Call me a Palm Beach crackpot – what’s wrong with the place? – but now that I know I am not an asymptomatic carrier merrily going about infecting other people, I won’t wear a mask if I don’t have to.
Leaving effectiveness aside, there is no point bearing the cost of reducing a risk that is small enough already.
A positive effect of the coronavirus pandemonium has been the bare exposure of the naïve view of science as the repository of certainty. As humdrum media kept informing the public about ‘what science says’ and governments stuck to the mantra of ‘being driven by science’, scientists themselves staged a dazzling display of varying views, findings, recommendations and guidance.
The treacherous misconception according to which science knows the truth and scientists impart it received a mighty blow – I’d dare to say final, but I won’t. People know that economists disagree, and are used to it – whence the vacuous question: Is economics a science? Even more so for finance experts, whose different views and opinions are the very essence of financial markets – in the face of a standard academic theory still based on the hyperuranian assumption of common knowledge. But when it comes to ‘real’ sciences, people expect experts to reach incontrovertible conclusions firmly grounded on objective evidence – the opposite of what they got from virologists, epidemiologists and assorted scientists around the world in the last few months.
Scientific disagreement should be no surprise: far from being the preserve of certainty, science is the realm of uncertainty. Scientists pursue certainty by asking appropriate questions, but are entirely comfortable with the uncertainty of provisional answers. It is not up to them to decide what to do with their findings.
What is surprising, however, is that most scientists at work on the pandemic anywhere in the world have failed not only to answer but to even ask a most basic question: how many people are infected? ‘A lot’ may have been an understandably quick answer in the initial stage of the tsunami, when all frantic efforts were focused on identifying and treating as many infections as possible. But when, by the beginning of March, the time came to take vital decisions on how best to contain the virus spread, hardly anyone pointed out that a more precise answer was necessary. John Ioannidis did it in mid-March; Giorgio Alleva et al. did it a little later, also providing an outstanding description of the operational framework required to overcome ‘convenience sampling’. A few others did, but no one heard. Instead, starting from Italy on 9 March, one country after another decided to impose blanket lockdowns, varying to some degree in intensity and scope, but all uniformly applied across the entire national territory, irrespective of what would have surely emerged as wide geographical variations in the Base Rate of infections.
Yinon Weiss’ trilogy spares me the task of expounding on what happened next – I agree with virtually everything he wrote. I add two observations. One, there is a stark parallel with the 2008 Great Financial Crisis, where fear of dread drove attention to the gloomiest scenarios of the most hyperbolic doomsayers. This had the disastrous effect of swaying many investors into locking in heavy losses and missing the 2009 turnaround. In the coronavirus panic, the direst predictions persuaded people to willingly acquiesce to unprecedented living conditions for the greater good of saving lives, while remaining largely oblivious to any consideration of future costs. Two, I hardly need to specify that questioning the appropriateness of lockdown measures has nothing to do with the foolish nonsense of virus deniers and assorted lunatics, no matter how they may attempt to hijack the arguments. Discussing the lockdowns does not mean rejecting their effectiveness in stemming the virus spread, let alone doubting their necessity in specific circumstances. It means assessing their impact vis-à-vis a full evaluation of their costs and alternative courses of action.
In this regard, as infections have started to recede, a major question currently being asked is what could have been done better with the benefit of hindsight. Unsurprisingly, the common answer is more of the same: earlier and stricter lockdowns. One notable exception, however, came from the UK Chief Medical Officer Chris Whitty, who recently admitted his regret for failing to increase testing capacity earlier on. “Many of the problems we had came because we were unable to work out exactly where we were, and we were trying to see our way through the fog.”
Indeed. Only at the end of April did the Office for National Statistics start to produce the Coronavirus Infection Survey Pilot, reporting an estimate of the number of people infected with coronavirus in the community population of England, excluding infections reported in hospitals, care homes and other institutional settings. The Base Rate, finally! The first reported number was 148,000 infections, equal to 0.27% of the population – 1 in 370. Since then the number has been trending down and, according to the latest report of 12 June, is 33,000, equal to 0.06% of the population – 1 in 1667.
Curiously, on the same day I was invited to take part in a COVID-19 testing research study (Wave 2) conducted by Imperial College London and Ipsos MORI. ‘The study will help the Government work out how many people may have COVID-19 in different areas of the country. The test may indicate whether you currently have the COVID-19 virus. We have chosen your name at random, and participation is completely voluntary’.
Better late than never, I guess. But the question remains: Why did it take so long? Why wade through the fog for five months only guided by rickety models full of crude assumptions? Why guess the virus spread through a highly abstract number rather than actually measure it on the ground?
We will never know what the infection rate was back in January and February – in the UK or anywhere else – and how it varied through time, across different areas, age groups, sex, and other cohorts – the kind of data that Ipsos MORI and other statistical research agencies routinely inundate us with, ahead of elections and in myriad other circumstances. Sure, a viral test is not as easy to carry out as a telephone interview. And, despite earlier warnings at the highest levels, testing capacity back then was widely insufficient. But the mystery is that random testing was nowhere even considered as an option, including – as far as I can tell – in biostatistics and statistical epidemiology departments. The only options on the table were blanket lockdowns, with national governments left to decide their intensity and people left to dread their worst nightmares and bear all costs, in the name of a comforting but misleading precautionary principle.
It is entirely possible that, despite showing cross-sectional and temporal variation, Base Rate data would have been judged too high to leave any alternative to the adopted lockdown policies. But the point is: what is too high? Is the current infection rate in England too high? Presumably not, given that lockdown measures are being relaxed. As the rate has been coming down since late April, it is reasonable to presume that it was higher earlier on. But how high? Was it 1%? 5%? 10%? We’ll never know. And, crucially, whatever it was, it was an average number, higher in certain areas and lower in others, higher for certain cohorts and lower for others, and varying through time. Such critical information would have been of great help in modulating restriction policies, intensifying them where needed and diminishing or even excluding them elsewhere.
Oh well, too late. But the point seems to be finally coming across. Hopefully, there won’t be a Wave 2. But, just in case, random testing will provide more visibility to navigate through its containment.
I am looking forward to taking my test. Thanks to the ONS Base Rate estimate, and not having any symptoms, I am almost sure I will come out negative. The letter does not specify the test’s accuracy – it just says in the Additional Information overleaf that ‘test results are not 100% accurate’. As we have seen, Base Rate estimation does not require high accuracy: as long as its accuracy level is known, any test would do (the same point is made here). But of course accuracy is important at the individual level. So what will happen in the unlikely event that I test positive? It depends. It would be bad news if the test has maximum Specificity – a Smoking Gun: FPR=0%. If not, however, a positive result will very likely be a False Positive. Hence it would be wrong to interpret it as proving that I am infected. Before reaching that conclusion, I would want to repeat the test and, if I am positive again, repeat it a third time.
I hope that this point will be well clarified to the unlucky positives and that they will not be rushed into isolation.
Nothing epitomises the world’s stunned unpreparedness for the fearsome escalation of the coronavirus pandemic better than the lingering dispute about the appropriateness of mass testing.
Until recently, the main objection to mass testing had been a practical one: a scarcity of RT PCR test kits, combined with the complexity and length of the testing procedure, meant that their use needed to be rationed and supervised, with priority given to identifying as many infections as possible, starting from people who showed specific symptoms and were therefore more likely to be infected in the first place.
This was always a weak argument, and it became increasingly surreal as any amount of costs and efforts of getting more tests done paled into insignificance compared to the gargantuan social and economic costs of all other measures enacted around the world. In any case, the point is now being superseded by the appearance of an increasing number of simpler and faster tests, which greatly extend testing capacity. With supply constraints on the way to being removed, a widespread consensus is finally developing about the need to extend testing beyond symptomatic cases, first to healthcare workers and other people more exposed to the risk of infection, then to people with milder symptoms or no symptoms at all, and ultimately to whoever wants to be tested.
There is still little focus, however, on taking advantage of virtually unconstrained testing resources to fulfil the need for randomised testing aimed at measuring and monitoring the virus Base Rate. The benefits of knowing the virus prevalence in the general population are hard to miss. But efforts have so far been concentrated on estimating it through epidemiological models – whose varying conclusions depend on a number of uncertain parameters – rather than on measuring it directly by sampling observation.
A firm empirical grip on the virus Base Rate is the necessary foundation on which evidence can be used to test the infection hypothesis (here is a video primer on the key concepts used henceforth).
A first-line source of evidence of infection is given by symptoms: fever, cough, shortness of breath, fatigue, loss of taste and smell, etc. A person with symptoms has a higher probability of being infected than a person without. We say that P(I|S), the probability of Infection, given Symptoms, is higher than P(I), the prior probability or Base Rate of infection. We call such evidence confirmative.
How much higher? This is measured by Accuracy, which depends on two variables: the True Positive Rate TPR=P(S|I) – the probability of Symptoms, given Infection – and TNR=P(no S|no I) – the probability of no Symptoms, given no Infection. In a clinical context, TPR is known as Sensitivity and TNR as Specificity. A natural measure of overall Accuracy is the average of the two: A=(TPR+TNR)/2. Perfect evidence has maximum Sensitivity (TPR=1) and maximum Specificity (TNR=1), hence maximum Accuracy A=1. Imperfect evidence has TPR<1 and/or TNR<1, hence A<1.
A key relation to notice is that TPR=1-FNR and TNR=1-FPR, where FNR=P(no S|I) is the False Negative Rate – the probability of no Symptoms, given Infection – and FPR=P(S|no I) is the False Positive Rate – the probability of Symptoms, given no Infection. Hence maximum Sensitivity has FNR=0 – no False Negatives – and maximum Specificity has FPR=0 – no False Positives. Notice A=0.5+(TPR-FPR)/2. Also, simple maths shows that evidence is confirmative if TPR/FPR>1 or, likewise, FNR/TNR<1.
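In symbols – my own restatement of the definitions above – Bayes’ rule makes the confirmation condition explicit, and the Accuracy identity follows from TNR=1-FPR:

```latex
P(I \mid S) = \frac{P(I)\,\mathrm{TPR}}{P(I)\,\mathrm{TPR} + \bigl(1 - P(I)\bigr)\,\mathrm{FPR}}
\;>\; P(I)
\;\iff\; \frac{\mathrm{TPR}}{\mathrm{FPR}} > 1
\;\iff\; \frac{\mathrm{FNR}}{\mathrm{TNR}} < 1

A = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}
  = \frac{\mathrm{TPR} + 1 - \mathrm{FPR}}{2}
  = \frac{1}{2} + \frac{\mathrm{TPR} - \mathrm{FPR}}{2}
```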
Symptoms are confirmative evidence of infection, but they are quite inaccurate. Sensitivity is inherently low: FNR>0 – this is indeed a key issue with the coronavirus: there is a high number of asymptomatic infections. And, in most cases, Specificity is also low: FPR>0 – a fever or a cough do not necessarily imply an infection. Admittedly, the more specific the symptoms, the lower is FPR and the higher is the probability of infection. In the limit, an accumulation of symptoms – fever and cough and cold and shortness of breath etc. – can amount to a Smoking Gun: evidence so specific as to exclude False Positives and provide conclusive evidence of infection. But remember that conclusive evidence is not the same as perfect evidence: absence of pathognomonic symptoms does not prove absence of infection. Accuracy needs high Specificity as well as high Sensitivity.
This point is often missed: it is no use evaluating evidence by its Sensitivity alone or by its Specificity alone. Think of a parrot always shouting: Infected! It would have maximum Sensitivity – no False Negatives – but zero Specificity. Likewise, a parrot always shouting: Healthy! would have maximum Specificity – no False Positives – but zero Sensitivity. More sensibly, think of an airport hand luggage scanner that always beeps, or of an equally useless one that never does.
Symptoms are usually not accurate enough to prove or disprove the infection hypothesis. That’s why we need tests. Tests are not perfect either: they all produce some False Negatives and/or False Positives. But these can be properly measured. Like a good hand luggage scanner, a good test minimises both and optimises their trade-off.
A good scanner needs to have high, ideally maximum Sensitivity, so as to avoid False Negatives: it cannot let a gun go through. A perfect scanner would also have maximum Specificity – no False Positives: it would only pick up the bad stuff and never give false alarms. Failing that, however, we obviously prefer Sensitivity to Specificity – we want to make sure that every explosive device is picked up, even if most suspect objects turn out to be innocuous. We tolerate less Specificity to ensure maximum Sensitivity. At the same time, however, we want Specificity to be as high as possible – inspecting every piece of luggage that gives a false alarm would result in massive chaos and missed flights.
Likewise, a good virus test needs to spot every infection, even if that means scaring some people with a false alarm. Such was the test in our story: FNR=0% and FPR=5% – no False Negatives and a small percentage of False Positives. There we saw that the probability of infection, given a positive test result, depends on the Base Rate: despite high accuracy, a low Base Rate implies a low probability – that is why, by the way, we are not flustered when we hear an airport scanner beep: we know it is likely to be a false alarm. And we saw that with a low Base Rate there is a simple way to deal with alarms: repeat the test. One positive result is no reason for concern, two positives draw our attention, three positives are bad news. On the other hand, we have seen that a negative test result at any stage gives us complete peace of mind: maximum Sensitivity means that the probability of infection, given a negative result, is zero, irrespective of the Base Rate.
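As a hedged illustration of the repeat-the-test logic – assuming, as a prior, the story’s Base Rate of roughly 1 in 1100 – each positive result multiplies the odds of infection by TPR/FPR = 20:

```python
# Sketch of the repeat-the-test logic for a maximum-Sensitivity test:
# TPR = 100% (no False Negatives), FPR = 5%. The prior is assumed to be
# the story's Base Rate of roughly 1 in 1100.
def update_on_positive(prior, tpr=1.0, fpr=0.05):
    """Bayes' rule: posterior probability of infection after a positive result."""
    return prior * tpr / (prior * tpr + (1 - prior) * fpr)

p = 1 / 1100
for n in range(1, 4):
    p = update_on_positive(p)
    print(f"after {n} positive result(s): {p:.1%}")
# one positive barely registers (~2%), two draw attention (~27%),
# three are bad news (~88%)
```

A negative result at any stage, of course, sends the probability straight to zero: with TPR = 100% there are no False Negatives.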
How good is the standard RT PCR test in detecting the coronavirus? To my surprise, its accuracy does not seem to be a well-known, well-established and agreed-upon number. Worse, it is hardly ever a point of discussion – as if the test were just assumed to be perfect. Well, it isn’t. According to some measures, its Sensitivity – the most important side of accuracy – may be as low as 70% or lower. (A horrific story has it that Dr Li Wenliang, the ophthalmologist who first warned about the Wuhan outbreak in late December 2019, tested negative several times before dying from the infection a few weeks later). On the other hand, the test seems to be highly specific: a positive result implies an almost certain infection.
Let’s then assume that’s the case and say FNR=30% and FPR=0% – some False Negatives and no False Positives. This is the mirror image of the maximum Sensitivity test in our story. With maximum Specificity, the probability of infection, given a positive test result, is 100%, irrespective of the Base Rate. On the other hand, with Sensitivity at 70% the probability of infection, given a negative test result, is not zero, but depends on the Base Rate. Namely, if the Base rate is low, say 0.1%, the probability is practically zero. But if the Base Rate is higher, it is well above zero. Let’s say for instance that the Base Rate is 50% – a reasonable assumption for the prior probability of infection in a symptomatic person. Then the probability of infection following a negative result is 23%. This is well below the prior probability – the test is confirmative – but is certainly not low enough to exclude infection. To do so, a second test is needed, which would prove infection in case of a positive result, and would lower the probability of infection to 8% in case of a negative result. Hence, for peace of mind we would need a third test, which again would prove infection if positive, and, if negative, would lower the probability of infection to a comfortable 2.6%.
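The 23%, 8% and 2.6% figures follow directly from Bayes’ rule. A minimal check (a Python sketch of my own, again assuming independent repeated tests and the stated FNR=30%, FPR=0%):

```python
def posterior_after_negatives(base_rate, fnr, k):
    """P(infected | k consecutive negative results) for a maximum
    Specificity test (FPR=0%), assuming independent repeated tests."""
    p_inf = base_rate * fnr ** k   # infected people slip through with prob FNR each time
    p_not = 1 - base_rate          # uninfected people always test negative (TNR=100%)
    return p_inf / (p_inf + p_not)

# Symptomatic person: prior probability (Base Rate) 50%, FNR 30%
for k in (1, 2, 3):
    print(k, round(posterior_after_negatives(0.50, 0.30, k), 3))
```

This reproduces the 23%, 8% and 2.6% in the text; with a Base Rate of 0.1% instead of 50%, a single negative result already drives the probability to practically zero.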
At this level of accuracy, therefore, the RT PCR test is like an enhanced version of an accumulation of specific symptoms: a Smoking Gun that will certainly spot an infection if there is one, but will not prove absence of infection if there isn’t one, unless repeated several times. It follows that, if the hallmark of a good test is to let no infection go undetected – zero False Negatives – a maximum Specificity test is not as good as a maximum Sensitivity test.
This makes little difference if the Base Rate of infection is low. With a negative result, a maximum Sensitivity test guarantees a zero probability of infection whatever the Base Rate, but a maximum Specificity test is almost as good: one negative result is sufficient to reduce the already low Base Rate to almost zero. This is still not good enough if our aim is to avoid a bomb on a plane. But we can live with it if, despite media hype, we accept that a few undetected infections are not as dangerous.
It makes a big difference, however, if the Base Rate is high. In this case, a negative result in a maximum Sensitivity test still guarantees a zero probability of infection, but in a maximum Specificity test it only reduces the probability to what might still be an uncomfortably high level, which could only be lowered by repeating the test several times.
Yet, since the start of the epidemic, RT PCR tests have been targeted on symptomatic cases – people for whom the prior probability of infection was already high before the test. There was a good reason for it: the priority in the early stages was to confirm suspect infections, and isolate and treat the infected. But how many infected people have been ‘cleared’ after one negative test result, and went about infecting others?
RT PCR tests have been used on the wrong targets. They are more appropriate for asymptomatic cases, where the prior probability of infection is low, than for symptomatic cases, where the probability is high. The more specific the symptoms, the higher is the probability of infection. What is the point, then, of testing a symptomatic case just to prove for certain what is already quite likely, while running a high risk of missing a large number of False Negatives?
The most appropriate test for a symptomatic case is not a Smoking Gun, where a positive result proves that the infection hypothesis is true. It is a Barking Dog, where a negative result proves that the hypothesis is false.
Little is known about the degree and type of accuracy of the numerous tests currently being evaluated under the EUA protocol. Ideally, we would like to see both maximum Sensitivity and maximum Specificity tests. Used in conjunction, they would yield a certain answer to the infection hypothesis, irrespective of the Base Rate of infection. Failing that, however, estimating the Base Rate of infection in the general population is a crucial step for a correct interpretation of the test results.
Once we know the test accuracy, as defined by TPR and FPR, the Base Rate BR can be easily derived from

P(+) = TPR × BR + FPR × (1 − BR)

where P(+) is the probability of a positive test result. Hence:

BR = (P(+) − FPR) / (TPR − FPR)
For instance, let’s say we test 10,000 people and 595 of them test positive, hence P(+)=5.95%. If the test accuracy is TPR=100% and FPR=5%, as in the maximum Sensitivity test in our story, then BR=1%. Similarly, if accuracy is TPR=70% and FPR=0%, as in our assumed maximum Specificity RT PCR test, and 70 people test positive, then P(+)=0.70% and again BR=1%.
Notice by the way that this is a general result, valid for any level of accuracy. Say for instance we only have a horribly inaccurate, disconfirmative test, with TPR=30% and FPR=60%. Nevertheless, if we observe that 5970 people test positive, then P(+)=59.7% and again we can conclude that the Base Rate of infection is 1%.
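The inversion is one line of code. A sketch (Python, mine rather than the post’s) checking all three examples:

```python
def implied_base_rate(p_pos, tpr, fpr):
    """Invert P(+) = TPR*BR + FPR*(1-BR) to recover the Base Rate
    from the observed share of positive results."""
    return (p_pos - fpr) / (tpr - fpr)

print(implied_base_rate(0.0595, 1.00, 0.05))  # maximum Sensitivity test
print(implied_base_rate(0.0070, 0.70, 0.00))  # maximum Specificity test
print(implied_base_rate(0.5970, 0.30, 0.60))  # horribly inaccurate test
```

All three print (up to floating-point noise) 0.01: the same 1% Base Rate, whatever the level of accuracy.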
A test with a known level of accuracy is all we need to derive the Base Rate of infection. Crucially, however, this will be the Base Rate of the tested population. Hence, if tests are only performed on symptomatic cases, there will be many more positive results, and the derived BR will be much higher – in fact equal to P(+)/0.7, i.e. 43% higher than the percentage of positive cases, under the assumed accuracy of the RT PCR test. As we saw in the previous post, taking such a number as an estimate of the prevalence of infection in the general population would therefore be a gross miscalculation. It would be as if in 2016 Brexit support had been estimated by polling UKIP voters, or Trump support by polling NRA members.
A correct estimate of the true Base Rate of infection can only be obtained by testing a randomly selected, representative cross section of the general population of interest.
With Italy in lockdown and London about to follow, let’s see what we can say in our framework about the coronavirus pandemic.
Funnily enough, the Blinded By Evidence paper starts with a virus. You hear about it on TV and worry you might have it. So you take a test that will tell you with 100% certainty that you have the virus if you actually have it – False Negative Rate (FNR)=0% – and with 95% certainty that you don’t have the virus if you actually don’t have it – False Positive Rate (FPR)=5%. The test comes back positive and you panic, until you are shown that the probability that you have the virus, given that you tested positive, is not near 100%, as you feared, but less than 2%. The reason is that the Base Rate of the virus – its frequency in the population, giving you the probability that you had the virus before you took the test – is 0.1%. And the reason why you were so off the mark is what in our framework we call the Prior Indifference Fallacy: blinded by the test result, you ignored the Base Rate, until reminded of its importance for a correct interpretation of the evidence.
So what’s happening with the coronavirus?
A major difference between our neat stylised story and the messy reality of coronavirus is in the Base Rate. The Base Rate in the story is a known given number – one in a thousand. But what is the Base Rate of the coronavirus? Nobody knows. All we know is that the virus is highly contagious and is spreading. But how many infected people are out there at any point in time? How are they distributed? How can we spot them? We just don’t know. We only know how many have been spotted, as a number of suspect cases – people exhibiting specific symptoms – have been tested and some of them have come out positive. But what about the others – the infected people who have not been tested because they haven’t shown any symptoms, don’t even know they are carrying the virus and go happily about infecting other people? We have no idea. We can only infer that there must be a positive relationship between spotted and unspotted cases – the good old cockroach theory – but what is the multiple? How many unspotted cases are there for each spotted case? We don’t know.
But that’s what we would like to know. As sorry as we are for the known number of spotted cases, and relieved that they are being identified, isolated and treated, it is the unspotted cases that we worry about. How many are they? How fast are they growing? What is the probability that we will get infected by one of them and join their number? What is the Base Rate of the coronavirus?
Such basic questions, but no answers. And, worse, little interest in finding out. Unlike in our story, the coronavirus Base Rate is unknown. But, just like in our story, we fail to recognise its importance for the purpose of finding a correct answer to our questions.
The reason is the same: we are blinded by evidence.
In the story, our question is: what is the probability that we are infected, given that we tested positive? Blinded by the test result, we neglect to account for the small Base Rate and end up with a gross overestimation of the posterior probability.
With the coronavirus, we would also like to be tested. But we can’t, since the RT PCR test that is being used to detect the virus has been confined to suspect cases and is not available to the general public. Unable to take the test on ourselves, our question becomes: what is the probability that we are infected, given that a number of other people tested positive? As in our story, without a test we are naturally drawn to looking for the virus frequency: how many infected people are there as a percentage of the population we interact with? What is the probability that one of them will infect us? Is it small, like the one in a thousand in our story? Or is it “at least 50%”, as yesterday my friend Enzo warned me it is in Milan, begging me not to go there?
No one tells us. So we try ourselves. We look at the data, and what do we see? One horrible figure: the total number of spotted cases, ominously growing day by day. From there, we infer that the number of unspotted cases must be growing at the same pace if not faster, and that it is an unnervingly unknown but surely large multiple of the spotted cases. And, like the character in our story, we panic. We are blinded by evidence. In the story, the panic is caused by Base Rate neglect. With the coronavirus, it is caused by Base Rate inflation.
Let’s see why. The number of spotted cases is the number of people who tested positive out of the number of people who got tested. Clearly, the more people get tested, the larger is the number of spotted cases. So we look at their ratio. This would be a good estimate of the Base Rate if, and only if, the tested people were a random sample of the population of interest. But they aren’t. The tested sample is mainly composed of suspect cases – people who are tested because they show specific symptoms or because they have been in contact with spotted cases. As such, it is far from being random: the prior probability that a suspect case is infected is much higher than if he was picked at random. Hence the ratio of the number of positives over the number of tests is a gross overestimation of the true Base Rate.
Let’s take for example the latest daily Bulletin from the Italian Health ministry:
And let’s look at Lombardy, where the early cases showed up in February and where almost 50% of cumulative total cases (Casi Totali, in orange) are still concentrated. Total cases in Lombardy amount to 19,884, out of 52,244 tested people (Tamponi, in grey). Their ratio, 38%, is the percentage of tested people who turned out positive. Does it mean that nearly four in ten of Lombardy’s 10 million people are infected? Obviously not. Likewise, the true Base Rate of infections is not 8% in Veneto or 22% in the whole of Italy.
What is it then? We don’t know. In principle, however, estimating the coronavirus Base Rate would be quite simple. Take an unbiased, well stratified, random sample of the population of interest – a routine statistical technique commonly used in opinion polls and market research – and test them. Provided the test is sufficiently accurate, the percentage of positives is a good estimate of the Base Rate.
Crucially, the tested sample would have to be a fair representation of the general population, and therefore include symptomatic as well as asymptomatic people. This is in contrast with the current practice of confining tests to suspect cases – a reasonable procedure when priority must be given to identifying and securing as many infected people as possible, but an erroneous one, as we have seen, when the goal is to estimate the extent of the virus spread.
The advantage of having a detailed, localised and regularly updated map of coronavirus Base Rates should be obvious. It would give us a basic idea of the frequency of infection in different places and its evolution over time, thus helping us – at an individual level as well as at a public policy level – to modulate our response, focusing it more in areas where the Base Rate is higher and growing, and less in areas where it is lower and stable.
At an individual level, it would ease our apprehension to know that the Base Rate in our area is, say, 1%, rather than the imaginary multiple perceived by mask-wearing people. Before you say 1% is too low, think that it would mean 100,000 infections in Lombardy – about five times the current number of spotted cases – and more than 600,000 in Italy – about fifteen times the spotted cases. If it is higher we worry a bit more, if it is lower we worry a bit less. But it would benefit our health to know what it is, and that it is far lower than the hyperbolic figures implied by Base Rate inflation.
At the policy level, the benefits of a differentiated approach versus the blanket lockdowns being imposed in Italy and other countries should also be evident, in terms of increased focus where focus is mostly needed and a reduction of the huge social and economic costs currently imposed on everyone.
So the question is: why is it not done?
One answer is that the standard RT PCR test requires a complicated and lengthy procedure and does not lend itself to mass testing – hence the priority set on testing suspect cases. But then the South Korean experience has shown us that mass testing is possible, and that it can be very useful. Similar evidence has come from a small town in Veneto. In addition, several companies, including Roche and the Italian Diasorin, have recently developed cheaper and faster tests.
Another objection is that random testing would produce volatile results, as e.g. one negative case today may turn positive tomorrow. But that is in the very nature of all testing, where variability is dealt with by averaging results on properly sized randomised samples, which do not have to be very large to represent much larger populations with a small margin of error. It is just like any poll, say a Leave/Remain Brexit poll (remember?). In fact, making sense of that variability is the very reason why polls are taken and retaken over time.
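How large would a fair sample need to be? The standard polling arithmetic gives an idea; here is a sketch (in Python, with illustrative numbers of my own choosing, not from the post):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate half-width of a 95% confidence interval for a
    proportion estimated from a random sample of size n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# If 1% of a random sample of 10,000 people test positive, the
# Base Rate estimate is accurate to about +/- 0.2 percentage points.
print(round(margin_of_error(0.01, 10_000), 4))
```

A few thousand randomly selected people are enough to pin down a Base Rate of the order of 1% to within a fraction of a percentage point – the same logic that lets a poll of a thousand voters represent an entire electorate.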
A third objection is the one in our story: if the Base Rate is small, even very accurate tests can produce a large number of False Positives and False Negatives. But we know the answer to that: repeat the tests – one positive is unreliable, two positives is dependable, three positives almost certain.
So my answer is: Base Rate testing should be done, and I echo WHO Director-General’s ‘simple message for all countries: test, test, test’.
By the time I started writing my DPhil thesis, I had pretty much come to the conclusion that academic life was not for me. So I decided to try and see what it was like to work in the City, and got a summer job at James Capel. Subsequently bought by HSBC, James Capel was then a prominent UK stockbroker and one of the few to pioneer European equity research. So it was that, overnight, I became their ‘Italian Equity Strategist’.
I wanted to dip a toe in the water – I got a breath-taking full-body plunge into the wide ocean. In no time I was talking to all sorts of ‘clients’ about all things Italy – a true life shaping experience. I still remember – or was it a nightmare? – being in front of a big shot from the ‘Danish Pension Fund’, trying to answer as best as I could his full cartridge of very detailed questions.
It didn’t last long. First, being at work at 7am was definitely not my thing. Besides that, I soon realised I wanted to be on the other side – the buy side, not the sell side. A fund manager, not a stockbroker. So when my friend Bruno got me an interview at JP Morgan Investment Management, where he was working as a company analyst – ‘I’m there at 9am and I can manage my time quite flexibly, as long as I get the job done’ – I was all for it.
But before leaving James Capel I wrote my final piece for their European Equity Strategy publication. It resurfaced recently in a house move. Reading it again after such a long time (yes, the London phone code was 01) made me laugh out loud:
The Italian stock market has gone up 9% by the end of July since the beginning of 1988. This relatively poor performance can essentially be ascribed to fundamental market uncertainty on the critical issues of political stability and fiscal policy, which constitute both the primary target and the key test for the new coalition government headed by the Christian Democratic leader Mr de Mita.
A global reform of the institutional and administrative apparatus of the Italian state is another major concern of the de Mita government. The aim is to make legislation a less lengthy and cumbersome process and to increase the efficiency of the Public Administration.
Political uncertainty – which has kept foreign investors out of Italy for two years – is certainly among the key factors which explain the poor relative growth of the Italian market and the low level of current valuations relative to the performance and prospects of the Italian quoted companies.
As Bruce Hornsby had been singing a couple of years earlier, ‘That’s just the way it is – Some things will never change’.
Since the launch of the Made in Italy Fund, now more than three years ago, I have been banging on this point. Viewed from a top-down, macro perspective, Italy has always looked like an unattractive place to invest. Unstable governments, inefficient public services, bulky debt, higher bond yields and, before the euro, a chronically weak currency. Add for good measure a few evergreens, such as corruption, the backwardness of the South and organised crime. And, from a stock market point of view, a limited number of quoted companies – currently about 350, against more than 800 each in France and Germany – mainly concentrated in banking and finance, utilities, oils and a few consumers. The whole lot worth about 600 billion euro – less than Apple. Who would want to invest there?
So common is this ‘country’ way of thinking that it takes some unlearning to realise how fundamentally wrong it is.
Investors do not buy countries. They buy companies – companies that happen to be based in a certain country and are therefore, in most cases, quoted on that country’s Stock Exchange.
But what does that mean? Is Microsoft a US company? Is Nestlé a Swiss company? Yes, that’s where they are headquartered and quoted. But no, not in the sense that their performance is related in any meaningful way to the performance and vicissitudes of their country of origin. What is the relationship between LVMH and the growth of the French economy? Or Ferrari and the stability of the Italian government?
The national dimension of equity investing is largely a remnant of a long-gone past, when most businesses were predominantly domestic. This is clearly not the case today, and not only for the big global corporations, but also, and increasingly so, for smaller firms selling their products and services around the world. To think that there is any direct link between these companies and the economic conditions of their country of origin is lazy at best.
There are still of course many companies whose business is mainly domestic. For these, the linkage to the state of the national economy may be stronger – but it is far from being linear, stable or reliable. Indeed, for some companies a weak economy may create opportunities to gain market share from competitors or to introduce new products and services.
So it is never as simple as economy=stock market. This is so in general, but it’s especially true for Italy, where the sector composition of the market bears no resemblance to the country’s economic reality.
Then what’s the point of the Made in Italy Fund? Isn’t its very name meant to evoke the same national dimension that I am saying makes no sense?
No. The Fund does not invest in Italy as a country. It invests in Italian companies with a market capitalisation of less than one billion euro, quoted on the Milan Stock Exchange.
Why only those and why only there? Two reasons:
It is a good place for finding pearls – companies with high growth prospects, strong and sustainable profitability and attractive valuations. Many of them are smaller companies, leaders in specific market niches, where good management and Italian flair allow them to build and maintain a solid competitive advantage in Italy and abroad. Of course, there are many good companies elsewhere. But in Italy they tend to be cheaper. Why? Precisely because investors snub Italy as a country! This is clearly true for many foreign investors, indolently clinging to their ‘country’ way of thinking. But in the last few decades it has been increasingly true also for domestic investors, who in a post-euro, pan-European world have been shedding a sane home-country bias in favour of a snobbish xenophilia.
Soon after I joined JPMIM after James Capel, I started managing the Italian slice of their international equity and balanced portfolios. This was – hard to believe – thirty years ago. Since then I have done many other things, but my involvement with the Italian stock market has hardly ever stopped. I am – I fear to say – a veteran. As such, I like to believe that my experience, together with my ‘Italianness’ – in language, culture and mores – make me especially suited to spotting Italian pearls and, as importantly, avoiding Italian pebbles and duds.
Italy is my country. Like most Italians, I have a complex love-hate relationship with it. Di Maio or de Mita, its politics has always been messy, its public finances rickety, its international credibility regularly in the balance. In my thirty years as an Italian fund manager, I have never been able to build a credible top-down investment case for Italy as a country (incidentally, can one do so for France or Germany or any other developed nation?). But when I flip it around and look bottom-up at Italian companies, especially the smaller ones that form the backbone of the Italian economy, I have no hesitation. In a universe of around 280 companies with less than one billion market cap – now steadily increasing through a sustained flow of new IPOs – I have no trouble selecting thirty or so to include in the Made in Italy Fund. If anything, the problem is to keep track of all the opportunities.
So my attitude to chronic Italian bears is, with Bruce Hornsby: ‘Ah, but don’t you believe them’. Country allocation should not be about countries. It should be about finding pots of value around the globe, and focused managers able to extract them.
P.S. I invite subscribers who haven’t yet done so to also subscribe to the Bayes Investments website, where they will find information and updates on the Made in Italy Fund.
Wootton does a marvellous job explaining mankind’s transition from a worldview based on authority to one based on evidence.
As reprised in Steven Pinker’s latest book (p. 9), a typical well-educated Englishman in 1600 believed in demons, witches, werewolves, magicians, alchemy, astrology and other nonsense (p. 6). But a mere century and a quarter later his whole perspective had changed:
Between 1600 and 1733 (or so – the process was more advanced in England than elsewhere) the intellectual world of the educated elite changed more rapidly than at any other time in previous history, and perhaps than at any time before the twentieth century. Magic was replaced by science, myth by fact, the philosophy and science of ancient Greece by something that is still recognizably our philosophy and our science, with the result that my account of an imaginary person in 1600 is automatically couched in terms of ‘belief’, while I speak of such a person in 1733 in terms of ‘knowledge’. (p. 11-12).
Commonly referred to as the ‘Scientific Revolution’, this transition is not easy to understand. The images we have in mind are of sinister cardinals persecuting Galileo and of barmy philosophers refusing to look into his telescope. In the same vein, Wootton quotes Joseph Glanvill, an early advocate of the revolution, who derided the view that telescopes and microscopes were
all deceitful and fallacious. Which Answer minds me of the good Woman, who when her Husband urged in an occasion of difference, I saw it, and shall I not believe my own Eyes? replied briskly, Will you believe your own Eyes, before your own dear Wife? (p. 74, Italics and bold in the original).
(I find this particularly funny, wondering about how essentially the same joke found its way down to Richard Pryor, through Groucho Marx’s Duck Soup. An equivalent joke my friend Peter told me many years ago is that of the English aristocrat, which I used here).
Obviously, such hilarious caricatures leave much to explain. Educated people in 1600 and earlier were no dimwits. So why did they hold what to our eyes seem such outrageously weird beliefs? This is a focal theme in the Bayes blog. Hence I was intrigued to find out that Wootton’s book is centred on the same key concepts.
Following Aristotle, a seventeenth century educated person was taught to think deductively: draw necessary conclusions from undisputable premises. It would be a mistake, however, to imply that he ignored evidence. As we have seen, there is no such thing as a priori knowledge, independent of evidence. Knowledge cannot but be based on some form of evidence – empirical, as it is plain to our eyes; or axiomatic, as it was common before the Scientific Revolution, all the way back to ancient Greece and beyond. Episteme was absolute, irrefutable, self-evident knowledge. And even the wackiest myths and legends of primordial peoples were not haphazard fantasies but elaborations of authoritative evidence, perhaps in the form of dreams by elderly sages and wise men, who interpreted them as divine revelations they were called upon to proclaim and propagate.
Aristotelian principles were self-evident truths. Such as: All bodies move towards their natural place. Therefore, as stars rotate around it and every object falls towards its core, the earth must be the centre of the universe. Or: Heavier objects fall faster than lighter ones. Therefore, a two-kilo bag of sugar falls faster than a one-kilo bag (Wootton, p. 70). Or: Hard substances are denser and heavier than soft substances. Therefore, ice is heavier than water (p. 71).
These are what we call extreme priors: beliefs that are seen as so obviously self-evident that it is considered pointless to test them through menial experimentation (p. 319). As obviously, however, they are – they cannot but be – the product of evidence. I see stars rotate around the earth and objects fall towards its core: therefore, I infer that all bodies move towards their natural place. I see that a two-kilo bag of sugar falls faster than a one-kilo bag: therefore, I infer that heavier objects fall faster than lighter ones. I see that ice is heavier than water: therefore, I infer that hard substances are heavier than soft ones. The evidence is all wrong, hence the inferences are wrong. But how do I know that? Remember: the closer our priors are to the extreme boundaries of Faith, the stronger must be the evidence required to change them. And, as with Glanvill’s husband, little does it matter if the evidence is right in front of our eyes. It is plain to see, for instance, that ice floats on water and, as Archimedes – whose writings had been translated into Latin since the twelfth century – had found out in 250 BCE, this is only possible if ice is lighter than the water it displaces. But hey, who is a mere mathematician compared to the supreme father of natural philosophy? Aristotle had figured out that hard substances are heavier. So there must be another reason why ice floats. Well, it is because of its shape: flat objects cannot penetrate water and therefore remain on the surface. Galileo would patiently prove this was nonsense (p. 315), but philosophers remained unimpressed.
In the same vein, when Galileo asked his philosopher friend and colleague Cremonini to look at the mountains on the moon through his telescope, Cremonini refused, not because he was a blockhead – far from it: he was a highly respected professor of natural philosophy for sixty years and earned twice as much as Galileo – but because he did not trust the evidence: he did not regard it as strong enough to dent his Aristotelian belief that the moon was a perfect, unblemished sphere.
The idea that Aristotle had it all figured out and that all ‘natural philosophy’ logically descended from his principles was at the core of the seventeenth century’s worldview. As Wootton puts it (reprising Borges), Shakespeare had no real sense of progress. He treated his characters in the Roman plays as if they were his contemporaries. ‘History did not exist for him’ (p. 5). The governing assumption was that, as in Ecclesiastes (1:9), there was ‘nothing new under the sun’ (p. 63). The event that triggered a seismic change in this view and initiated the Scientific Revolution was the discovery of America at the end of the fifteenth century. That’s where Wootton places what he expressively calls ‘the discovery of discovery’ (Chapter 3). There is arguably no better way to convey this concept than through Hamlet’s immortal words to Horatio, which Wootton does not quote, probably because they are so well-known and overused – although he hints at them in the title of Part One. So I will do it for him: ‘There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy’ (Act I, Scene V).
The discovery of the New World showed mankind that in fact there was plenty new under the sun (including black swans, although for those we had to wait until the end of the seventeenth century) and gave rise to an explosive search for new evidence, which continues unabated, in fact accelerating, to our days. Over the following two centuries, curiosity – which theologians, reigning supreme above philosophers in the hierarchy of medieval science, regarded as a sin – became the mighty fuel of progress that it still is.
From their perspective, theologians were right: as long as knowledge is anchored to the two extreme boundaries of Faith, it remains impervious to evidence. Episteme above Doxa, truth above opinion, knowledge above experience, demonstration above persuasion. The discovery of discovery changed all that: it instilled in the minds of educated people ‘the idea that experience isn’t simply useful because it can teach you things that other people already know: experience can actually teach you that what other people know is wrong. It is experience in this sense – experience as the path to discovery – that was scarcely recognized before the discovery of America’ (p. 81).
This is the true sense of experience: exposure to the peril of being wrong. As curiosity compelled people to leave the secure shores of Aristotelian self-evidence, it encouraged them to embrace Cromwell’s rule, which we might as well rename Glanvill’s rule: Believe Your Own Eyes. This was no blanket surrender to evidence at face value. People remained wary – as we are – that evidence can be deceitful. But they opened their mind to the possibility that, in the right amount and shape, it might be capable of changing and even overturning their prior beliefs. Like Cremonini, they still suspected – and rightly so – that eyes can lie. But, unlike him, they gave them a chance: they were ready to answer Popper’s question.
This was the task that natural philosophers – as they were commonly known until the nineteenth century, when William Whewell coined the term ‘scientist’ (p. 28) – set out to accomplish: accumulate enough evidence to prove hypotheses true or false. They did so through carefully crafted experiments, which – precisely because they were well aware of the fallibility of evidence – they persistently reproduced, shared and challenged, provando e riprovando (p. 300), with the ultimate goal of devising the experimentum crucis (p. 381) which, by yielding conclusive evidence (p. 194), could allow them to proclaim a consensual winner of the evidential tug of war. Thus Truth, until then the preserve of infallible self-evident axioms, became a destination, to be travelled to through fallible empirical evidence. Prior Faith became posterior Certainty.
Reverend Thomas Bayes was born in the midst of this journey and lived a quiet and secluded life through it. He was by no means a protagonist of the Scientific Revolution – so much so that he doesn’t even earn a mention in Wootton’s book. Yet he was very much a man of his time, and his theorem encapsulates so well the ethos of the revolution that we can surely call the journey’s destination ‘Bayesland’.
(Wootton does mention Laplace’s dictum, attributing it to The Logic of Port-Royal, which ‘had acknowledged that the more unlikely an event the stronger the evidence in favour of it would have to be in order to ensure that it was more unlikely that the evidence should be false than that the event should not have occurred’ (p. 465)).
Bayesland is where we live and where we have always lived – Archimedes and Aristotle, Galileo and Cremonini, Shakespeare and Groucho Marx, you and I and all living creatures. We learn by experience, updating our beliefs through a multiplicative accumulation of evidence. We all are and have always been Bayesian.
This has been the Scientific Revolution’s greatest achievement: to show mankind that the way we have always learnt in practice was also valid in theory. Progress started when we stopped wasting time thinking we were doing something else. The effect of such a seemingly simple conceptual clarification has been breathtaking.
Of course, it was far from simple – as Wootton brilliantly shows. His book is a pleasure to read from beginning to end, including his thick jungle of notes. I warmly recommend it.
We have seen where the word Science comes from: scire means to cut, split (as in scissors), separate, decide true from false. We, like other living creatures, do so on the basis of evidence – what we see there is. We use evidence to update our beliefs. We are all Bayesian.
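Updating beliefs in the light of evidence has a precise arithmetic: Bayes’ rule. For the programmatically inclined, here is a minimal sketch – the prior, the likelihoods and the number of observations are all invented for illustration, not taken from anything above:

```python
# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), where
# P(E) = P(E|H) * P(H) + P(E|not H) * P(not H).
# All numbers below are purely illustrative.
def update(prior, p_e_given_h, p_e_given_not_h):
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

belief = 0.5                        # start undecided
for _ in range(3):                  # three independent confirming observations
    belief = update(belief, 0.8, 0.2)
print(round(belief, 4))             # 0.9846: each observation compounds the last
```

In odds form the same update is a plain multiplication: each observation multiplies the odds in favour by the likelihood ratio 0.8/0.2 = 4, which is what ‘multiplicative accumulation of evidence’ means.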
Despite Kant’s grand attempt to salvage some of it, there is no such thing as a priori knowledge. What may appear to us as transcendent knowledge, emanating from pure reason independent of evidence, is and can only be based on notions – concepts, principles, axioms – that we regard as self-evident.
Such notions are the subject of Metaphysics. The word came about, apparently, to denote the collection of Aristotle’s treatises that his late editors arranged to place after (meta) his Physics. Aristotle himself had not called them Metaphysics: he referred to them as ‘first philosophy’, dealing with concepts that came before Physics in importance and generality.
Be that as it may, we can think of metaphysics as the area we enter once we start running out of answers to our Why questions. Answers are local explanations built on our own hard evidence or, most often, on soft evidence emanating from trusted sources. We learn to accept local explanations and live with them, but every answer begets new questions, in a seemingly endless why-chain whose infinity we find impossible to accept. Explanations cannot go on forever. At some point, even the cleverest dad succumbs to the urge to end his child’s relentless barrage of whys with a resounding last answer: ‘because that’s the way it is!’
But, to the undaunted child, dad’s last answer turns into the ultimate question: What is the way it is? Once we set out to answer this question we have entered the land of metaphysics. Metaphysics is mankind’s effort to establish the absolute, unquestionable and irrefutable episteme that stands firm above Physics. Episteme is knowledge that does not need evidence because it is self-evident, certain without experiment and secure from the perils of experience.
How can we achieve such knowledge? Clearly, we can’t reach it from the side of experience, whence we can only expect an infinite regress of explanations. So it must come from the other side. But what’s on the other side? Clearly, we know nothing about it – if we did, we would have already gone past the answer we are looking for. As Immanuel Kant put it, noumena are on the other side – things-in-themselves, absolutely unknowable and irremediably inaccessible to our mind. All we can know are phenomena – things as they appear to us in the light of evidence.
Metaphysics is the boundary between phenomena and noumena – a boundary that mankind would love to cross but can only push forward, unfolding and accumulating new and better explanations of phenomena. Such is the love at the root of philosophia – the ever-burning, insatiable desire for sophia, the supreme wisdom in whose full light we would finally be able to contemplate the way it is. But the light of philosophy is the same light that illuminates phenomena. Metaphysics is and can only be on the side of phenomena – the side of experience and evidence. In the words of Arthur Schopenhauer:
Metaphysics thus remains immanent, and does not become transcendent; for it never tears itself entirely from experience, but remains the mere interpretation and explanation thereof, as it never speaks of the thing-in-itself otherwise than in its relation to the phenomenon. (Will, Volume II, p. 183).
Metaphysics is not and cannot be a priori knowledge, independent of evidence. Its value does not rest on its being beyond evidence, but on being based on notions that we regard as self-evident. Like mathematics and geometry, metaphysics is an axiomatic system – true insofar as its axioms are true. An axiom is that which is thought worthy, weighty, and thus bears authority – a concept interestingly close to the original meaning of probability. Axioms are statements assumed to be self-evidently true, thus requiring no proof or demonstration. Given the axioms, the theorems built on them using truth-preserving rules of inference are demonstrably true.
As such, the validity of an axiomatic system depends on the weight of its axioms. The more precise, clear, obvious, intuitive, indubitable the axioms, the stronger the system. Take Euclid’s Elements, which, as we know, is built on five axioms (or postulates). As we have seen, one can argue about the fifth. But not about the first: A straight line can be drawn joining any two points. Or the second: A finite straight segment can be extended indefinitely into a straight line. The third: From any straight segment a circle can be drawn having the segment as radius and one endpoint as centre. And the fourth: all right angles are equal. A geometry in which any of these four axioms is untrue is even hard to imagine. They are glaringly, unquestionably self-evident.
Now let’s compare it to Spinoza’s Ethics, which he explicitly wrote along the lines of Euclid’s Elements.
Here is its first axiom: ‘Everything which exists, exists either in itself or in something else’. The second: ‘That which cannot be conceived through anything else must be conceived through itself’. And the third, which we have encountered as the Principle of Sufficient Reason: ‘From a given definite cause an effect necessarily follows; and, on the other hand, if no definite cause be granted, it is impossible that an effect can follow’. And so on. One may or may not agree with any of these statements – provided that he truly understands what they mean. But it would be at least preposterous to regard them as self-evident.
And what about Definitions, which in Elements as well as in Ethics precede the Axioms? Let’s take the first three. In Elements they are: 1) ‘A point is that which has no part’. 2) ‘A line is breadthless length’. 3) ‘The ends of lines are points’. Hard to disagree. But in Ethics: 1) ‘By that which is self-caused, I mean that of which the essence involves existence, or that of which the nature is only conceivable as existent’. 2) ‘A thing is called finite after its kind, when it can be limited by another thing of the same nature; for instance, a body is called finite because we always conceive another greater body. So, also, a thought is limited by another thought, but a body is not limited by thought, nor a thought by body’. 3) (we have seen this one) ‘By substance, I mean that which is in itself, and is conceived through itself: in other words, that of which a conception can be formed independently of any other conception’.
Whaaat? Definitions and axioms can only be as clear as the terms that compose them. We all know and agree on what a point, a straight line and a circle are. But what about essence and existence, cause and substance? They are far more complex and vague concepts, much harder to define and comprehend. It’s no wonder, then, that all the ensuing Propositions in Ethics are, let’s say, less cogent than Pythagoras’s theorem. Take, for instance, Proposition XI, Part I:
God, or substance, consisting of infinite attributes, of which each expresses eternal and infinite essentiality, necessarily exists.
Here is the proof:
If this be denied, conceive, if possible, that God does not exist: then his essence does not involve existence. But this (Prop. VII) is absurd. Therefore God necessarily exists. Q.E.D.
Uhm. And what is Proposition VII?
Existence belongs to the nature of substances.
and its proof:
Substance cannot be produced by anything external (Corollary, Prop. VI), it must, therefore, be its own cause – that is, its essence necessarily involves existence, or existence belongs to its nature. Q.E.D.
Oh well. I spare you Proposition VI and its Corollary. Spinoza was a great philosopher and an admirable man, and his Ethics is a trove of powerful thoughts and ideas. But its metaphysical value can only be as compelling as its murky foundations.
This is metaphysics’ typical pitfall. While usually conceived as the product of pure reason, standing above physics and unrestrained by experience, metaphysics can’t really be anything other than a more or less coherent inferential system – one that is in fact so entwined with evidence as to be entirely based on supposedly self-evident foundations.
The trouble is that self-evidence is in the eye of the beholder. And – as we have seen repeatedly throughout this blog – it is amazing what different people, from the dimmest to the supremely intelligent, come to regard as self-evident. Once one is satisfied that he has made it all the way through why-chains to answering the ultimate question, and that he finally knows the way it is, it is tempting to invert direction and reinterpret reality in the light of his newfound metaphysical principles.
I had never spent much time thinking about Bitcoin. After reading a couple of articles to figure out what it was, I associated it with the muddled assemblage of Austrian devotees ranting against central banking, fiat currency, ‘big government’, ‘the elites’ and ‘the establishment’, and left it at that.
But then the other day, when my basketball buddy Adam, who is trading cryptocurrencies, asked me what I thought about them, I realised I needed a proper answer. Fighting my flippancy impulse – last year I had lashed out at Ed on Brexit and at John on Hillary Clinton ‘corruptness’ (Bernie Sanders’ flavour, not Trump’s) – I just told Adam that I hadn’t given it much thought. But that was not acceptable either. I had to have a closer look.
Luckily (HT @manualofideas) I soon found this recent post on Aswath Damodaran’s blog, which, in typical crystal clarity, makes all the relevant points. The post, written with Bitcoin at $6,100, should be read alongside an earlier post, written only a few months ago, when Bitcoin was priced at $2,800, and a later post, written a few days ago in response to critics. In a nutshell:
Bitcoin is not an asset, because it does not generate future cash flows. As such, it does not have a value.
Bitcoin is a currency, enabling the exchange of goods and services. As such, it has a price, relative to other currencies.
The relative price of a currency depends on its quality as a unit of account, a medium of exchange and a store of value.
One can invest in assets, based on an estimation of their intrinsic value, but can only trade in currencies, based on the anticipation of their future price movements. Buying Bitcoin is not an investment.
What I didn’t know is how many cryptocurrencies there are besides Bitcoin: 1221 of them at the last count – with fancy names like Ripple, IOTA, Qtum, Stellar Lumens – for a total market cap of $169 billion! Each has its own website, detailing how different and better they are compared to the others, and each can be traded on dozens of ‘exchanges’ – with other fancy names like Bithumb, Coinone, YoBit, Poloniex. Most of them have explosive price charts, and Adam feels very good about it – he’s been buying more beer rounds. But what will be the dollar price of IOTA a year from now? Like Damodaran, I am not saying it will be zero. But I can’t see how anybody could have any idea.
I will let Adam ponder upon Damodaran’s analysis. As an addition to his considerations, I see his table contrasting the Pricing Game and the Value Game as a striking illustration of the ruinous influence of the Efficient Market Theory.
By collapsing Value into Price, the EMT turns an honourable intellectual pursuit into a vacuous guessing game, where thinking is overruled by action, patience by speed and brains by guts. If prices are always where they should be, and only new information can change them, then success is determined by how quickly one is able to collect and react to news. High-frequency, algorithmic and other types of ‘quant’ trading are a direct offspring of the EMT. And so is home-made online trading, as well as its mirror image, index funds. They all make a mockery of the noble art of investing.
On the way back to London from Italy earlier this month, I decided to stop in Basel. It was mid-way and it had long been on the list of cities I wanted to visit. Why it was on that list started to surface as I picked a hotel on Trivago. Euler Hotel – definitely. We arrived in the evening and the boys were keen to get back home. So I only had half a day the following morning.
Basel’s old town centre is quite small and its main landmark is the Münster, a Romanesque church with a long and interesting history. As we waited for its doors to open at 10, I started touring the adjacent cloister. One of the highlights of the place is that Erasmus was buried there in 1536 – a sudden death following an attack of dysentery. But while looking for the grave in the cloister, wandering among tombs and commemorative plates of the city’s notables, one of them gave me a jolt:
Jacob Bernoulli, of course. He was born and lived in Basel his whole life, and died there on 16 August 1705 – morbo chronico, mente ad extremum integra – at the age of 50 years and 7 months.
Jacob – the eldest scion of the prodigious Bernoulli family – is one of my heroes. The author of the greatest masterwork in early probability theory, Ars Conjectandi, he is also credited as the first to discover the relationship between continuous compound interest and Euler’s number e, the base of natural logarithms. There – I suddenly realised – was a big piece of my subconscious attraction to Basel. Enchanted by my discovery, I asked my second child to pose for a photo next to the tombstone – my elder son was wandering somewhere else, supremely bored and impatiently waiting for lunch and departure.
After leaving the cloister, unable to come up with anything intelligible to say about Bernoulli, I told the kids about Erasmus and Paracelsus – another illustrious Basler. At 10 we visited the church – Erasmus’s grave is inside – and shortly after I realised my time was up – the children would have killed me if I had proposed any more ‘history stuff’. So we walked back to the Euler Hotel – Leonhard Euler was born in Basel two years after Jacob Bernoulli’s death. He was the first to use the letter e for the base of natural logarithms, apparently as the first letter of ‘exponential’, rather than of ‘Euler’. He also established the notation for π and for the imaginary number i, all beautifully joined together in Euler’s identity e^(iπ) + 1 = 0.
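For the mathematically curious, both facts – Euler’s identity and Bernoulli’s limit linking compound interest to e – are easy to verify numerically. A quick sketch (the sample values of n are arbitrary):

```python
import cmath
import math

# Euler's identity: e^(i*pi) + 1 = 0, up to floating-point rounding
z = cmath.exp(1j * math.pi) + 1
print(abs(z))  # of order 1e-16

# Bernoulli's continuous compounding: (1 + 1/n)^n -> e as n grows
for n in (1, 10, 1000, 1_000_000):
    print(n, (1 + 1 / n) ** n)
print("e =", math.e)
```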
On the road to London, I kept thinking with delight at my semi-serendipitous encounter with Bernoulli. Then it struck me: I had seen that tombstone before. Back home, I checked. I was right: it was in one of the best books I have ever read, Eli Maor’s e: The Story of a Number.
As I reopened the book, it all came back to me: the Spira Mirabilis.
The logarithmic spiral is the curve r = a·e^(bθ) in polar coordinates (r is the radius from the origin, θ is the angle between the radius and the horizontal axis, and a and b are parametric constants). Bernoulli had a lifelong fascination with the self-similar properties of the spiral:
But since this marvellous spiral, by such a singular and wonderful peculiarity, pleases me so much that I can scarce be satisfied with thinking about it, I have thought that it might be not inelegantly used for a symbolic representation of various matters. For since it always produces a spiral similar to itself, indeed precisely the same spiral, however it may be involved or evolved, or reflected or refracted, it may be taken as an emblem of a progeny always in all things like the parent, simillima filia matri. Or, if it is not forbidden to compare a theorem of eternal truth to the mysteries of our faith, it may be taken as an emblem of the eternal generation of the Son, who as an image of the Father, emanating from him, as light of light, remains ὁμοούσιος [consubstantial] with him, howsoever overshadowed. Or, if you prefer, since our spira mirabilis remains, amid all changes, most persistently itself, and exactly the same as ever, it may be used as a symbol, either of fortitude and constancy in adversity, or, of the human body; which after all its changes, even after death, will be restored to its exact and perfect self; so that, indeed, if the fashion of imitating Archimedes were allowed in these days, I should gladly have my tombstone bear this spiral, with the motto, Though changed, I rise again exactly the same, Eadem numero mutata resurgo.
This is the full quote from a paper by Reverend Thomas Hill published in 1875 (p. 516-517), from which Maor’s book takes an extract (p. 126-127), taken in turn from another book. Hill did not quote the source, but the original Latin quote can be found here (p. 185-186, available here), with the indication that it comes from a paper published by Bernoulli in the Leipsic Acts in 1692, which should be found here.
Bernoulli’s enthusiasm is easy to understand and to share. The logarithmic spiral is found in nature and art. The Golden spiral, whose radius grows by a factor of the golden ratio every quarter turn, is a special case, and so is the circle as b tends to 0.
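Bernoulli’s ‘marvellous peculiarity’ can be checked directly from the formula: rotating the spiral by any angle δ multiplies every radius by e^(bδ), so the rotated curve is just a scaled copy of the original. A minimal sketch, with arbitrary choices of a, b and the test angles:

```python
import math

a, b = 1.0, 0.2          # arbitrary spiral parameters

def r(theta):
    """Radius of the logarithmic spiral r = a * e^(b*theta)."""
    return a * math.exp(b * theta)

# Self-similarity: rotating by delta is the same as scaling by exp(b*delta)
delta = math.pi / 2      # a quarter turn
scale = math.exp(b * delta)
for theta in (0.0, 1.0, 2.5):
    assert math.isclose(r(theta + delta), scale * r(theta))
print("eadem mutata resurgo: scale factor per quarter turn =", scale)
```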
At the same time, it is difficult not to laugh at the manner in which Bernoulli’s wish was finally granted. Perhaps confused by the reference to Archimedes, the appointed mason cut an Archimedean spiral at the bottom of the tombstone, which has none of the properties Bernoulli so admired in the logarithmic spiral. And, to add insult to injury, he missed the word ‘numero’ from the motto. Bloody builders – always the same…
Bernoulli’s considerations made an impression on me when I first read Maor’s book. The spira mirabilis as a symbol of fortitude and constancy in adversity, or of the human body restored to its perfect self even after death. But after reading the passage in its entirety, I find it even more beautiful and inspiring. And how about taking a picture of my son – simillimus filius patri – next to the spiral, before any of this had come back to my mind?
By the way, my son’s name is Maurits, like (but not named after) M. C. Escher.
When I arrived at the Drayton Arms, he was already there. He had contacted me a few days earlier and we had arranged to meet for a drink. He worked for a head hunting firm, focused only – he was keen to specify – on investment management. After the introductory chit-chat, I made it clear that I was not interested in a job offer, and he made it clear that his purpose was to present his services to my firm’s potential hiring needs. With that out of the way, the conversation moved on amiably, flowing from market conditions to value investing, Brexit and other world affairs.
Until at one point – I can’t remember how and why – we veered towards terrorism, and from there to 9/11. “Of course” – said Sandeep, with the casual air of someone who is sharing the obvious among world-savvy, knowledgeable people, “it was clearly an inside job”.
“What? What do you mean?” – I looked at him straight in the eyes.
“What? You don’t think so?” – Sandeep was genuinely taken aback by my sudden change of tone. Which, I agree, requires some explaining.
I have a Spinozan tolerance for freedom of opinion. It is the essence of Bayes: different priors, different information, or different interpretations of the same information, can give rise to different conclusions. This is obvious, and there is nothing wrong with it. But of course it doesn’t mean that anything goes. It means that, even when I have a strong view, I hold on to Cromwell’s rule and remain open to the possibility that, however high in my mind is the probability that I am right, I may be mistaken. As we know, hypothesis testing is the result of a tug of war between confirmative and disconfirmative evidence, which accumulates multiplicatively, leaving the possibility that, however overwhelming the evidence may be on one side, it may be annihilated by even one piece of conclusive evidence on the other. Another consequence of this framework is that, while I strive for certainty, I am comfortable with uncertainty: if neither side is strong enough to win the tug of war, there is nothing wrong with accepting that a hypothesis is only probably right, and therefore also probably wrong.
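The tug of war has a simple arithmetic. In odds form, Bayes’ rule says that posterior odds equal prior odds times the product of the likelihood ratios of the accumulated evidence. The illustration below, with invented numbers, shows how a single piece of conclusive evidence – a likelihood ratio of zero – annihilates any accumulation on the other side:

```python
# Odds-form Bayes: posterior odds = prior odds * product of likelihood ratios.
# A likelihood ratio > 1 favours the hypothesis, < 1 counts against it,
# and 0 is conclusive evidence against. All numbers are invented.
def posterior_odds(prior_odds, likelihood_ratios):
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

print(posterior_odds(1.0, [3, 2, 4, 5]))        # 120.0: strongly in favour
print(posterior_odds(1.0, [3, 2, 4, 5, 0.0]))   # 0.0: one conclusive piece wins
```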
It is important to remember, however, that this only works insofar as one makes sure that evidence accumulation is as thorough as possible on both sides. This is easy to understand: there is no point gathering a lot of evidence on one side while neglecting to do it on the other. One side will win nothing but a rigged game. But it is far from easy to do it in practice, as it requires fighting our natural tendency to succumb to the Confirmation Bias. The easier one side seems to be winning, the stronger should be our urge to reinforce the other side. It is by winning an ever tougher tug of war that we can aim to approach certainty.
This is an aptitude I have learned to nurture. The more I am convinced about something, the more I like to explore the other side, trying to distil its best arguments. If this succeeds in lowering my confidence, so be it: I feel richer, not poorer. And if it doesn’t, I am richer anyway, as I have built a clearer picture of what the other side stands on. This, after all, is what understanding means – distinct from justifying and, more so, from agreeing. The better one understands an argument, the easier it becomes to dismantle it and, perhaps, convince people on the other side to change their mind.
This is where sometimes I fail to keep my composure: when I face a conviction based on a pile of one-sided arguments, typically soaked in hyperbolic language, which blatantly misrepresents, disregards or belittles the other side. But what really gets on my nerves is a dirtier trick: when the balance of evidence is overtly on one side, the only way to overturn the verdict is to find – or, failing that, make up – a conclusive piece of evidence on the other side. This is the standard trick employed by conspiracy theorists: I call them Conclusionists, and the pit they fall into a conclusive evidence trap.
That’s what happened with Sandeep.
“Of course I don’t think so!” I replied. “How can you say such a … thing?” I asked, working on resisting my own adjectival overpouring. He looked at me with candid disbelief. How could I be so naïve? The web is full of information about it – he said. And when I asked him to give me an example, he explained: “Of course it is not in the usual places. You need to know where to look”.
Oh my God. One tends to imagine Conclusionists as showing some exterior signs of dimwittedness. But there he was, a perfectly nice, bright-looking guy, splattering such shocking bullshit. As he excused himself to the men’s room, I tried to collect myself. But failed miserably. “So Sandeep” I asked him as he came back, even before he could regain his seat “Who killed JFK? And what about those moon landings? And the Illuminati? It’s all down to Queen Elizabeth, eh?” I deserved a sonorous expletive. But Sandeep was a gentleman, and perhaps he had regretted his own condescension over his micturating interval. “I see your point” he smiled “I’m not saying that everything you find on the internet is true. But…” At which point I grabbed the two-second void and, after mumbling some sort of apology myself, I cleared the air with a liberating “Anyway…” followed by a question about salaries, as if the whole interlude had never happened. The conversation resumed its cordial tone and carried on for a while, until it was time to go. We departed with the inevitable “Let’s keep in touch”. I have not heard from him since.
They arrived in the morning, bright and early. The dishwasher had been acting strangely, so I had finally called in the engineers to figure out what was going on. I like to fix these things myself around the house, but this time, after fiddling in vain for a few days, I had given up.
“‘morning, Sir – how can we help?” Doug, the senior of the duo, had the reassuring air of the expert who has seen it all.
“Well, this is what’s happening” I started, hopeful but sceptical that Doug would immediately find an obvious explanation. “The washing cycle does not end properly. As you can see, it stops in the middle, with water still lying at the bottom. It’s not the filter or anything like that” I added, making it clear that I knew my stuff. “Sometimes, after I open and close the door a couple of times, it restarts and goes on to the end. But other times, like today, it just stops”.
“Let’s take a look” said Doug, and with a nod and a whisper instructed his younger mate Trevor to check under the sink. At this point I left, one because the children had woken up and two because watching Trevor puffing and laying his giant tattooed belly on the cold marble floor was a bit too much so early in the morning. “Call me if you need me” I said. But I had hardly greeted the kids when Doug called me back. “Here it is, Sir” – the dishwasher was working again. “It was the connection to the water drain. It is shared with the washing machine and sometimes it can be a bit too much, you know. Anyway, we’ve changed it around and it should not happen again. But remember never to use the dishwasher and the washing machine at the same time”.
“Ow…kay” I said, trying to conceal my puzzlement and following Doug’s invitation to look under the sink at the result of Trevor’s manipulation. I couldn’t see any difference – and I had never used the two machines simultaneously. “Are you sure?” I wanted to ask, but I refrained – Doug looked very sure, and ready to leave. “Thank you very much” was all I could say. “Pleasure, Sir” said Doug, “it should be alright but we’re here if you need us. Have a good day”.
Alas, the little hope I had for a quick solution soon faded away. The dishwasher finished the cycle that Doug and Trevor had managed to restart, but the next one flopped again in the middle, as I found out the following morning. A little door banging helped it to the end, and so it did in the next few days. But the whole process soon became increasingly irritating: sometimes everything worked fine, sometimes the machine stopped and restarted by itself as I entered the kitchen, and some other times I had to keep banging the door. A week later I called back.
“Sorry guys” I apologised on the phone as I explained that their fixing wasn’t working. “No worries, Sir. We’ll be there tomorrow early in the morning”. So that evening I started a new cycle, with the intent of showing them the result in the morning and creating the ideal conditions for a new assessment.
I got out of bed as they rang the bell. They came in and we walked to the kitchen. One, two, three: I opened the dishwasher door, ready to show them the usual stagnant pool of water. Et voilà: no water. This time the cycle had ended properly. “No problem at all, Sir” said Doug, helping to alleviate my evident embarrassment. “We’ll put it down as ‘Intermittent Malfunctioning'”.
As they left with what I couldn’t help interpreting as a wry smile of amusement, I started contemplating my life with an erratically faulty dishwasher. Sure enough, the stop and go resumed. But what was the point of calling them again? So I kept going for a while, banging and cursing. Until one day it all came to an end. No banging, no lights, nothing. The machine was completely dead, and an increasingly smelly sludge at the bottom left me no alternative but to call Doug once again, with a view to arranging for a replacement.
This time Doug came alone, and after a few fearful moments in which I was dreading a new mysterious restart, he declared death himself. He took away the wooden bar under the dishwasher and started fiddling with its feet, exploring ways to slide it out of its casing. I left him again, and again he soon called me back. “Here, Sir” – the dishwasher had come back to life. To my befuddlement and consternation, Doug offered a new explanation: “You see, Sir, it all has to do with the alignment of the feet. They have laid the machine on MDF – that’s not the correct way, they should have used a harder material. With time, the feet have sunk a bit into the wood, enough to misalign the door closing. That’s why banging works sometimes. I have now raised the feet a bit so it’s all back in line. If this doesn’t work, the next thing is to replace the door, but I will not do it myself – I tried it once, but the hinges snapped back and I almost lost my finger. Anyway, I don’t think it will be necessary. I believe I figured it out – it’s amazing how one keeps learning after all these years”.
Oh well. I didn’t know what to make of Doug’s new theory, but he had managed to raise my hopes a bit. Once again, I would have the evidence in the morning. But later in the day I received a phone call. It was an electrician, who explained that he had been instructed by Doug to look at the dishwasher’s plug and asked whether he could come in the afternoon for a check. I was confused – Doug had said nothing to me about the plug. But why not? The whole thing was starting to reveal an amusing side.
As the young electrician came in, I gave him an abridged version of the saga. He nodded, quite uninterested, and set out to slide out the dishwasher to reach for the plug, which he had figured out was right behind it. After a few minutes he called me back. “Here, have a look” he said, with a quiet smile. The plug was stuck to the rear of the dishwasher, its plastic back partially melted and fused into it:
The mystery was finally and completely solved. And, as in the best detective stories, the explanation was simple and totally unexpected. The plug, stuck to the back, would intermittently lose contact with the socket due to the dishwasher’s vibration mid-cycle. That’s why door banging helped – it restored contact, as did, at times, simply walking back into the kitchen, since floor vibration was enough to produce the same effect. All the electrician had to do was to move the socket to the side panel and reinsert the plug there. A dishwasher that was about to be chucked away is now in perfect shape and flawlessly performing its wonders.
So much for Doug’s theories. He had first tried a routine explanation – one that would probably fit most similar cases – but received disconfirming evidence from me. He then got confirming evidence from his own observation – a treacherous occurrence in many circumstances. Then, when a new piece of disconfirming evidence arrived, he built a new theory around it that seemed to fit the facts. This was as wrong as the first – and even more so, as it lacked generality and was created on the spot.
To his credit, however, Doug was crucial to finding the truth. I don’t know why he didn’t tell me about the plug – maybe it was late lateral thinking, or maybe he had it in mind but didn’t want to spoil his new-fangled theory – or simply, with no Trevor around, he didn’t feel like going through the motions of sliding the machine out.
Be that as it may, Doug was a true scientist. The search for the truth proceeds neither by deduction nor by induction but – in Charles Sanders Peirce’s somewhat awkward phrasing – by abduction. We test hypotheses to produce explanations and select those that provide the best explanation of the observed evidence. The key to the process is to be open to revising and possibly rejecting any explanation in the light of the observed evidence. But a true scientist goes further: he actively looks for evidence that would reject his best theory and only stops when he finds conclusive evidence. In our dishwasher tale – a true story – the fused plastic plug was a Smoking Gun: evidence that conclusively explained the dishwasher’s strange behaviour. Hence we say it was the cause of such behaviour. I sent the picture to Doug’s phone but got no reply – I can’t remember, but perhaps, unlike its owner, the phone is not a smart one.
Investment risk is the probability of a substantial and permanent loss of capital. We buy a stock at 100 expecting to earn a return, consisting of appreciation and possibly a stream of dividends. But our expectation may be disappointed: the price may go down rather than up and we may decide to sell the stock at a loss, either because we need the money or because we come to realise, rightly or wrongly, that we made a mistake and the stock will never reach our expected level.
How does investment risk relate to volatility – the standard deviation of past returns, measuring the extent to which returns have been fluctuating and vibrating around their mean? Clearly, we prefer appreciation to be as quick and smooth as possible. If our expected price level is, say, 150, we would like the stock to reach the target in a straight line rather than through a tortuous rollercoaster. On the other hand, if we are confident that the price will get there eventually, we – unimpressionable grownups – may well endure the volatility. In fact, if on its way to 150 the price dropped to 70 it would create an inviting opportunity to buy more.
Volatility increases investment risk only insofar as it manages to undermine our confidence. We might have rightly believed that Amazon was a great investment at 85 dollars in November 1999, but by the time it reached 6 two years later our conviction would have been brutally battered. Was there any indication at the time that the stock could have had such a precipitous drop? Sure, the price had been gyrating wildly until then, up 21% in November, down 12% in October, up 29% in September and 24% in August, down 20% in July, and so on. The standard deviation of monthly returns since the IPO had been 33%, compared to 5% for the S&P500, suggesting that further and possibly more extreme gyrations were to be expected. But to a confident investor that only meant: tighten your seatbelt and enjoy the ride. A 93% nosedive, however, was something else – more than enough to break the steeliest nerves and crush the most assured resolve. ‘I must be wrong, I’m out of here’ is an all too human reaction in such circumstances.
Therefore, while volatility may well contribute to raise investment risk, it is not the same as investment risk. It is only when – rightly or wrongly – conviction is overwhelmed by doubt and poise surrenders to anxiety that investment risk bears its bitter fruit.
Amazon is a dramatic example, but this is true in general. Every investment is made in the expectation of making a return, together with a more or less conscious and explicit awareness that it may turn out to be a flop. Every investor knows this, in practice. So why do many of them ignore it in theory and keep using financial models built on the axiom that volatility equals investment risk? As we have seen, the reason is the intellectual dominance of the Efficient Market Theory.
So the next question is: Why is it that, according to the EMT, investment risk coincides with volatility? The answer is as simple as it is unappreciated. Let’s see.
If the EMT could be summarised in one sentence, it would be: The market price is right. Prices are always where they should be. Amazon at 85, 6 or 1000 dollars. The Nasdaq at 5000, 1400 or 6400. At each point in time, prices incorporate all available information about expected profits, returns and discount rates. Prices are never too high or too low, except with hindsight. Therefore, an investor who buys a stock at 100 because he thinks it is worth 150 is fooling himself. Nobody can beat the market. If the market is pricing the stock at 100, then that’s what it’s worth. The price will change if and only if new information – unknown and unknowable beforehand and therefore not yet incorporated into the current price – prompts the market to revise its valuation. Since this was as true in the past as it is in the present and will be in the future, past price changes must also have been caused by no other reason than the arrival of information that was new at the time and unknown until then. Thus all price changes are unknowable and, by definition, unexpected. And since price changes are the largest component of returns – the other being dividends, which can typically be anticipated to some extent – we must conclude that past returns are largely unexpected. At this point there is only one last step: to identify risk with the unexpected. If we define investment risk as anything that could happen to the stock price that is not already incorporated into its current level, then the volatility of past returns can be taken as its accurate measure.
Identifying investment risk with volatility presupposes market efficiency. This is part of what Eugene Fama calls the joint hypothesis problem. To be an active investor, thus rejecting the EMT in practice, while at the same time using financial models based on the identification of investment risk with volatility, thus assuming the EMT in theory, is a glaring but largely unnoticed inconsistency.
So the next question is: what is it that practitioners know, which makes them behave as active investors, and EMT academics ignore, which leads them to declare active investment an impossible waste of time and to advocate passive investment?
Again, the answer is simple but out of sight. In a nutshell: Practitioners know by ample experience that investors have different priors. EMT academics assume, by theoretical convenience, that investors have common priors.
Different priors is the overarching theme of the entire Bayes blog. People can and do reach different conclusions based on the same evidence because they interpret evidence based on different prior beliefs. This is blatantly obvious everywhere, including financial markets, where, based on the same information, some investors love Amazon and others short it. In the hyperuranian realm of the EMT, on the other hand, investors have common priors and therefore, when faced with common knowledge, cannot but reach the same conclusion. As Robert Aumann famously demonstrated, they cannot agree to disagree. This is why, in EMT parlance, prices reflect all available information.
Take the assumption away and the whole EMT edifice comes tumbling down. This is what Paul Samuelson was referring to in the final paragraphs of the ‘Fluctuate’ and ‘Vibrate’ papers. More explicitly, here is how Jonathan Ingersoll put it in his magisterial Theory of Financial Decision Making, immediately after ‘proving’ the EMT:
In fact, the entire “common knowledge” assumption is “hidden” in the presumption that investors have a common prior. If investors did not have a common prior, then their expectations conditional on the public information would not necessarily be the same. In other words, the public information would properly also be subscripted as φk – not because the information differs across investors, but because its interpretation does.
In this case the proof breaks down. (p. 81).
Interestingly, on a personal note, I first made the above quotation in my DPhil thesis (p. 132). A nice circle back to the origin of my intellectual journey.
As he wrote his ‘Challenge to Judgment’ in the first issue of the Journal of Portfolio Management in 1974, Paul Samuelson expected ‘the world of practical operators’ and ‘the new world of academics’ – which at the time looked to him ‘still light-years apart’ – to show some degree of convergence in the future.
On the face of it, he was right. The JPM recently celebrated its 40th anniversary. The Financial Analysts Journal, started with the same bridging intent 30 years earlier under Ben Graham’s auspices, is alive and well on its 73rd Volume. Dozens of other periodicals have joined in the effort and hundreds of books and manuals have been written, sharing the purpose of promoting and developing a common language connecting the practice and the theory of investing.
But, while presuming and pretending to understand each other, the two worlds are still largely immersed in a sea of miscommunication. At the base of the Babel there are two divergent perspectives on the relationship between risk and return. Everybody understands return. You buy a stock at 100 and the price goes up to 110 – that’s a 10% return. But this is ex post. What was your expected return before you bought? And what risk did you assume? The practical operator does not have precise answers to these questions. I looked at the company – he would say – studied its business, read its balance sheet, talked to the managers, did my discounted cash flow valuation and concluded that the company was worth more than 100 per share. So I expected to earn a good return over time, roughly equal to the gap between my intrinsic value estimate and the purchase price. As for risk, I knew my valuation could be wrong – the company might be worth less than I thought. And even if I was right at the time of purchase, the company and my investment might have taken wrong turns in myriad different ways, causing me to lose some or all of my money.
Is that it? – says the academic – is that all you can say? Of course not – replies the operator – I could elaborate. But I couldn’t do it any better than Ben Graham: read his books and you’ll get all the answers.
But the academic would have none of it. As Eugene Fama recalls: ‘Without being there one can’t imagine what finance was like before formal asset pricing models. For example, at Chicago and elsewhere, investment courses were about security analysis: how to pick undervalued stocks’. (My Life in Finance, p. 14). Go figure. Typically confusing science with precision, the academic is not satisfied until he can squeeze concepts into formulas and insights into numbers. I don’t know what to do with Graham’s rhetoric – he says – I need measurement. So let me repeat my questions: what was your expected return exactly? How did you quantify your risk?
Give me a break – says the defiant operator – risk is much too complex to be reduced to a number. As for my expected return, I told you it is the gap between value and price, but I am under no illusion that I know it exactly. All I know is that the gap is large enough and I am prepared to wait until it closes.
Tut-tut – Fama shakes his head – Listen to me, you waffly retrograde. I will teach you the CAPM. ‘The CAPM provides the first precise definition of risk and how it drives expected return, until then vague and sloppy concepts’. (p. 15).
The operator listens attentively and in the end says: Sorry, I think the CAPM is wrong. First, you measure risk as the standard deviation of past returns. You do it because it gives you a number, but I think it makes little sense. Second, you say the higher the risk the higher is the expected return. That makes even less sense. My idea of risk is that the more there is the more uncertain I am about my expected return. In my view, the relationship between risk and expected return is, if anything, negative. So thank you for the lecture, but I stick with Graham. As Keynes did not say (again!): It is better to be vaguely right than precisely wrong.
Writing ten years after Samuelson’s piece, Warren Buffett well expressed the chasm between academics and practical operators: ‘Our Graham & Dodd investors, needless to say, do not discuss beta, the capital asset pricing model or covariance in returns among securities. These are not subjects of any interest to them. In fact, most of them would have difficulty defining those terms. The investors simply focus on two variables: price and value’. (Buffett, Superinvestors, p. 7).
But operators are rarely so blunt. Such is the intellectual authority of the Efficient Market Theory that the identification of risk with the standard deviation of returns – a.k.a. volatility – and the implication that more risk means higher returns are taken for granted and unthinkingly applied to all sorts of financial models. Hilariously, these include the same valuations that investment practitioners employ to justify their stock selection – an activity that makes sense only if one rejects the EMT! It is pure schizophrenia: investors unlearn at work what they learned at school, while at the same time continuing to use many of the constructs of the rejected theory and failing to notice the inconsistency.
But here is the biggest irony: after teaching it for forty years – twenty after Buffett’s piece – Fama finally got it out of his system: ‘The attraction of the CAPM is that it offers powerful and intuitively pleasing predictions about how to measure risk and the relation between expected return and risk. Unfortunately, the empirical record of the model is poor – poor enough to invalidate the way it is used in applications’ (Fama and French, JEP 2004). Hallelujah. Never mind that in the meantime the finance world – academics and practitioners – had amassed a colossal quantity of such applications and drawn an immeasurable variety of invalid conclusions. But what is truly mindboggling is that, in spite of it all, the CAPM is still regularly taught and widely applied. It is hard to disagree with Pablo Fernandez – a valiant academic whose work brings much needed clarity amidst the finance Babel – when he calls this state of affairs unethical:
If, for any reason, a person teaches that Beta and CAPM explain something and he knows that they do not explain anything, such a person is lying. To lie is not ethical. If the person “believes” that Beta and CAPM explain something, his “belief” is due to ignorance (he has not studied enough, he has not done enough calculations, he just repeats what he heard to others…). For a professor, it is not ethical to teach about a subject that he does not know enough about.
Two books that I think are particularly effective in helping operators move from practical unlearning – erratic, undigested and incoherent – to proper intellectual unlearning of the concept of risk embedded in the EMT and its derivations are David Dreman’s Contrarian Investment Strategies: the Next Generation (particularly Chapter 14: What is Risk?) and Howard Marks’ The Most Important Thing (particularly Chapters 5-7 on Understanding, Recognizing and Controlling Risk).
Besides the EMT’s predominance, unlearning is necessary because, at first glance, measuring risk with the standard deviation of returns makes intuitive sense: the more prices ‘fluctuate’ and ‘vibrate’, the higher the risk. Take Amazon:
If you had invested 30,000 dollars in Amazon’s IPO in May 1997 (it came out at 18 dollars, equivalent to 1.50 dollars after three splits), after twenty years – as the stock price reached 1,000 dollars (on 2nd June this year, to be precise) – your investment would have been worth 20 million dollars. Everybody understands return. But look at the chart – in log scale to give a graphic sense of what was going on: 1.50 went to 16 in a year (+126% in one month – June 1998) to reach 85 in November 1999. Then in less than two years – by September 2001 – it was down to 6, only to climb back to 53 at the end of 2003, down to 27 in July 2006, up to 89 in October 2007, down to 43 in November 2008 and finally up – up up up – to 1000. Who – apart from Rip van Winkle and Jeff Bezos – would have had the stomach to withstand such an infernal rollercoaster?
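The buy-and-hold arithmetic is easy to verify. Here is a quick sanity check, using only the figures quoted above (the 30,000-dollar stake, the split-adjusted 1.50-dollar entry price and the 1,000-dollar exit):

```python
# Sanity check of the Amazon buy-and-hold figures quoted above.
initial_investment = 30_000   # dollars invested at the May 1997 IPO
entry_price = 1.50            # IPO price of $18, adjusted for three stock splits
exit_price = 1_000            # price reached on 2 June 2017

shares = initial_investment / entry_price   # 20,000 shares
final_value = shares * exit_price           # 20 million dollars

print(f"{shares:,.0f} shares worth ${final_value:,.0f}")
```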
So yes, in a broad sense, volatility carries risk. The more violent the price fluctuations, the higher the probability that, in a variety of psychological and financial circumstances – he may get scared and give up on his conviction, or he may need to liquidate at the wrong time – an investor might experience a catastrophic loss. But how can such probability be measured? The routine, automatic answer is: the standard deviation of returns. Here is the picture:
The graph on the left is the cumulative standard deviation of monthly returns from May 1997 (allowing for an initial 12-month data accumulation) to May 2017, for Amazon and for the S&P500 index. The graph on the right shows the 12-month rolling standard deviations. The cumulative graph, which uses the maximal amount of data, shows that while the monthly standard deviation of the S&P500 has been stable at around 5%, Amazon’s standard deviation has been, after an initial peak, steadily declining ever since, although it still remains about four times that of the index (18.4% vs. 4.4%). The 12-month rolling version shows a similar gap, with Amazon’s standard deviation currently about three times that of the S&P500 (5.1% vs. 1.8%).
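For readers who want to reproduce this kind of chart, here is a minimal sketch of the two measures – cumulative (with a 12-month warm-up, as above) and 12-month rolling standard deviation. The return series below is made up for illustration, mimicking a volatile early stretch followed by a calmer one; it is not the actual Amazon or S&P500 data:

```python
import statistics

def cumulative_stdev(returns, warmup=12):
    """Standard deviation of all returns up to each month,
    after an initial warm-up window of data accumulation."""
    return [statistics.stdev(returns[:i]) for i in range(warmup, len(returns) + 1)]

def rolling_stdev(returns, window=12):
    """Standard deviation over a trailing 12-month window."""
    return [statistics.stdev(returns[i - window:i]) for i in range(window, len(returns) + 1)]

# Illustrative monthly returns (not real data): wild early months, calm later ones.
monthly_returns = [0.25, -0.20, 0.30, -0.15, 0.20, -0.25, 0.35, -0.10,
                   0.05, -0.04, 0.03, -0.02, 0.04, -0.03, 0.02, -0.01]

cum = cumulative_stdev(monthly_returns)
roll = rolling_stdev(monthly_returns)
# The cumulative measure declines only slowly as calm months accumulate,
# while the rolling measure drops faster once the wild early months roll off.
```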
What does this mean? Why is it relevant? What can such information tell us about the probability that, if we buy Amazon today, we may incur a big loss in the future? A moment’s thought gives us the answer: very little. Clearly, today’s Amazon is a completely different entity compared to its early days in the ’90s. Using any data from back then to guide today’s investment decision is nothing short of mindless. Amazon today is not four times as risky as the market, as it wasn’t five times as risky in November 2008. Nor is it three times as risky, as implied by the 12-month rolling data. The obvious point is that the standard deviation of returns is a backward-looking, time-dependent and virtually meaningless number, which, contrary to the precision it pretends to convey, has only the vaguest relation to anything resembling what it purports to measure.
The same is true for the other CAPM-based, but still commonly used measure of risk: beta. Here is Amazon’s beta versus the S&P500 index, again cumulative and on a 12-month rolling basis:
Again, the cumulative graph shows that Amazon’s beta has always been high, though it has halved over time from 4 to 2. So is Amazon a high beta stock? Not according to the 12-month rolling measure, which today is 0.4 – Amazon is less risky than the market! – but has been all over the place in the past, from as high as 6.7 in 2007 to as low as -0.4 in 2009. Longer rolling measures give a similar picture. What does it mean? Again, very little. According to the CAPM, Amazon’s beta is supposed to be a constant or at least stable coefficient, measuring the stock’s sensitivity to general market movements. But in reality it is nothing of the kind: like the standard deviation of returns, beta is just an erratic, retrospective and ultimately insignificant number.
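For completeness, here is how beta is conventionally computed – the covariance of stock and market returns divided by the variance of market returns – again as a minimal sketch on made-up numbers rather than the actual Amazon series:

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length return series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta(stock, market):
    """CAPM beta: cov(stock, market) / var(market)."""
    return covariance(stock, market) / covariance(market, market)

def rolling_beta(stock, market, window=12):
    """Beta recomputed over a trailing window, as in the rolling chart."""
    return [beta(stock[i - window:i], market[i - window:i])
            for i in range(window, len(stock) + 1)]

# Made-up example: a stock that always moves twice as much as the market
# has, by construction, a beta of exactly 2.
market = [0.01, -0.02, 0.03, 0.015, -0.01, 0.02, 0.005, -0.015,
          0.025, -0.005, 0.01, -0.02, 0.03, -0.01]
stock = [2 * r for r in market]
print(beta(stock, market))          # 2.0
print(rolling_beta(stock, market))  # every trailing-window value is also 2.0
```

The instability the text describes comes precisely from the rolling recomputation: on real data, each trailing window contains a different slice of history, so the estimate drifts with whatever happened to fall inside the window.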
Volatility implies risk. But reducing risk to volatility is wrong, ill-conceived and in itself risky, as it inspires the second leg of the CAPM misconception: the positive relationship between risk and expected return. ‘Be brave, don’t worry about the rollercoaster – you’ll be fine in the end and you’ll get a premium. The more risk you are willing to bear, the higher the risk premium you will earn.’ Another moment’s reflection is hardly necessary to reveal the foolishness – and to lament the untold damage – of such a misguided line of reasoning. The operator’s common sense view is correct: once risk is properly defined as the probability of a substantial and permanent loss of capital, the more risk there is the lower – not the higher – is the probability-weighted expected return. This also requires unlearning – often, alas, the hard way.
Despite Samuelson’s best wishes, then, there is far less authentic common ground between operators and academics than what is pretended – in more or less good faith – in both camps. Operators are right: there is much more to risk than volatility and beta, and actual risk earns no premium.
So the next question becomes: what prevents academics from seeing it?
In the latest chapter of his life-long and eventually triumphal effort to promote index investing, John Bogle explains what lies at the foundation of his philosophy: ‘my first-hand experience in trying but failing to select winning managers’ (p. 6). In 1966, as the new 37-year-old CEO of Wellington Management Company, Bogle decided to merge the firm with ‘a small equity fund manager that jumped on the Go-Go bandwagon of the late 1960s, only to fail miserably in the subsequent bear market. A great – but expensive – lesson’ (p. 7), which cost him his job.
It reminded me of another self-confessed failure, as recounted by Eugene Fama, who in his young days worked as a stock market forecaster for his economics professor, Harry Ernst: ‘Part of my job was to invent schemes to forecast the market. The schemes always worked on the data used to design them. But Harry was a good statistician, and he insisted on out-of-sample tests. My schemes invariably failed those tests’. (My Life in Finance, p. 3).
I can’t help seeing both incidents as instances of Festinger’s cognitive dissonance. It runs more or less like this: 1) I know a lot about economics and stock markets. 2) I am smart – truth be told, very smart. 3) I could use my brains to predict stock prices/select winning managers and make a lot of money. 4) I can’t. Therefore: it must be impossible. I think this goes a long way towards explaining the popularity and intuitive appeal of the Efficient Market Theory in academia.
Typical academics are keen to take these as conclusive demonstrations – derived from first principles, like Euclidean theorems – of the impossibility of market beating. But the Master knew better. At the end of ‘Fluctuate’ he wrote:
I have not here discussed where the basic probability distributions are supposed to come from. In whose minds are they ex ante? Is there any ex post validation of them? Are they supposed to belong to the market as a whole? And what does that mean? Are they supposed to belong to the “representative individual”, and who is he? Are they some defensible or necessitous compromise of divergent expectation patterns? Do price quotations somehow produce a Pareto-optimal configuration of ex ante subjective probabilities? This paper has not attempted to pronounce on these interesting questions.
And at the end of ‘Vibrate’:
In summary, the present study shows (a) there is no incompatibility in principle between the so-called random-walk model and the fundamentalists’ model, and (b) there is no incompatibility in principle between behaviour of stocks’ prices that behave like random walk at the same time that there exist subsets of investors who can do systematically better than the average investors.
Then in 1974 he reiterated the point in crystal clear terms, addressed to both academics and practitioners on the first issue of the Journal of Portfolio Management:
What is at issue is not whether, as a matter of logic or brute fact, there could exist a subset of the decision makers in the market capable of doing better than the averages on a repeatable, sustainable basis. There is nothing in the mathematics of random walks or Brownian movements that (a) proves this to be impossible, or (b) postulates that it is in fact impossible. (Challenge to Judgment, p. 17, his italics).
And for the EMT zealots:
Many academic economists fall implicitly into confusion on this point. They think that the truth of the efficient market or random walk (or, more precisely, fair-martingale) hypothesis is established by logical tautology or by the same empirical certainty as the proposition that nickels sell for less than dimes.
The nearest thing to a deductive proof of a theorem suggestive of the fair-game hypothesis is that provided in my two articles on why properly anticipated speculative prices do vibrate randomly. But of course, the weasel words “properly anticipated” provide the gasoline that drives the tautology to its conclusion. (p. 19).
There goes ‘Bogle’s truth’. And the irony of it is that in his latest piece Bogle reminisces on how, as he read it at the time, ‘Dr. Samuelson’s essay … struck me like a bolt of lightning’ (p. 6). A hard, obnubilating blow indeed.
There was, nevertheless, a legitimate reason for the fulmination. Samuelson’s Challenge to Judgment was a call to practitioners:
What is interesting is the empirical fact that it is virtually impossible for academic researchers with access to the published records to identify any member of the subset with flair. This fact, though not an inevitable law, is a brute fact. The ball, as I have already noted, is in the court of those who doubt the random walk hypothesis. They can dispose of the uncomfortable brute fact in the only way that any fact is disposed of – by producing brute evidence to the contrary. (p. 19).
He was referring to Jensen (1968) and the copious subsequent literature presenting lack of evidence on identifying a consistent subset of long-term outperforming funds. What Samuelson missed, however – and still goes largely unnoticed – is that the ‘risk adjustments’ to fund and index returns used in these studies are based on definitions of risk – as volatility, beta and the like – that presume market efficiency. To his credit, Eugene Fama has always been very clear on this point, which he calls the joint hypothesis problem:
Market efficiency can only be tested in the context of an asset pricing model that specifies equilibrium expected returns. […] As a result, market efficiency per se is not testable. […] Almost all asset pricing models assume asset markets are efficient, so tests of these models are joint tests of the models and market efficiency. Asset pricing and market efficiency are forever joined at the hip. (My Life in Finance, p. 5-6).
Typically, outperforming funds are explained away, and their returns driven to statistical insignificance, by the ‘higher risk’ they are deemed to have assumed. But such risk is defined and measured according to some version of the EMT! It is – as James Tobin wryly put it – a game where you win when you lose (see Tobin’s comment to Robert Merton’s essay in this collection).
It was precisely in defiance of this game that Warren Buffett wrote his marvellous Superinvestors piece, which sits up there next to Ben Graham’s masterwork in every intelligent investor’s reading list. As in his latest shareholder letter, Buffett used the coin-flipping story, fit for humans as well as orangutans, to point out that past outperformance can be the product of chance. But then he drew attention to an important difference:
If (a) you had taken 225 million orangutans distributed roughly as the U.S. population is; if (b) 215 winners were left after 20 days; and if (c) you found that 40 came from a particular zoo in Omaha, you would be pretty sure you were on to something. So you would probably go out and ask the zookeeper about what he’s feeding them, whether they had special exercises, what books they read, and who knows what else. That is, if you found any really extraordinary concentrations of success, you might want to see if you could identify concentrations of unusual characteristics that might be causal factors. (p. 6).
Hence he proceeded to illustrate the track record of his nine Superinvestors, stressing that it was not an ex post rationalisation of past results but a validation of superior stock picking abilities that he had identified ex ante.
So let’s do a thought experiment and imagine that Buffett 2007 went back 40 years to 1967 and offered a bet: ‘I will give 82,000 dollars (about 500,000 2007 dollars in 1967 money) to any investment pro who can select five funds that will match the performance of the S&P500 index in the next ten years’. Would Buffett 1967 have taken the bet? Sure – he would have said – in fact, I got nine! And after nine years, one year prior to the end of the bet, he would have proclaimed his victory (I haven’t done the calculation on Buffett’s Tables, but I guess it’s right). Now let’s teleport Buffett 2016 to 1976. What would he have said? Would he have endorsed those funds or recommended investing in the then newly launched Vanguard S&P index fund?
Here, then, is why I am disoriented – and I’m sure I’m not alone – by Mr. Buffett’s current stance on index investing. To be clear: 1) I am sympathetic to his aversion to Buffett impersonators promoting mediocre and expensive hedge funds. 2) I think index funds can be the right choice for certain kinds of savers. 3) I think Jack Bogle is an earnest and honourable man. However, as a grateful and impassioned admirer of Buffett 1984, I am puzzled by Buffett 2016. Like the former, the latter agrees with Paul Samuelson against ‘Bogle’s truth’: long term outperformance, while difficult and therefore uncommon – no one denies it – is possible. But while Buffett 1984 eloquently expanded on the ‘intellectual origin’ (p. 6) of such possibility, and on the ex ante characteristics of superior investors, Buffett 2016’s message is: forget about it, don’t fall for ex post performance and stick to index funds.
Notice this is not a message for the general public: it is addressed to Berkshire Hathaway’s shareholders – hardly the know-nothing savers who may be better served by basic funds. Buffett is very clear about this: buying a low-cost S&P500 index fund is his ‘regular recommendation’ (p. 24), to large and small, individual as well as professional and institutional investors – noticeably including the trustees of his family estate (2013 shareholder letter, p. 20).
Great! There goes a life-long dedication to intelligent investing. You may as well throw away your copy of Security Analysis. Alternatively, you may disagree with Mr. Buffett – nobody is perfect – and hope he reconsiders his uncharacteristically unfocused analysis. From the Master who taught us how to select good stocks one would expect equivalent wisdom on how to select good funds. It is not the same thing, but there are many similarities. As in stock picking, there are many wrong things one can do in fund picking. Past performance is no guarantee of future performance. Expensive stocks as well as expensive funds deceptively draw investors’ attention. There is no reason why large stocks or large funds should do better than small ones. Don’t go with the crowd. And so on. Similarly, just like Mr. Buffett taught us how to do the right things in stock picking, he could easily impart comparable advice in fund picking.
Here is the first one that comes to mind: look at the first ten stocks in a fund and ask the fund manager why he holds them. If he makes any reference to their index weight, run away.
Look at the top holdings of Italian Equities funds (Azionari Italia) on morningstar.it. They are the same for most of them: ENI, Intesa Sanpaolo, Enel, Unicredit, Luxottica, Assicurazioni Generali, Fiat Chrysler, and so on. Why? Do most fund managers agree that these are the best and most attractive companies quoted on the Italian stock market? No. The reason is that these are the largest companies by market capitalization, and therefore the largest components of the most commonly used Italian Equities index, the FTSE MIB. The same is true for other countries and regions, as well as for sector funds: look at the composition of the relevant index and you will work out a large portion of the funds’ holdings.
To a candid layman this looks very strange. ENI may be a good company, but why should it be as much as 10% of an Italian Equities fund? Surely, a company’s size has nothing to do with how valuable it is as an investment. Aren’t there more attractive choices? And if so, shouldn’t the fund invest in them, rather than park most of the money in the larger companies?
No, is the fund manager’s answer: the fund’s objective is not simply to find attractive investments. It is to obtain over time a better return than its peers and the index. This is what drives investors’ choices, determines the fund’s success and its manager’s reward. To beat the index – says the manager – I have to face it: take it as a neutral position and vary weights around it. So if I think that ENI is fairly valued I will hold its index weight, if I think it is undervalued I will hold more, and if I think it is overvalued I will hold less. How much more or less is up to me. But if ENI is 10% of the index I would have to regard it as grossly overvalued before deciding to hold none of it in the fund. A zero weight would be a huge bet against the index, which, if it goes wrong – ENI does well and I don’t have it – would hurt the fund’s relative performance and my career.
Sorry to insist – says the outspoken layman – but shouldn’t the fund’s performance and your career be better served if you take that 10% and invest it in stocks that you think will do better than ENI? If you do the same with the other large stocks which, like ENI, you hold in the fund just because they are in the index, you may be wrong a few times, but if you are any good at stock picking – and you tell me you are, that’s why I should buy your fund – then surely you are going to do much better than the index. What am I missing?
Look sir, with all due respect – says the slightly irritated manager – let me do my job. You want the fund to outperform, and so do I. So let me decide how best to achieve that goal, if you don’t mind.
I do mind – says the cheeky layman, himself showing signs of impatience. Of course I want you to beat the index. But I want you to do it with all my money, not just some of it. The index is just a measure of the overall market value. If ENI is worth 53 billion euro and the whole Italian stock market is worth 560 billion – less than Apple, by the way – then, sure, ENI is about 10% of the market. But what does that have to do with how much I, you or anybody else should own of it? The market includes all stocks – the good, the bad and the ugly. If you are able to choose the best stocks, you should comfortably do better than the market. If you can’t, I will look somewhere else.
Oh yeah? Good luck with that – the manager has given up his professional demeanour – hasn’t anybody told you that most funds do worse than the index?
Yes, I am aware of it – says the layman – that’s why I am looking for the few funds that can do better. You’re right, if your peers do what you do, I am not surprised they can’t beat the index. But I’ll keep looking. Good bye.
Well done, sir – someone else approaches the layman – let me introduce myself: I am the indexer. You’re right, all this overweight and underweight business is a complete waste of time and money. The reality is that, sooner or later, most funds underperform the index – and they even want to get paid for it! So let me tell you what I do: in my fund, I hold the stocks in the index at exactly their neutral weight, but I charge a small fraction of the other funds’ fees. This way, my fund does better than most other funds, at a much lower cost. How does that sound?
Pretty awful, I must say – says the layman – I am looking for a fund that invests all my money in good stocks and you are proposing one that does none of that and mindlessly buys index stocks. And you call yourself an investor?
Pardon me, but you’re so naïve – says the indexer – I am telling you I do better than most, at a lower cost. What part of the message don’t you understand?
Well, it’s not true – says the layman – and proceeds to show the indexer a list of funds that have done better than the relevant index and the other funds for each category over several periods after all costs – he may be a layman but he’s done his homework.
Oh, that’s rubbish – retorts the indexer – and performs his well-rehearsed coin-tossing gig. These are just the lucky guys who happen to sit on the right tail of the return distribution for a while. Sooner or later, their performance will revert to the mean. And do you know why? Because markets are efficient. Have you heard of the Efficient Market Theory? – he asks with a smug look. There is tons of academic evidence that proves that consistent market beating is impossible.
Yes, I know the EMT – says the layman – and I think it is wrong. Beating the market is clearly difficult – if it were easy everybody could do it, hence nobody would – but it is not impossible. The numbers I just showed you prove my point, and to dismiss them as a fluke is a miserable argument, fit only for haughty academics in need of a soothing answer to a most nagging question: If you’re so smart, why aren’t you rich? Tell me something – continues the layman – what drives market efficiency? Certainly not you, or the other gentleman with his marginal tweaking. You buy any company in the index regardless of price.
Yes – says the indexer, hiding his discomfort – but we are powerful and responsible shareholders and make sure that our voice gets heard.
Give me a break – the layman laughs – companies don’t care about you. They know you have to hold their shares no matter what. You’re the epitome of an empty threat. You don’t even know or care what these companies do. You are not an investor – you’re a free rider.
Ok then – says the indexer (he knew his was a phony argument but he tried it anyway) – what’s wrong with that? If there are enough active investors busy driving prices to where they should be, my passive fund reaps the benefits, my investors pay less and everyone is happy.
You should be ashamed of yourself, you know – says the layman, ready to end his second conversation.
Aw come on now! – blurts the indexer – who’s worse: me, transparently declaring what I do and charging little for it, or the other guy, pretending to be smart, doing worse than me and charging ten times as much?
You’ve got a point there – says the layman – you’re better than him. But you’re not going to get my money either. Good bye.
As you like, it’s your money – says the indexer, before launching his departing salvo: you know, even Warren Buffett says that index investing is the smart thing to do.
I have seen that – says the layman – what was he thinking?
Yes, what was Warren Buffett thinking when in his 2016 shareholder letter he proposed (p. 24) to erect a statue to John Bogle? Let’s see.
Back in the 2005 letter, Buffett prognosticated that active managers would, in aggregate, underperform the US stock market. He was reiterating the ‘fundamental truth’ of index investing. In the latest words of its inventor and proselytiser:
Before intermediation costs are deducted, the returns earned by equity investors as a group precisely equal the returns of the stock market itself. After costs, therefore, investors earn lower-than-market returns. (p. 2)
In its most general sense, this is an obvious tautology: the aggregate return equals the market return by definition. However, ‘Bogle’s truth’ is usually intended to apply as well to mutual funds, which for US equities represent about 20% of the aggregate (see e.g. Figure 2.3, p. 36 here). As such, there is no logical reason why mutual funds should necessarily perform like the market as a group, and worse than the market after costs. In fact, a layman would be justified in expecting professional investors to do better, before and after costs, compared to e.g. households. Whether mutual funds do better than the market is therefore an empirical rather than a logical matter.
The question has a long history, running from Jensen (1968) all the way to the latest S&P SPIVA report. Most of these studies make it particularly hard for outperformance to show up. Rather than squarely comparing fund returns to the market index, they either adjust performance for ‘risk’ (Jensen) using the now abandoned CAPM model, or slice and dice fund returns (SPIVA), box them into a variety of categories and compare them to artificial sub-indices. As a result, the commonly held view – reflected in Buffett’s 2005 prediction – is that ‘most funds underperform the market’. From this, the allure of index investing is a small logical step and a seemingly impregnable conclusion. All you need to say is, as Buffett puts it (p. 24):
There are, of course, some skilled individuals who are highly likely to out-perform the S&P over long stretches. In my lifetime, though, I’ve identified – early on – only ten or so professionals that I expected would accomplish this feat.
There are no doubt many hundreds of people – perhaps thousands – whom I have never met and whose abilities would equal those of the people I’ve identified. The job, after all, is not impossible. The problem simply is that the great majority of managers who attempt to over-perform will fail. The probability is also very high that the person soliciting your funds will not be the exception who does well.
Further complicating the quest for worthy managers – says Buffett – is the fact that outperformance may well be the result of luck over short periods, and that it typically attracts a torrent of money, which the manager gladly accepts to his own benefit, thus making future returns more difficult to sustain.
The bottom line: When trillions of dollars are managed by Wall Streeters charging high fees, it will usually be the managers who reap outsized profits, not the clients. Both large and small investors should stick with low-cost index funds.
It was on this basis that Buffett followed his 2005 prophecy by offering a bet to any investment professional able to select at least five hedge funds that would match the performance of a Vanguard S&P500 index fund over the subsequent ten years. He called for hedge funds, which represent an even smaller portion of the US equity investor universe, as he considers them the most strident example of divergence between bold return promises – reflected in hefty fees – and actual results. Most hedge funds do not set beating the S&P500 as their stated objective, preferring instead to target high returns independent of market conditions. But Buffett’s call was right: what’s the point of charging high fees if you can’t deliver more than index returns? At the same time, presumably he would not have objected to betting against long-only active funds explicitly managed to achieve S&P500 outperformance.
What followed – said Buffett – was the sound of silence. This is indeed surprising. Hedge fund managers’ objectives may be fuzzier, but if you manage a long-only US equity fund with a mandate to outperform the S&P500 and you genuinely believe you can do it, what better promotional opportunity is there than to bet against Warren Buffett and win?
Be that as it may, only one manager took up the challenge. And – bless him – he did not choose five long-only funds, nor five hedge funds, but five funds of hedge funds: he picked five funds that picked more than 100 hedge funds that picked thousands of stocks. Nothing wrong with that, in principle. Presumably, each of the five funds of funds managers believed he could select a portfolio of hedge funds that, at least on average, would do so much better than the S&P500 that, despite the double fee layer, it would itself end up well ahead of the index. They were wrong, very wrong (p. 22). Over the nine years from 2008 to 2016, the S&P500 returned 85.4% (7.1% per annum). Only fund of funds C got somewhat close, with a return of 62.8% (5.6% per annum). The other four funds returned, in order: 28.3%, 8.7%, 7.5% and 2.9% (that is 2.8%, 0.9%, 0.8% and 0.3% per annum).
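As a quick check on the arithmetic, a cumulative return R over n years converts to a per-annum rate as (1+R)^(1/n)−1. A minimal Python sketch, using the figures quoted above (the `annualized` helper is mine, not Buffett’s):

```python
def annualized(cumulative, years):
    """Convert a cumulative return over `years` into a per-annum rate."""
    return (1 + cumulative) ** (1 / years) - 1

# Cumulative 2008-2016 returns quoted above (nine years):
for name, r in [("S&P500", 0.854), ("fund C", 0.628), ("fund D", 0.029)]:
    print(f"{name}: {annualized(r, 9):.1%} per annum")
# S&P500 works out to 7.1%, fund C to 5.6% and fund D to 0.3%,
# matching the per-annum figures in the text.
```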
Result: Buffett’s valiant and solitary challenger, Mr. Ted Seides, co-manager, at the time, of Protégé Partners, played a very bad hand and made a fool of himself. But Buffett was lucky: he set out to prove ‘Bogle’s truth’ and observe index-like returns before fees, turning into underperformance after fees, but what he got was abysmal returns. Except perhaps for fund C, the gaping hole between the funds and the S&P500 had very little to do with fees. Buffett estimated that about 60% of all gains achieved by the five funds of funds went into the two fee layers. But even if fund D, returning a whopping 0.3% per year, had charged nothing, to select hedge funds that charged nothing, it would still have ended up well below the index. Same for funds A and E and, likely, for fund B.
To recap: when applied to mutual and hedge funds, ‘Bogle’s truth’ is not a logical necessity – as it is often portrayed to be – but an empirical statement. Performance studies make it hard for outperformance to emerge, but beating the index in the long run is certainly no easy task, even for professional investors. Fees make it even harder – the higher the fees, the harder the task. However, while difficult to achieve and therefore rare to observe, long-term outperformance is not impossible – Buffett is the first to acknowledge it: he is living proof!
Why is it then that he interpreted his bet win against Seides as evidence of ‘Bogle’s truth’? Imagine he had called for five value stocks and got five duds. Would he have interpreted this as evidence of the impossibility of value investing? What’s the difference between picking stocks and picking funds? Why does Buffett consider the former a difficult but valiant endeavour while the latter an impossible waste of time?
One of the beauties of maths is that it is the same in every language. So you don’t need to know Italian to read the table on the second page of this article on this week’s Milano Finanza.
The Made in Italy Fund started in May last year and is up 43% since then.
Here are the main points of the article:
The Italian stock market is dominated by the largest 40 stocks included in the FTSE MIB index. The FTSE MIB and the FTSE Italia All Shares indices are virtually overlapping (first graph on page 1).
2/3 of the Italian market is concentrated in 4 sectors.
Small Caps – companies with a market cap of less than 1 billion euro – are 3/4 of the 320 quoted names, but represent only 6% of the value of the market.
Small Caps as a whole have underperformed Large Caps (second graph).
But quality Small Caps – those included in the Star segment of the market – have outclassed the MIB index (third graph).
However, the Star index is itself concentrated (table on page 3): the top 11 stocks in the index with a market cap above 1 billion (not 12: Yoox is no longer there) represent more than 60% of the index value (a company needs to be below 1 billion to get into the Star segment, but it is not necessarily taken out when it goes above).
Therefore, to invest in Italian Small Caps you need to know what you’re doing: you can’t just buy a Mid/Small Cap ETF – which is what a lot of people did in the first quarter of this year, after the launch of PIR accounts (similar to UK ISAs), taking the Lyxor FTSE Italia Mid Cap ETF from 42 to 469 million.
To this I would add: you can’t just buy a fund tracking the Star index either (there are a couple): to own a stock just because it is part of an index makes no sense – more on this in the next post.
Fisher’s Bias – focusing on a low FPR without regard to TPR – is the mirror image of the Confirmation Bias – focusing on a high TPR without regard to FPR. They both neglect the fact that what matters is the ratio of the two – the Likelihood Ratio. As a result, they both give rise to major inferential pitfalls.
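A toy calculation makes the symmetry between the two biases concrete. The numbers below are hypothetical, chosen only to show that neither a high TPR nor a low FPR means anything on its own – only their ratio does:

```python
def likelihood_ratio(tpr, fpr):
    """LR = TPR / FPR: how much the evidence multiplies the prior odds."""
    return tpr / fpr

# Confirmation Bias trap: an impressive 90% TPR, but with an equally
# high FPR the evidence is worthless (LR = 1, prior odds unchanged).
print(likelihood_ratio(0.90, 0.90))

# Fisher's Bias trap: a 'significant' FPR at 5%, but with a TPR so low
# that LR < 1 - the evidence is actually mildly disconfirmative.
print(likelihood_ratio(0.04, 0.05))
```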
The Confirmation Bias explains weird beliefs – the ancient Greeks’ reliance on divination and the Aztecs’ gruesome propitiation rites, as well as present-day lunacies, like psychics and other fake experts, superstitions, conspiracy theories and suicide bombers, alternative medicine and why people drink liquor made by soaking a dried tiger penis, with testicles attached, in a bottle of French cognac.
Fisher’s Bias has no less deleterious consequences. FPR<5% hence PP>95%: ‘We have tested our theory and found it significant at the 5% level. Therefore, there is only a 5% probability that we are wrong.’ This is the source of a deep and far-reaching misunderstanding of the role, scope and goals of what we call science.
‘Science says that…’, ‘Scientific evidence shows that…’, ‘It has been scientifically proven that…’: the view behind these common expressions is of science as a repository of established certainties. Science is seen as the means for the discovery of conclusive evidence or, equivalently, the accumulation of overwhelmingly confirmative evidence that leaves ‘no room for doubt or opposition‘. This is a treacherous misconception. While truth is its ultimate goal, science is not the preserve of certainty but quite the opposite: it is the realm of uncertainty, and its ethos is to be entirely comfortable with it.
Fisher’s Bias sparks and propagates the misconception. Evidence can lead to certainty, but it often doesn’t: the tug of war between confirmative and disconfirmative evidence does not always have a winner. By equating ‘significance’ with ‘certainty beyond reasonable doubt’, Fisher’s Bias encourages a naïve trust in the power of science and a credulous attitude towards any claim that manages to be portrayed as ‘scientific’. In addition, once deflated by the reality of scientific controversy, such trust can turn into its opposite: a sceptical view of science as a confusing and unreliable enterprise, propounding similarly ‘significant’ but contrasting claims, all portrayed as highly probable, but in fact – as John Ioannidis crudely puts it – mostly false.
Was Ronald Fisher subject to Fisher’s Bias? Apparently not: he stressed that ‘the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis’, immediately adding that ‘if an experiment can disprove the hypothesis’ it does not mean that it is ‘able to prove the opposite hypothesis.’ (The Design of Experiments, p. 16). However, the reasoning behind this conclusion is characteristically awkward. The opposite hypothesis (in our words, the hypothesis of interest) cannot be tested because it is ‘inexact’ – remember that in the tea-tasting experiment the hypothesis is that the lady has some unspecified level of discerning ability. But – says Fisher – even if we were to make it exact, e.g. by testing perfect ability, ‘it is easy to see that this hypothesis could be disproved by a single failure, but could never be proved by any finite amount of experimentation’ (ibid.). Notice the confusion: saying that FPR<5% disproves the null hypothesis but FPR>5% does not prove it, Fisher is using the word ‘prove’ in two different ways. By ‘disproving’ the null he means considering it unlikely enough, but not certainly false. By ‘proving’ it, however, he does not mean considering it likely enough – which would be the correct symmetrical meaning – but he means considering it certainly true. That’s why he says that the null hypothesis as well as the opposite hypothesis are never proved. But this is plainly wrong and misleading. Prove/disprove is the same as accept/reject: it is a binary decision – doing one means not doing the other. So disproving the null hypothesis does mean proving the opposite hypothesis – not in the sense that it is certainly true, but in the correct sense that it is likely enough.
Here then is Fisher’s mistake. If H is the hypothesis of interest and not H the null hypothesis, FPR=P(E|not H) – the probability of the evidence (e.g. a perfect choice in the tea-tasting experiment) given that the hypothesis of interest is false (i.e. the lady has no ability and her perfect choice is a chance event). Then saying that a low FPR disproves the null hypothesis is the same as saying that a low P(E|not H) means a low P(not H|E). But since P(not H|E)=1–P(H|E)=1–PP, then a low FPR means a high PP, as in: FPR<5% hence PP>95%.
Hence yes: Ronald Fisher was subject to Fisher’s Bias. Despite his guarded and ambiguous wording, he did implicitly believe that 5% significance means accepting the hypothesis of interest. We have seen why: prior indifference. Fisher would not contemplate any value of BR other than 50%, i.e. BO=1, hence PO=LR=TPR/FPR. Starting with prior indifference, all that is needed for PP=1-FPR is error symmetry.
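The chain Fisher implicitly relied on can be checked numerically. By Bayes’ rule, PP = TPR·BR / (TPR·BR + FPR·(1−BR)); under prior indifference (BR=50%) and error symmetry (TPR=1−FPR) this collapses to PP=1−FPR. Without prior indifference, the collapse fails. A sketch (the `posterior` helper and the 1/1000 base rate are illustrative):

```python
def posterior(tpr, fpr, br):
    """PP = P(H|E) by Bayes' rule, from TPR, FPR and the base rate BR."""
    return tpr * br / (tpr * br + fpr * (1 - br))

fpr = 0.05
# Prior indifference (BR = 50%) plus error symmetry (TPR = 1 - FPR):
print(posterior(1 - fpr, fpr, 0.5))    # PP equals 1 - FPR, i.e. 95%

# With a sceptical base rate (hypothetical 1/1000), 5% significance
# no longer buys a high posterior probability:
print(posterior(1 - fpr, fpr, 0.001))  # PP stays below 2%
```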
Fisher’s Bias gives rise to invalid inferences, misplaced expectations and wrong attitudes. By setting FPR in its proper context, our Power surface brings much-needed clarity to the subject, including, as we have seen, Ioannidis’s brash claim. Let’s now take a closer look at it.
Remember Ioannidis’s main point: published research findings are skewed towards acceptance of the hypothesis of interest based on the 5% significance criterion. Fisher’s bias favours the publication of ‘significant’ yet unlikely research findings, while ‘insignificant’ results remain unpublished. As we have seen, however, this happens for a good reason: it is unrealistic to expect a balance, as neither researchers nor editors are interested in publishing rejections of unlikely hypotheses. What makes a research finding interesting is not whether it is true or false, but whether it confirms an unlikely hypothesis or disconfirms a likely one.
Take for instance Table 4 in Ioannidis’s paper (p. 0700), which shows nine examples of research claims as combinations of TPR, BO and PP, given FPR=5%. Remember the match between our and Ioannidis’s notation: FPR=α, TPR=1-β (FNR=β), BO=R and PP=PPV. For the moment, let’s just take the first two columns and leave the rest aside:
So for example the first claim has TPR=80%, hence LR=16 and, under prior indifference (BO=1, BR=50%), PO=16 and therefore PP=94.1%. In the second, we have TPR=95%, hence LR=19, BO=2 and BR=2/3, hence PO=38 and therefore PP=97.4%. And so on. As we can see, four claims have PP>50%: there is at least a preponderance of evidence that they are true. Indeed the first three claims are true even under a higher standard, with claim 2 in particular reaching beyond reasonable doubt, as it starts from an already high prior, which gets further increased by powerful confirmative evidence. In 3, powerful evidence manages to update a sceptical 25% prior to an 84% posterior, and in 6 to update an even more strongly sceptical prior to a posterior above 50%. The other five claims, on the other hand, have PP<50%: they are false even under the lowest standard of proof, with 8 and 9 in particular standing out as extremely unlikely. Notice however that in all nine cases we have LR>1: evidence is, in various degrees, confirmative, i.e. it increases prior odds to a higher level. Even in the last two cases, where evidence is not very powerful and BR is a tiny 1/1000 – just like in our child footballer story – LR=4 quadruples it to 1/250. The posterior is still very small – the claims remain very unlikely – but this is the crucial point: they are a bit less unlikely than before. That’s what makes a research finding interesting: not a high PP but a LR significantly different from 1. All nine claims in the table – true and false – are interesting and, as such, worth publication. This includes claim 2, where further confirmative evidence brings virtual certainty to an already strong consensus. But notice that in this case disconfirmative evidence, reducing prior odds and casting doubt on such consensus, would have attracted even more interest. 
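The arithmetic for the first three claims can be replicated in a few lines, using the quantities as defined above (PO = LR × BO and PP = PO/(1+PO)):

```python
def pp(tpr, fpr, bo):
    """Posterior probability from TPR, FPR and prior odds BO."""
    lr = tpr / fpr      # likelihood ratio
    po = lr * bo        # posterior odds
    return po / (1 + po)

# Claims 1-3 of Ioannidis's Table 4, with FPR fixed at 5%:
print(round(pp(0.80, 0.05, 1), 3))      # claim 1: 0.941
print(round(pp(0.95, 0.05, 2), 3))      # claim 2: 0.974
print(round(pp(0.80, 0.05, 1 / 3), 3))  # claim 3: 0.842 (25% prior)
```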
Just as we should expect to see a preponderance of studies confirming unlikely hypotheses, we should expect to see the same imbalance in favour of studies disconfirming likely hypotheses. It is the scientific enterprise at work.
Let’s now look at Ioannidis’s auxiliary point: the preponderance of ‘significant’ findings is reinforced by a portion of studies where significance is obtained through data manipulation. He defines bias u as ‘the proportion of probed analyses that would not have been “research findings”, but nevertheless end up presented and reported as such, because of bias’ (p. 0700).
How does Ioannidis’s bias modify his main point? This is shown in the following table, where PP* coincides with PPV in his Table 4:
Priors are the same, but now bias u causes a substantial reduction in LR and therefore in PP. For instance, in the first case u=0.10 means that 10% of research findings supporting the claim have been doctored into 5% significance through some form of data tampering. As a result, LR is lowered from 16 to 5.7 and PP from 94.1% to 85%. So in this case the effect of bias is noticeable but not decisive. The same is true in the second case, where a stronger bias causes a big reduction in LR from 19 to 2.9, but again not enough to meaningfully alter the resulting PP. In the third case, however, an even stronger bias does the trick: it reduces LR from 16 to 2 and PP from 84.2% all the way down to 40.6%. While the real PP is below 50%, a 40% bias makes it appear well above: the claim looks true but is in fact false. Same for 6, while the other five claims, which would be false even without bias, are even more so with bias – their LR reduced to near 1 and their PP consequently remaining close to their low BR.
This sounds a bit confusing, so let’s restate it, taking case 3 as an example. The claim starts with a 25% prior – it is not a well-established claim and would therefore do well with some confirmative evidence. The evidence, as it appears, is quite strong: FPR=5% and TPR=80%, giving LR=16, which elevates PP to 84.2%. But in reality the evidence is not as strong: 40% of the findings accepting the claim have been squeezed into 5% significance through data fiddling. Therefore the real LR – the one that would have emerged without data alterations – is much lower, and so is the real PP resulting from it: the claim appears true but is false. So is claim 6, thus bringing the total of false claims from five to seven – indeed most of them.
How does bias u alter LR? In Ioannidis’s model, it does so mainly by turning FPR into FPR*=FPR+(1-FPR)u – see Table 2 in the paper (p. 0697). FPR* is a positive linear function of u, with intercept FPR and slope 1-FPR, which, since FPR=5%, is a very steep 0.95. In case 3, for example, u produces a large increase of FPR from 5% to 43%. In addition, u turns TPR into TPR*=TPR+(1-TPR)u, which is also a positive linear function of u, with intercept TPR and slope 1-TPR which, since the TPR of confirmative evidence is higher than FPR, is flatter. In case 3 the slope is 0.2, so u increases TPR from 80% to 88%. The combined effect, as we have seen, is a much lower LR*=TPR*/FPR*, going down from 16 to 2.
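Case 3 can be verified directly from these two formulas (the helper names are mine):

```python
def biased(rate, u):
    """Ioannidis's bias transformation: rate* = rate + (1 - rate) * u."""
    return rate + (1 - rate) * u

def biased_lr(tpr, fpr, u):
    """LR* = TPR* / FPR* after applying bias u to both rates."""
    return biased(tpr, u) / biased(fpr, u)

# Case 3: u = 0.40 lifts FPR from 5% to 43% and TPR from 80% to 88%,
# crushing the likelihood ratio from 16 down to about 2:
print(round(biased(0.05, 0.40), 2))           # FPR*: 0.43
print(round(biased(0.80, 0.40), 2))           # TPR*: 0.88
print(round(biased_lr(0.80, 0.05, 0.40), 1))  # LR*: 2.0
```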
I will post a separate note about this model, but the point here is that, while Ioannidis’s bias increases the proportion of false claims, it is not the main reason why most of them are false. Five of the nine claims in his Table 4 would be false even without bias.
In summary, by confusing significance with virtual certainty, Fisher’s Bias encourages Ioannidis’s bias (I write it with a small b because it has no cognitive value: it is just more or less intentional cheating). But Ioannidis’s bias does not explain ‘Why Most Research Findings Are False’. The main reason is that many of them test unlikely hypotheses, and therefore, unless they manage to present extraordinary or conclusive evidence, their PP turns out to be lower and often much lower than 50%. But this doesn’t make them worthless or unreliable – as the paper’s title obliquely suggests. As long as they are not cheating, researchers are doing their job: trying to confirm unlikely hypotheses. At the same time, however, they have another important responsibility: to warn the reader against Fisher’s Bias, by explicitly clarifying that, no matter how ‘significant’ and impressive their results may appear, they are not ‘scientific revelations’ but tentative discoveries in need of further evidence.