On the way back to London from Italy earlier this month, I decided to stop in Basel. It was mid-way and it had long been on the list of cities I wanted to visit. Why it was on that list started to surface as I picked a hotel on Trivago. Euler Hotel – definitely. We arrived in the evening and the boys were keen to get back home. So I only had half a day the following morning.
Basel’s old town centre is quite small and its main landmark is the Münster, a Romanesque church with a long and interesting history. As we waited for its doors to open at 10, I started touring the adjacent cloister. One of the highlights of the place is that Erasmus was buried there in 1536 – a sudden death following an attack of dysentery. But while looking for the grave in the cloister, wandering among tombs and commemorative plates of the city’s notables, one of them gave me a jolt:
Jacob Bernoulli, of course. He was born and lived in Basel his whole life, and died there on 16 August 1705 – morbo chronico, mente ad extremum integra – at the age of 50 years and 7 months.
Jacob – the eldest scion of the prodigious Bernoulli family – is one of my heroes. The author of the greatest masterwork in early probability theory, Ars Conjectandi, he is also credited as the first to discover the relationship between continuous compound interest and Euler’s number e, the base of natural logarithms. There – I suddenly realised – was a big piece of my subconscious attraction to Basel. Enchanted by my discovery, I asked my second child to pose for a photo next to the tombstone – my elder son was wandering somewhere else, supremely bored and impatiently waiting for lunch and departure.
After leaving the cloister, unable to come up with anything intelligible to say about Bernoulli, I told the kids about Erasmus and Paracelsus – another illustrious Basler. At 10 we visited the church – Erasmus’s grave is inside – and shortly after I realised my time was up – the children would have killed me if I had proposed any more ‘history stuff’. So we walked back to the Euler Hotel – Leonhard Euler was born in Basel two years after Jacob Bernoulli’s death. He was the first to use the letter e for the base of natural logarithms, apparently as the first letter of ‘exponential’, rather than of ‘Euler’. He also established the notation for π and for the imaginary number i, all beautifully joined together in Euler’s identity $e^{i\pi} + 1 = 0$.
On the road to London, I kept thinking with delight about my semi-serendipitous encounter with Bernoulli. Then it struck me: I had seen that tombstone before. Back home, I checked. I was right: it was in one of the best books I have ever read, Eli Maor’s e: The Story of a Number.
As I reopened the book, it all came back to me: the Spira Mirabilis.
The logarithmic spiral is the curve $r = ae^{b\theta}$ in polar coordinates (r is the radius from the origin, θ is the angle between the radius and the horizontal axis, and a and b are parametric constants). Bernoulli had a lifelong fascination with the self-similar properties of the spiral:
But since this marvellous spiral, by such a singular and wonderful peculiarity, pleases me so much that I can scarce be satisfied with thinking about it, I have thought that it might be not inelegantly used for a symbolic representation of various matters. For since it always produces a spiral similar to itself, indeed precisely the same spiral, however it may be involved or evolved, or reflected or refracted, it may be taken as an emblem of a progeny always in all things like the parent, simillima filia matri. Or, if it is not forbidden to compare a theorem of eternal truth to the mysteries of our faith, it may be taken as an emblem of the eternal generation of the Son, who as an image of the Father, emanating from him, as light of light, remains ὁμοούσιος [consubstantial] with him, howsoever overshadowed. Or, if you prefer, since our spira mirabilis remains, amid all changes, most persistently itself, and exactly the same as ever, it may be used as a symbol, either of fortitude and constancy in adversity, or, of the human body; which after all its changes, even after death, will be restored to its exact and perfect self; so that, indeed, if the fashion of imitating Archimedes were allowed in these days, I should gladly have my tombstone bear this spiral, with the motto, Though changed, I rise again exactly the same, Eadem numero mutata resurgo.
This is the full quote from a paper by Reverend Thomas Hill published in 1875 (p. 516-517), from which Maor’s book takes an extract (p. 126-127), taken in turn from another book. Hill did not quote the source, but the original Latin quote can be found here (p. 185-186, available here), with the indication that it comes from a paper published by Bernoulli in the Leipsic Acts in 1692, which should be found here.
Bernoulli’s enthusiasm is easy to understand and to share. The logarithmic spiral is found in nature and art. The Golden spiral, whose radius grows by a factor equal to the golden ratio with every quarter turn, is a special case, and so is the circle as b tends to 0.
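The self-similarity Bernoulli admired can be checked numerically. The following is only a minimal sketch, not anything taken from Maor or Bernoulli: the function names and the values chosen for a, b and the magnification factor k are assumptions made for illustration. It shows that magnifying the spiral by k gives the same radii as moving along it by the angle ln(k)/b, and it derives the value of b that makes the radius grow by the golden ratio every quarter turn.

```python
import math

def log_spiral_radius(theta, a=1.0, b=0.2):
    """Radius of the logarithmic spiral r = a * exp(b * theta)."""
    return a * math.exp(b * theta)

def golden_spiral_b():
    """Value of b for which the radius grows by the golden ratio per quarter turn."""
    phi = (1 + math.sqrt(5)) / 2
    return math.log(phi) / (math.pi / 2)

a, b = 1.0, 0.2            # illustrative constants (assumed, not from the text)
k = 3.0                    # arbitrary magnification factor
shift = math.log(k) / b    # angle shift that reproduces the magnification

# Magnifying the radius at theta by k gives the radius found at theta + shift,
# so the enlarged spiral is just a rotated copy of the original one.
for theta in (0.0, 1.0, 2.5):
    scaled = k * log_spiral_radius(theta, a, b)
    shifted = log_spiral_radius(theta + shift, a, b)
    print(theta, math.isclose(scaled, shifted))

# Golden spiral: ratio of radii a quarter turn apart.
gb = golden_spiral_b()
ratio = log_spiral_radius(math.pi / 2, 1.0, gb) / log_spiral_radius(0.0, 1.0, gb)
print(ratio)
```

An Archimedean spiral, where the radius grows linearly with the angle, would fail the first check, which is why the mason's choice misses the point of the motto.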
At the same time, it is difficult not to laugh at the manner in which Bernoulli’s wish was finally granted. Perhaps confused by the reference to Archimedes, the appointed mason cut an Archimedean spiral at the bottom of the tombstone, which has none of the properties Bernoulli so admired in the logarithmic spiral. And, to add insult to injury, he left the word ‘numero’ out of the motto. Bloody builders – always the same…
Bernoulli’s considerations made an impression on me when I first read Maor’s book. The spira mirabilis as a symbol of fortitude and constancy in adversity, or of the human body restored to its perfect self even after death. But after reading the passage in its entirety, I find it even more beautiful and inspiring. And how about taking a picture of my son – simillimus filius patri – next to the spiral, before any of this had come back to my mind?
By the way, my son’s name is Maurits, like (but not named after) M. C. Escher.
When I arrived at the Drayton Arms, he was already there. He had contacted me a few days earlier and we had arranged to meet for a drink. He worked for a headhunting firm, focused only – he was keen to specify – on investment management. After the introductory chit-chat, I made it clear that I was not interested in a job offer, and he made it clear that his purpose was to present his services for my firm’s potential hiring needs. With that out of the way, the conversation moved on amiably, flowing from market conditions to value investing, Brexit and other world affairs.
Until at one point – I can’t remember how and why – we veered towards terrorism, and from there to 9/11. “Of course” – said Sandeep, with the casual air of someone who is sharing the obvious among world-savvy, knowledgeable people, “it was clearly an inside job”.
“What? What do you mean?” – I looked him straight in the eye.
“What? You don’t think so?” – Sandeep was genuinely taken aback by my sudden change of tone. Which, I agree, requires some explaining.
I have a Spinozan tolerance for freedom of opinion. It is the essence of Bayes: different priors, different information, or different interpretations of the same information, can give rise to different conclusions. This is obvious, and there is nothing wrong with it. But of course it doesn’t mean that anything goes. It means that, even when I have a strong view, I hold on to Cromwell’s rule and remain open to the possibility that, however high in my mind is the probability that I am right, I may be mistaken. As we know, hypothesis testing is the result of a tug of war between confirmative and disconfirmative evidence, which accumulates multiplicatively, leaving the possibility that, however overwhelming the evidence may be on one side, it may be annihilated by even one piece of conclusive evidence on the other. Another consequence of this framework is that, while I strive for certainty, I am comfortable with uncertainty: if neither side is strong enough to win the tug of war, there is nothing wrong with accepting that a hypothesis is only probably right, and therefore also probably wrong.
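The multiplicative tug of war can be written down in the odds form of Bayes' theorem: posterior odds equal prior odds times the product of the likelihood ratios of each piece of evidence. The sketch below is only an illustration under stated assumptions: the function name and the numbers are hypothetical, and the pieces of evidence are assumed to be conditionally independent so that their likelihood ratios can simply be multiplied.

```python
from math import prod

def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with independent pieces of evidence.

    Each likelihood ratio is P(evidence | hypothesis) / P(evidence | not hypothesis).
    The prior must lie strictly between 0 and 1 (Cromwell's rule).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)

# Three strong pieces of confirmative evidence (hypothetical ratios of 10 each).
confirming = [10, 10, 10]
p_strong = posterior_probability(0.5, confirming)

# One conclusive piece of disconfirmative evidence has a likelihood ratio of 0
# and wipes out everything accumulated on the other side.
p_refuted = posterior_probability(0.5, confirming + [0])

print(p_strong, p_refuted)
```

Because the ratios multiply, a single zero on one side outweighs any finite amount of evidence on the other, which is exactly why a made-up 'conclusive' fact is such an effective trick.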
It is important to remember, however, that this only works insofar as one makes sure that evidence accumulation is as thorough as possible on both sides. This is easy to understand: there is no point gathering a lot of evidence on one side while neglecting to do it on the other. One side will win nothing but a rigged game. But it is far from easy to do it in practice, as it requires fighting our natural tendency to succumb to the Confirmation Bias. The easier one side seems to be winning, the stronger should be our urge to reinforce the other side. It is by winning an ever tougher tug of war that we can aim to approach certainty.
This is an aptitude I have learned to nurture. The more I am convinced about something, the more I like to explore the other side, trying to distil its best arguments. If this succeeds in lowering my confidence, so be it: I feel richer, not poorer. And if it doesn’t, I am richer anyway, as I have built a clearer picture of what the other side stands on. This, after all, is what understanding means – distinct from justifying and, more so, from agreeing. The better one understands an argument, the easier it becomes to dismantle it and, perhaps, convince people on the other side to change their mind.
This is where sometimes I fail to keep my composure: when I face a conviction based on a pile of one-sided arguments, typically soaked in hyperbolic language, which blatantly misrepresents, disregards or belittles the other side. But what really gets on my nerves is a dirtier trick: when the balance of evidence is overtly on one side, the only way to overturn the verdict is to find – or, failing that, make up – a conclusive piece of evidence on the other side. This is the standard trick employed by conspiracy theorists: I call them Conclusionists, and the pit they fall into a conclusive evidence trap.
That’s what happened with Sandeep.
“Of course I don’t think so!” I replied. “How can you say such a … thing?” I asked, working on resisting my own adjectival overpouring. He looked at me with candid disbelief. How could I be so naïve? The web is full of information about it – he said. And when I asked him to give me an example, he explained: “Of course it is not in the usual places. You need to know where to look”.
Oh my God. One tends to imagine Conclusionists as showing some exterior signs of dimwittedness. But there he was, a perfectly nice, bright-looking guy, splattering such shocking bullshit. As he excused himself to the men’s room, I tried to collect myself. But failed miserably. “So Sandeep” I asked him as he came back, even before he could regain his seat “Who killed JFK? And what about those moon landings? And the Illuminati? It’s all down to Queen Elizabeth, eh?” I deserved a sonorous expletive. But Sandeep was a gentleman, and perhaps he had regretted his own condescension over his micturating interval. “I see your point” he smiled “I’m not saying that everything you find on internet is true. But…” At which point I grabbed the two-second void and, after mumbling some sort of apology myself, I cleared the air with a liberating “Anyway…” followed by a question about salaries, as if the whole interlude had never happened. The conversation resumed its cordial tone and carried on for a while, until it was time to go. We departed with the inevitable “Let’s keep in touch”. I have never heard from him since.
They arrived in the morning, bright and early. The dishwasher had been acting strangely, so I had finally called in the engineers to figure out what was going on. I like to fix these things myself around the house, but this time, after fiddling in vain for a few days, I had given up.
“‘morning, Sir – how can we help?” Doug, the senior of the duo, had the reassuring air of the expert who has seen it all.
“Well, this is what’s happening” I started, hopeful but sceptical that Doug would immediately find an obvious explanation. “The washing cycle does not end properly. As you can see, it stops in the middle, with water still lying at the bottom. It’s not the filter or anything like that” I added, making it clear that I knew my stuff. “Sometimes, after I open and close the door a couple of times, it restarts and goes on to the end. But other times, like today, it just stops”.
“Let’s take a look” said Doug, and with a nod and a whisper instructed his younger mate Trevor to check under the sink. At this point I left, one because the children had woken up and two because watching Trevor puffing and laying his giant tattooed belly on the cold marble floor was a bit too much so early in the morning. “Call me if you need me” I said. But I had hardly greeted the kids when Doug called me back. “Here it is, Sir” – the dishwasher was working again. “It was the connection to the water drain. It is shared with the washing machine and sometimes it can be a bit too much, you know. Anyway, we’ve changed it around and it should not happen again. But remember never to use the dishwasher and the washing machine at the same time”.
“Ow…kay” I said, trying to conceal my puzzlement and following Doug’s invitation to look under the sink at the result of Trevor’s manipulation. I couldn’t see any difference – and I had never used the two machines simultaneously. “Are you sure?” I wanted to ask, but I refrained – Doug looked very sure, and ready to leave. “Thank you very much” was all I could say. “Pleasure, Sir” said Doug, “it should be alright but we’re here if you need us. Have a good day”.
Alas, the little hope I had for a quick solution soon faded away. The dishwasher finished the cycle that Doug and Trevor had managed to restart, but the next one flopped again in the middle, as I found out the following morning. A little door banging helped it to the end, and so it did in the next few days. But the whole process soon became increasingly irritating: sometimes everything worked fine, sometimes the machine stopped and restarted by itself as I entered the kitchen, and some other times I had to keep banging the door. A week later I called back.
“Sorry guys” I apologised on the phone as I explained that their fixing wasn’t working. “No worries, Sir. We’ll be there tomorrow early in the morning”. So that evening I started a new cycle, with the intent of showing them the result in the morning and creating the ideal conditions for a new assessment.
I got out of bed as they rang the bell. They came in and we walked to the kitchen. One, two, three: I opened the dishwasher door, ready to show them the usual stagnant pool of water. Et voilà: no water. This time the cycle had ended properly. “No problem at all, Sir” said Doug, helping to alleviate my evident embarrassment. “We’ll put it down as ‘Intermittent Malfunctioning’”.
As they left with what I couldn’t help interpreting as a wry smile of amusement, I started contemplating my life with an erratically faulty dishwasher. Sure enough, the stop and go resumed. But what was the point of calling them again? So I kept going for a while, banging and cursing. Until one day it all came to an end. No banging, no lights, nothing. The machine was completely dead, and an increasingly smelly sludge at the bottom left me no alternative to calling Doug once again, with a view to arranging for a replacement.
This time Doug came alone, and after a few fearful moments in which I was dreading a new mysterious restart, he declared death himself. He took away the wooden bar under the dishwasher and started fiddling with its feet, exploring ways to slide it out of its casing. I left him again, and again he soon called me back. “Here, Sir” – the dishwasher had come back to life. To my befuddlement and consternation, Doug offered a new explanation: “You see, Sir, it all has to do with the alignment of the feet. They have laid the machine on MDF – that’s not the correct way, they should have used a harder material. With time, the feet have sunk a bit into the wood, enough to misalign the door closing. That’s why banging works sometimes. I have now raised the feet a bit so it’s all back in line. If this doesn’t work, the next thing is to replace the door, but I will not do it myself – I tried it once, but the hinges snapped back and I almost lost my finger. Anyway, I don’t think it will be necessary. I believe I figured it out – it’s amazing how one keeps learning after all these years”.
Oh well. I didn’t know what to make of Doug’s new theory, but he had managed to raise my hopes a bit. Once again, I would have the evidence in the morning. But later in the day I received a phone call. It was an electrician, who explained that he had been instructed by Doug to look at the dishwasher’s plug and asked whether he could come in the afternoon for a check. I was confused – Doug had said nothing to me about the plug. But why not? The whole thing was starting to reveal an amusing side.
As the young electrician came in, I gave him an abridged version of the saga. He nodded, quite uninterested, and set out to slide out the dishwasher to reach for the plug, which he had figured out was right behind it. After a few minutes he called me back. “Here, have a look” he said, with a quiet smile. The plug was stuck to the rear of the dishwasher, its plastic back partially melted and fused into it.
The mystery was finally and completely solved. And, as in the best detective stories, the explanation was simple and totally unexpected. The plug, stuck to the back, would intermittently lose contact with the socket due to the dishwasher’s vibration in mid cycle. That’s why door banging helped – it restored contact, as sometimes did just walking back into the kitchen, as floor vibration was enough to produce the same effect. All the electrician had to do was to move the socket to the side panel and reinsert the plug there. A dishwasher that was about to be chucked away is now in perfect shape and flawlessly performing its wonders.
So much for Doug’s theories. He had first tried a routine explanation – one that would probably fit most similar cases – but received disconfirming evidence from me. He then got confirming evidence from his own observation – a treacherous occurrence in many circumstances. Then, when a new piece of disconfirming evidence arrived, he built a new theory around it that seemed to fit the facts. This was as wrong as the first – and even more so, as it lacked generality and was created on the spot.
To his credit, however, Doug was crucial to finding the truth. I don’t know why he didn’t tell me about the plug – maybe it was late lateral thinking, or maybe he had it in mind but didn’t want to spoil his new-fangled theory – or simply, with no Trevor around, he didn’t feel like going through the motions of sliding the machine out.
Be that as it may, Doug was a true scientist. The search for the truth proceeds neither by deduction nor by induction but – in Charles Sanders Peirce’s somewhat awkward phrasing – abduction. We test hypotheses to produce explanations and select those that provide the best explanation of the observed evidence. The key to the process is to be open to revising and possibly rejecting any explanation in the light of the observed evidence. But a true scientist goes further: he actively looks for evidence that would reject his best theory and only stops when he finds conclusive evidence. In our dishwasher tale – a true story – the fused plastic plug was a Smoking Gun: evidence that conclusively explained the dishwasher’s strange behaviour. Hence we say it was the cause of such behaviour. I sent the picture to Doug’s phone but got no reply – I can’t remember, but perhaps, unlike its owner, the phone is not a smart one.
Investment risk is the probability of a substantial and permanent loss of capital. We buy a stock at 100 expecting to earn a return, consisting of appreciation and possibly a stream of dividends. But our expectation may be disappointed: the price may go down rather than up and we may decide to sell the stock at a loss, either because we need the money or because we come to realise, rightly or wrongly, that we made a mistake and the stock will never reach our expected level.
How does investment risk relate to volatility – the standard deviation of past returns, measuring the extent to which returns have been fluctuating and vibrating around their mean? Clearly, we prefer appreciation to be as quick and smooth as possible. If our expected price level is, say, 150, we would like the stock to reach the target in a straight line rather than through a tortuous rollercoaster. On the other hand, if we are confident that the price will get there eventually, we – unimpressionable grownups – may well endure the volatility. In fact, if on its way to 150 the price dropped to 70 it would create an inviting opportunity to buy more.
Volatility increases investment risk only insofar as it manages to undermine our confidence. We might have rightly believed that Amazon was a great investment at 85 dollars in November 1999, but by the time it reached 6 two years later our conviction would have been brutally battered. Was there any indication at the time that the stock could have had such a precipitous drop? Sure, the price had been gyrating wildly until then, up 21% in November, down 12% in October, up 29% in September and 24% in August, down 20% in July, and so on. The standard deviation of monthly returns since the IPO had been 33%, compared to 5% for the S&P500, suggesting that further and possibly more extreme gyrations were to be expected. But to a confident investor that only meant: tighten your seatbelt and enjoy the ride. A 93% nosedive, however, was something else – more than enough to break the steeliest nerves and crush the most assured resolve. ‘I must be wrong, I’m out of here’ is an all too human reaction in such circumstances.
Therefore, while volatility may well contribute to raising investment risk, it is not the same as investment risk. It is only when – rightly or wrongly – conviction is overwhelmed by doubt and poise surrenders to anxiety that investment risk bears its bitter fruit.
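For readers who want to see the arithmetic, here is a small sketch of the two measures at play: volatility as the standard deviation of monthly returns, and the peak-to-trough loss. It uses only the five monthly figures and the two prices quoted above; the variable and function names are mine, and five months are of course far too short a sample to reproduce the 33% since-IPO figure, which rests on data not shown here.

```python
import statistics

# Monthly returns quoted in the text for July to November 1999, in percent.
monthly_returns = [-20, 24, 29, -12, 21]

# Volatility: sample standard deviation of the returns around their mean.
sample_volatility = statistics.stdev(monthly_returns)

def drawdown(peak, trough):
    """Fractional loss suffered from a peak price down to a trough price."""
    return (peak - trough) / peak

# Price of 85 in November 1999 falling to 6 two years later.
amazon_drop = drawdown(85, 6)

print(round(sample_volatility, 1), round(amazon_drop * 100))
```

The first number describes how much returns have bounced around; the second is what the investor actually had to sit through. The two are related, but they are not the same quantity.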
Amazon is a dramatic example, but this is true in general. Every investment is made in the expectation of making a return, together with a more or less conscious and explicit awareness that it may turn out to be a flop. Every investor knows this, in practice. So why do many of them ignore it in theory and keep using financial models built on the axiom that volatility equals investment risk? As we have seen, the reason is the intellectual dominance of the Efficient Market Theory.
So the next question is: Why is it that, according to the EMT, investment risk coincides with volatility? The answer is as simple as it is unappreciated. Let’s see.
If the EMT could be summarised in one sentence, it would be: The market price is right. Prices are always where they should be. Amazon at 85, 6 or 1000 dollars. The Nasdaq at 5000, 1400 or 6400. At each point in time, prices incorporate all available information about expected profits, returns and discount rates. Prices are never too high or too low, except with hindsight. Therefore, an investor who buys a stock at 100 because he thinks it is worth 150 is fooling himself. Nobody can beat the market. If the market is pricing the stock at 100, then that’s what it’s worth. The price will change if and only if new information – unknown and unknowable beforehand and therefore not yet incorporated into the current price – prompts the market to revise its valuation. Since this was as true in the past as it is in the present and will be in the future, past price changes must also have been caused by no other reason than the arrival of information that was new at the time and unknown until then. Thus all price changes are unknowable and, by definition, unexpected. And since price changes are the largest component of returns – the other being dividends, which can typically be anticipated to some extent – we must conclude that past returns are largely unexpected. At this point there is only one last step: to identify risk with the unexpected. If we define investment risk as anything that could happen to the stock price that is not already incorporated into its current level, then the volatility of past returns can be taken as its accurate measure.
Identifying investment risk with volatility presupposes market efficiency. This is part of what Eugene Fama calls the joint hypothesis problem. To be an active investor, thus rejecting the EMT in practice, while at the same time using financial models based on the identification of investment risk with volatility, thus assuming the EMT in theory, is a glaring but largely unnoticed inconsistency.
So the next question is: what is it that practitioners know, which makes them behave as active investors, and that EMT academics ignore, which leads them to declare active investment an impossible waste of time and to advocate passive investment?
Again, the answer is simple but out of sight. In a nutshell: Practitioners know by ample experience that investors have different priors. EMT academics assume, by theoretical convenience, that investors have common priors.
Different priors is the overarching theme of the entire Bayes blog. People can and do reach different conclusions based on the same evidence because they interpret evidence based on different prior beliefs. This is blatantly obvious everywhere, including financial markets, where, based on the same information, some investors love Amazon and others short it. In the hyperuranian realm of the EMT, on the other hand, investors have common priors and therefore, when faced with common knowledge, cannot but reach the same conclusion. As Robert Aumann famously demonstrated, they cannot agree to disagree. This is why, in EMT parlance, prices reflect all available information.
Take the assumption away and the whole EMT edifice comes tumbling down. This is what Paul Samuelson was referring to in the final paragraphs of the Fluctuate and Vibrate papers. More explicitly, here is how Jonathan Ingersoll put it in his magisterial Theory of Financial Decision Making, immediately after ‘proving’ the EMT:
In fact, the entire “common knowledge” assumption is “hidden” in the presumption that investors have a common prior. If investors did not have a common prior, then their expectations conditional on the public information would not necessarily be the same. In other words, the public information would properly also be subscripted as $\phi_k$ – not because the information differs across investors, but because its interpretation does.
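A toy calculation makes the contrast concrete. This is only a sketch with made-up numbers: the priors, the likelihood ratio and the names `update`, `bull` and `bear` are all hypothetical, chosen to show the mechanism rather than to describe any real investor. Two investors see the same public evidence, summarised by one likelihood ratio; with different priors they end up with different posteriors, with a common prior they cannot.

```python
def update(prior, likelihood_ratio):
    """Posterior probability from a prior and one likelihood ratio (odds form of Bayes)."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Same public information for everyone: a hypothetical likelihood ratio of 4
# in favour of 'the stock is undervalued'.
shared_evidence = 4.0

# Different priors: same evidence, different conclusions.
bull = update(0.7, shared_evidence)
bear = update(0.2, shared_evidence)

# Common prior: same evidence, necessarily the same conclusion.
common_a = update(0.5, shared_evidence)
common_b = update(0.5, shared_evidence)

print(round(bull, 2), round(bear, 2), common_a == common_b)
```

In this toy setting the bull is ready to buy while the bear sits at even odds, although neither has seen anything the other has not.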
In this case the proof breaks down. (p. 81).
Interestingly, on a personal note, I first made the above quotation in my DPhil thesis (p. 132). A nice circle back to the origin of my intellectual journey.
As he wrote his ‘Challenge to Judgment’ in the first issue of the Journal of Portfolio Management in 1974, Paul Samuelson expected ‘the world of practical operators’ and ‘the new world of academics’ – which at the time looked to him ‘still light-years apart’ – to show some degree of convergence in the future.
On the face of it, he was right. The JPM recently celebrated its 40th anniversary. The Financial Analysts Journal, started with the same bridging intent 30 years earlier under Ben Graham’s auspices, is alive and well in its 73rd volume. Dozens of other periodicals have joined in the effort and hundreds of books and manuals have been written, sharing the purpose of promoting and developing a common language connecting the practice and the theory of investing.
But, while presuming and pretending to understand each other, the two worlds are still largely immersed in a sea of miscommunication. At the base of the Babel there are two divergent perspectives on the relationship between risk and return. Everybody understands return. You buy a stock at 100 and the price goes up to 110 – that’s a 10% return. But this is ex post. What was your expected return before you bought? And what risk did you assume? The practical operator does not have precise answers to these questions. I looked at the company – he would say – studied its business, read its balance sheet, talked to the managers, did my discounted cash flow valuation and concluded that the company was worth more than 100 per share. So I expected to earn a good return over time, roughly equal to the gap between my intrinsic value estimate and the purchase price. As for risk, I knew my valuation could be wrong – the company might be worth less than I thought. And even if I was right at the time of purchase, the company and my investment might have taken wrong turns in myriad different ways, causing me to lose some or all of my money.
Is that it? – says the academic – is that all you can say? Of course not – replies the operator – I could elaborate. But I couldn’t do it any better than Ben Graham: read his books and you’ll get all the answers.
But the academic would have none of it. As Eugene Fama recalls: ‘Without being there one can’t imagine what finance was like before formal asset pricing models. For example, at Chicago and elsewhere, investment courses were about security analysis: how to pick undervalued stocks’. (My Life in Finance, p. 14). Go figure. Typically confusing science with precision, the academic is not satisfied until he can squeeze concepts into formulas and insights into numbers. I don’t know what to do with Graham’s rhetoric – he says – I need measurement. So let me repeat my questions: what was your expected return exactly? How did you quantify your risk?
Give me a break – says the defiant operator – risk is much too complex to be reduced to a number. As for my expected return, I told you it is the gap between value and price, but I am under no illusion that I know it exactly. All I know is that the gap is large enough and I am prepared to wait until it closes.
Tut-tut – Fama shakes his head – Listen to me, you waffly retrograde. I will teach you the CAPM. ‘The CAPM provides the first precise definition of risk and how it drives expected return, until then vague and sloppy concepts’. (p. 15).
The operator listens attentively and in the end says: Sorry, I think the CAPM is wrong. First, you measure risk as the standard deviation of past returns. You do it because it gives you a number, but I think it makes little sense. Second, you say the higher the risk the higher is the expected return. That makes even less sense. My idea of risk is that the more there is the more uncertain I am about my expected return. In my view, the relationship between risk and expected return is, if anything, negative. So thank you for the lecture, but I stick with Graham. As Keynes did not say (again!): It is better to be vaguely right than precisely wrong.
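For reference, the textbook CAPM relation the two are arguing about says that a stock's expected return equals the risk-free rate plus its beta times the market's expected excess return, where beta is the covariance of the stock's returns with the market's divided by the variance of the market's returns. The sketch below is a bare-bones illustration of that formula, not an endorsement of it: the return series, the rates and the function names are all invented for the example.

```python
import statistics

def beta(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market), estimated from past returns."""
    mean_s = statistics.fmean(stock_returns)
    mean_m = statistics.fmean(market_returns)
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var

def capm_expected_return(risk_free, beta_value, market_expected):
    """CAPM: expected return rises linearly with beta."""
    return risk_free + beta_value * (market_expected - risk_free)

# Hypothetical past returns: the stock moves twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
stock = [2 * m for m in market]

b = beta(stock, market)
expected = capm_expected_return(risk_free=0.02, beta_value=b, market_expected=0.08)

print(round(b, 2), round(expected, 2))
```

Everything in the calculation comes from past price movements, and more of them mechanically produces a higher expected return, which is precisely what the operator objects to.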
Writing ten years after Samuelson’s piece, Warren Buffett well expressed the chasm between academics and practical operators: ‘Our Graham & Dodd investors, needless to say, do not discuss beta, the capital asset pricing model or covariance in returns among securities. These are not subjects of any interest to them. In fact, most of them would have difficulty defining those terms. The investors simply focus on two variables: price and value’. (Buffett, Superinvestors, p. 7).
But operators are rarely so blunt. Such is the intellectual authority of the Efficient Market Theory that the identification of risk with the standard deviation of returns – a.k.a. volatility – and the implication that more risk means higher returns are taken for granted and unthinkingly applied to all sorts of financial models. Hilariously, these include the same valuations that investment practitioners employ to justify their stock selection – an activity that makes sense only if one rejects the EMT! It is pure schizophrenia: investors unlearn at work what they learned at school, while at the same time continuing to use many of the constructs of the rejected theory and failing to notice the inconsistency.
But here is the biggest irony: after teaching it for forty years – twenty after Buffett’s piece – Fama finally got it out of his system: ‘The attraction of the CAPM is that it offers powerful and intuitively pleasing predictions about how to measure risk and the relation between expected return and risk. Unfortunately, the empirical record of the model is poor – poor enough to invalidate the way it is used in applications’ (Fama and French, JEP 2004). Hallelujah. Never mind that in the meantime the finance world – academics and practitioners – had amassed a colossal quantity of such applications and drawn an immeasurable variety of invalid conclusions. But what is truly mindboggling is that, in spite of it all, the CAPM is still regularly taught and widely applied. It is hard to disagree with Pablo Fernandez – a valiant academic whose work brings much needed clarity amidst the finance Babel – when he calls this state of affairs unethical:
If, for any reason, a person teaches that Beta and CAPM explain something and he knows that they do not explain anything, such a person is lying. To lie is not ethical. If the person “believes” that Beta and CAPM explain something, his “belief” is due to ignorance (he has not studied enough, he has not done enough calculations, he just repeats what he heard to others…). For a professor, it is not ethical to teach about a subject that he does not know enough about.
Two books that I think are particularly effective in helping operators move from practical unlearning – erratic, undigested and incoherent – to proper intellectual unlearning of the concept of risk embedded in the EMT and its derivations are David Dreman’s Contrarian Investment Strategies: the Next Generation (particularly Chapter 14: What is Risk?) and Howard Marks’ The Most Important Thing (particularly Chapters 5-7 on Understanding, Recognizing and Controlling Risk).
Besides the EMT’s predominance, unlearning is necessary because, at first glance, measuring risk with the standard deviation of returns makes intuitive sense: the more prices ‘fluctuate’ and ‘vibrate’, the higher the risk. Take Amazon:
If you had invested 30,000 dollars in Amazon’s IPO in May 1997 (it came out at 18 dollars, equivalent to 1.50 dollars after three splits), after twenty years – as the stock price reached 1,000 dollars (on 2nd June this year, to be precise) – your investment would have been worth 20 million dollars. Everybody understands return. But look at the chart – in log scale to give a graphic sense of what was going on: 1.50 went to 16 in a year (+126% in one month – June 1998) to reach 85 in November 1999. Then in less than two years – by September 2001 – it was down to 6, only to climb back to 53 at the end of 2003, down to 27 in July 2006, up to 89 in October 2007, down to 43 in November 2008 and finally up – up up up – to 1000. Who – apart from Rip van Winkle and Jeff Bezos – would have had the stomach to withstand such an infernal rollercoaster?
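For readers who like to check the arithmetic, here is a minimal sketch that uses only the figures quoted above. The compound annual growth rate at the end is my own derived number, not one taken from the chart, and it treats the holding period as exactly twenty years.

```python
# Figures quoted in the text.
ipo_price = 18.0            # IPO price per share, May 1997 (dollars)
split_adjusted_ipo = 1.50   # the same price after three stock splits
invested = 30_000.0         # initial investment (dollars)
price_2017 = 1_000.0        # share price reached in June 2017 (dollars)
years = 20                  # assumption: holding period rounded to 20 years

# Combined split factor implied by the two quoted IPO prices.
split_factor = ipo_price / split_adjusted_ipo

# Number of split-adjusted shares bought and their value twenty years later.
shares = invested / split_adjusted_ipo
final_value = shares * price_2017

# Derived figure: the constant yearly growth rate that links the two values.
cagr = (final_value / invested) ** (1 / years) - 1

print(f"split factor: {split_factor:.0f}x")
print(f"shares held:  {shares:,.0f}")
print(f"final value:  {final_value:,.0f} dollars")
print(f"CAGR:         {cagr:.1%}")
```

The return is easy to compute. The path taken to get there is what the rest of the discussion is about.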
So yes, in a broad sense, volatility carries risk. The more violent the price fluctuations, the higher is the probability that an investor might experience a catastrophic loss, for a variety of psychological and financial reasons: he may get scared and give up on his conviction, or he may need to liquidate at the wrong time. But how can such probability be measured? The routine, automatic answer is: the standard deviation of returns. Here is the picture:
The graph on the left is the cumulative standard deviation of monthly returns from May 1997 (allowing for an initial 12-month data accumulation) to May 2017, for Amazon and for the S&P500 index. The graph on the right shows the 12-month rolling standard deviations. The cumulative graph, which uses the maximal amount of data, shows that while the monthly standard deviation of the S&P500 has been stable at around 5%, Amazon’s standard deviation has been, after an initial peak, steadily declining ever since, although it still remains about four times that of the index (18.4% vs. 4.4%). The 12-month rolling version shows a similar gap, with Amazon’s standard deviation currently about three times that of the S&P500 (5.1% vs. 1.8%).
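The two measures described above can be sketched in a few lines. This is only an illustration under stated assumptions: the return series is synthetic (a hypothetical stock whose monthly volatility drifts down from 30% to 8%), not Amazon or S&P500 data, and the function names `cumulative_std` and `rolling_std` are mine.

```python
import random
import statistics

def cumulative_std(returns, min_periods=12):
    """Sample std of all returns up to each month (None until min_periods)."""
    return [
        statistics.stdev(returns[: i + 1]) if i + 1 >= min_periods else None
        for i in range(len(returns))
    ]

def rolling_std(returns, window=12):
    """Sample std of the latest `window` returns (None until the window fills)."""
    return [
        statistics.stdev(returns[i + 1 - window : i + 1]) if i + 1 >= window else None
        for i in range(len(returns))
    ]

# Synthetic monthly returns, NOT real market data: volatility is assumed
# to decline linearly from 30% to 8% over 240 months.
rng = random.Random(0)
n_months = 240
returns = [
    rng.gauss(0.02, 0.30 - (0.30 - 0.08) * t / (n_months - 1))
    for t in range(n_months)
]

cum = cumulative_std(returns)
roll = rolling_std(returns)
print(f"latest cumulative std: {cum[-1]:.1%}")
print(f"latest 12-month std:   {roll[-1]:.1%}")
```

The cumulative figure keeps averaging in the early, wilder months, while the rolling figure reflects only the last year. The same series can therefore produce two quite different "risk" numbers on the same date.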
What does this mean? Why is it relevant? What can such information tell us about the probability that, if we buy Amazon today, we may incur a big loss in the future? A moment’s thought gives us the answer: very little. Clearly, today’s Amazon is a completely different entity compared to its early days in the ’90s. Using any data from back then to guide today’s investment decision is nothing short of mindless. Amazon today is not four times as risky as the market, as it wasn’t five times as risky in November 2008. Nor is it three times as risky, as implied by the 12-month rolling data. The obvious point is that the standard deviation of returns is a backward-looking, time-dependent and virtually meaningless number, which, contrary to the precision it pretends to convey, has only the vaguest relation to anything resembling what it purports to measure.
The same is true for the other CAPM-based, but still commonly used measure of risk: beta. Here is Amazon’s beta versus the S&P500 index, again cumulative and on a 12-month rolling basis:
Again, the cumulative graph shows that Amazon’s beta has always been high, though it has halved over time from 4 to 2. So is Amazon a high beta stock? Not according to the 12-month rolling measure, which today is 0.4 – Amazon is less risky than the market! – but has been all over the place in the past, from as high as 6.7 in 2007 to as low as -0.4 in 2009. Longer rolling measures give a similar picture. What does it mean? Again, very little. According to the CAPM, Amazon’s beta is supposed to be a constant or at least stable coefficient, measuring the stock’s sensitivity to general market movements. But in reality it is nothing of the kind: like the standard deviation of returns, beta is just an erratic, retrospective and ultimately insignificant number.
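For reference, beta is usually estimated as the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. The sketch below applies that definition on a cumulative and on a 12-month rolling basis. The data are synthetic and purely an assumption of mine: a hypothetical stock whose true beta is fixed at 2, plus a good deal of stock-specific noise.

```python
import random

def beta(stock, market):
    """Covariance(stock, market) / variance(market) over the given sample."""
    n = len(market)
    mean_s = sum(stock) / n
    mean_m = sum(market) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock, market))
    var = sum((m - mean_m) ** 2 for m in market)
    return cov / var

def cumulative_beta(stock, market, min_periods=12):
    """Beta on all data up to each month (None until min_periods)."""
    return [
        beta(stock[: i + 1], market[: i + 1]) if i + 1 >= min_periods else None
        for i in range(len(market))
    ]

def rolling_beta(stock, market, window=12):
    """Beta on the latest `window` months only (None until the window fills)."""
    return [
        beta(stock[i + 1 - window : i + 1], market[i + 1 - window : i + 1])
        if i + 1 >= window else None
        for i in range(len(market))
    ]

# Synthetic monthly returns, NOT real data: true beta is assumed to be 2.
rng = random.Random(1)
market = [rng.gauss(0.005, 0.045) for _ in range(240)]
stock = [2.0 * m + rng.gauss(0.0, 0.15) for m in market]

cum = cumulative_beta(stock, market)
roll = [b for b in rolling_beta(stock, market) if b is not None]
print(f"latest cumulative beta: {cum[-1]:.2f}")
print(f"12-month rolling beta ranged from {min(roll):.2f} to {max(roll):.2f}")
```

A rolling estimate rests on only twelve observations, so it can swing widely from one period to the next. That sampling noise comes on top of any genuine change in the company itself.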
Volatility implies risk. But reducing risk to volatility is wrong, ill-conceived and in itself risky, as it inspires the second leg of the CAPM misconception: the positive relationship between risk and expected return. ‘Be brave, don’t worry about the rollercoaster – you’ll be fine in the end and you’ll get a premium. The more risk you are willing to bear, the higher the risk premium you will earn.’ Another moment’s reflection is hardly necessary to reveal the foolishness – and to lament the untold damage – of such a misguided line of reasoning. The operator’s common sense view is correct: once risk is properly defined as the probability of a substantial and permanent loss of capital, the more risk there is the lower – not the higher – is the probability-weighted expected return. This also requires unlearning – often, alas, the hard way.
Despite Samuelson’s best wishes, then, there is far less authentic common ground between operators and academics than what is pretended – in more or less good faith – in both camps. Operators are right: there is much more to risk than volatility and beta, and actual risk earns no premium.
So the next question becomes: what prevents academics from seeing it?
In the latest chapter of his life-long and eventually triumphal effort to promote index investing, John Bogle explains what lies at the foundation of his philosophy: ‘my first-hand experience in trying but failing to select winning managers’ (p. 6). In 1966, as the new 37-year-old CEO of Wellington Management Company, Bogle decided to merge the firm with ‘a small equity fund manager that jumped on the Go-Go bandwagon of the late 1960s, only to fail miserably in the subsequent bear market. A great – but expensive – lesson’ (p. 7), which cost him his job.
It reminded me of another self-confessed failure, as recounted by Eugene Fama, who in his young days worked as a stock market forecaster for his economics professor, Harry Ernst: ‘Part of my job was to invent schemes to forecast the market. The schemes always worked on the data used to design them. But Harry was a good statistician, and he insisted on out-of-sample tests. My schemes invariably failed those tests’. (My Life in Finance, p. 3).
I can’t help seeing both incidents as instances of Festinger’s cognitive dissonance. It runs more or less like this: 1) I know a lot about economics and stock markets. 2) I am smart – truth be told, very smart. 3) I could use my brains to predict stock prices/select winning managers and make a lot of money. 4) I can’t. Therefore: it must be impossible. I think this goes a long way towards explaining the popularity and intuitive appeal of the Efficient Market Theory in academia.
Typical academics are keen to take these as conclusive demonstrations – derived from first principles, like Euclidean theorems – of the impossibility of market beating. But the Master knew better. At the end of ‘Fluctuate’ he wrote:
I have not here discussed where the basic probability distributions are supposed to come from. In whose minds are they ex ante? Is there any ex post validation of them? Are they supposed to belong to the market as a whole? And what does that mean? Are they supposed to belong to the “representative individual”, and who is he? Are they some defensible or necessitous compromise of divergent expectation patterns? Do price quotations somehow produce a Pareto-optimal configuration of ex ante subjective probabilities? This paper has not attempted to pronounce on these interesting questions.
And at the end of ‘Vibrate’:
In summary, the present study shows (a) there is no incompatibility in principle between the so-called random-walk model and the fundamentalists’ model, and (b) there is no incompatibility in principle between behaviour of stocks’ prices that behave like random walk at the same time that there exists subsets of investors who can do systematically better than the average investors.
Then in 1974 he reiterated the point in crystal clear terms, addressed to both academics and practitioners in the first issue of the Journal of Portfolio Management:
What is at issue is not whether, as a matter of logic or brute fact, there could exist a subset of the decision makers in the market capable of doing better than the averages on a repeatable, sustainable basis. There is nothing in the mathematics of random walks or Brownian movements that (a) proves this to be impossible, or (b) postulates that it is in fact impossible. (Challenge to Judgment, p. 17, his italics).
And for the EMT zealots:
Many academic economists fall implicitly into confusion on this point. They think that the truth of the efficient market or random walk (or, more precisely, fair-martingale) hypothesis is established by logical tautology or by the same empirical certainty as the proposition that nickels sell for less than dimes.
The nearest thing to a deductive proof of a theorem suggestive of the fair-game hypothesis is that provided in my two articles on why properly anticipated speculative prices do vibrate randomly. But of course, the weasel words “properly anticipated” provide the gasoline that drives the tautology to its conclusion. (p. 19).
There goes ‘Bogle’s truth’. And the irony of it is that in his latest piece Bogle reminisces on how, as he read it at the time, ‘Dr. Samuelson’s essay … struck me like a bolt of lightning’ (p. 6). A hard, obnubilating blow indeed.
There was, nevertheless, a legitimate reason for the fulmination. Samuelson’s Challenge to Judgment was a call to practitioners:
What is interesting is the empirical fact that it is virtually impossible for academic researchers with access to the published records to identify any member of the subset with flair. This fact, though not an inevitable law, is a brute fact. The ball, as I have already noted, is in the court of those who doubt the random walk hypothesis. They can dispose of the uncomfortable brute fact in the only way that any fact is disposed of – by producing brute evidence to the contrary. (p. 19).
He was referring to Jensen (1968) and the copious subsequent literature presenting lack of evidence on identifying a consistent subset of long-term outperforming funds. What Samuelson missed, however – and still goes largely unnoticed – is that the ‘risk adjustments’ to fund and index returns used in these studies are based on definitions of risk – as volatility, beta and the like – that presume market efficiency. To his credit, Eugene Fama has always been very clear on this point, which he calls the joint hypothesis problem:
Market efficiency can only be tested in the context of an asset pricing model that specifies equilibrium expected returns. […] As a result, market efficiency per se is not testable. […] Almost all asset pricing models assume asset markets are efficient, so tests of these models are joint tests of the models and market efficiency. Asset pricing and market efficiency are forever joined at the hip. (My Life in Finance, p. 5-6).
Typically, outperforming funds are explained away, and their returns driven to statistical insignificance, by the ‘higher risk’ they are deemed to have assumed. But such risk is defined and measured according to some version of the EMT! It is – as James Tobin wryly put it – a game where you win when you lose (see Tobin’s comment to Robert Merton’s essay in this collection).
It was precisely in defiance of this game that Warren Buffett wrote his marvellous Superinvestors piece, which sits up there next to Ben Graham’s masterwork on every intelligent investor’s reading list. As in his latest shareholder letter, Buffett used the coin-flipping story, fit for humans as well as orangutans, to point out that past outperformance can be the product of chance. But then he drew attention to an important difference:
If (a) you had taken 225 million orangutans distributed roughly as the U.S. population is; if (b) 215 winners were left after 20 days; and if (c) you found that 40 came from a particular zoo in Omaha, you would be pretty sure you were on to something. So you would probably go out and ask the zookeeper about what he’s feeding them, whether they had special exercises, what books they read, and who knows what else. That is, if you found any really extraordinary concentrations of success, you might want to see if you could identify concentrations of unusual characteristics that might be causal factors. (p. 6).
Hence he proceeded to illustrate the track record of his nine Superinvestors, stressing that it was not an ex post rationalisation of past results but a validation of superior stock picking abilities that he had pre-identified ex ante.
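The numbers in Buffett's orangutan story are easy to verify. A minimal sketch, assuming a fair coin and one elimination round per day: each day half of the remaining flippers call the toss correctly and stay in the game.

```python
population = 225_000_000   # orangutans at the start, as in Buffett's story
days = 20                  # one coin flip per day

# With a fair coin, each round is expected to halve the survivors.
expected_winners = population * 0.5 ** days

print(f"expected winners after {days} days: {expected_winners:.0f}")
```

Pure chance accounts for how many winners there are. It says nothing about where they come from, which is Buffett's point about the zoo in Omaha.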
So let’s do a thought experiment and imagine that Buffett 2007 went back 40 years to 1967 and wagered a bet: ‘I will give 82,000 dollars (about 500,000 2007 dollars in 1967 money) to any investment pro who can select five funds that will match the performance of the S&P500 index in the next ten years’. Would Buffett 1967 have taken the bet? Sure – he would have said – in fact, I got nine! And after nine years, one year prior to the end of the bet, he would have proclaimed his victory (I haven’t done the calculation on Buffett’s Tables, but I guess it’s right). Now let’s teleport Buffett 2016 to 1976. What would he have said? Would he have endorsed those funds or recommended investing in the then newly launched Vanguard S&P index fund?
Here is then why I am disoriented – and I’m sure I’m not alone – by Mr. Buffett’s current stance on index investing. To be clear: 1) I am sympathetic to his aversion to Buffett impersonators promoting mediocre and expensive hedge funds. 2) I think index funds can be the right choice for certain kinds of savers. 3) I think Jack Bogle is an earnest and honourable man. However, as a grateful and impassioned admirer of Buffett 1984, Buffett 2016 puzzles me. Like the former, the latter agrees with Paul Samuelson against ‘Bogle’s truth’: long term outperformance, while difficult and therefore uncommon – no one denies it – is possible. But while Buffett 1984 eloquently expanded on the ‘intellectual origin’ (p. 6) of such possibility, and on the ex ante characteristics of superior investors, Buffett 2016’s message is: forget about it, don’t fall for ex post performance and stick to index funds.
Notice this is not a message for the general public: it is addressed to Berkshire Hathaway’s shareholders – hardly the know-nothing savers who may be better served by basic funds. Buffett is very clear about this: buying a low-cost S&P500 index fund is his ‘regular recommendation’ (p. 24), to large and small, individual as well as professional and institutional investors – notably including the trustees of his family estate (2013 shareholder letter, p. 20).
Great! There goes a life-long dedication to intelligent investing. You may as well throw away your copy of Security Analysis. Alternatively, you may disagree with Mr. Buffett – nobody is perfect – and hope he reconsiders his uncharacteristically unfocused analysis. From the Master who taught us how to select good stocks one would expect equivalent wisdom on how to select good funds. It is not the same thing, but there are many similarities. As in stock picking, there are many wrong things one can do in fund picking. Past performance is no guarantee of future performance. Expensive stocks as well as expensive funds deceptively draw investors’ attention. There is no reason why large stocks or large funds should do better than small ones. Don’t go with the crowd. And so on. Similarly, just like Mr. Buffett taught us how to do the right things in stock picking, he could easily impart comparable advice in fund picking.
Here is the first one that comes to mind: look at the first ten stocks in a fund and ask the fund manager why he holds them. If he makes any reference to their index weight, run away.
Look at the top holdings of Italian Equities funds (Azionari Italia) on morningstar.it. They are the same for most of them: ENI, Intesa Sanpaolo, Enel, Unicredit, Luxottica, Assicurazioni Generali, Fiat Chrysler, and so on. Why? Do most fund managers agree that these are the best and most attractive companies quoted on the Italian stock market? No. The reason is that these are the largest companies by market capitalization, and therefore the largest components of the most commonly used Italian Equities index, the FTSE MIB. The same is true for other countries and regions, as well as for sector funds: look at the composition of the relevant index and you will work out a large portion of the funds’ holdings.
To a candid layman this looks very strange. ENI may be a good company, but why should it be as much as 10% of an Italian Equities fund? Surely, a company’s size has nothing to do with how valuable it is as an investment. Aren’t there more attractive choices? And if so, shouldn’t the fund invest in them, rather than park most of the money in the larger companies?
No, is the fund manager’s answer: the fund’s objective is not simply to find attractive investments. It is to obtain over time a better return than its peers and the index. This is what drives investors’ choices, determines the fund’s success and its manager’s reward. To beat the index – says the manager – I have to face it: take it as a neutral position and vary weights around it. So if I think that ENI is fairly valued I will hold its index weight, if I think it is undervalued I will hold more, and if I think it is overvalued I will hold less. How much more or less is up to me. But if ENI is 10% of the index I would have to regard it as grossly overvalued before deciding to hold none of it in the fund. A zero weight would be a huge bet against the index, which, if it goes wrong – ENI does well and I don’t have it – would hurt the fund’s relative performance and my career.
Sorry to insist – says the outspoken layman – but shouldn’t the fund’s performance and your career be better served if you take that 10% and invest it in stocks that you think will do better than ENI? If you do the same with the other large stocks which, like ENI, you hold in the fund just because they are in the index, you may be wrong a few times, but if you are any good at stock picking – and you tell me you are, that’s why I should buy your fund – then surely you are going to do much better than the index. What am I missing?
Look sir, with all due respect – says the slightly irritated manager – let me do my job. You want the fund to outperform, and so do I. So let me decide how best to achieve that goal, if you don’t mind.
I do mind – says the cheeky layman, himself showing signs of impatience. Of course I want you to beat the index. But I want you to do it with all my money, not just some of it. The index is just a measure of the overall market value. If ENI is worth 53 billion euro and the whole Italian stock market is worth 560 billion – less than Apple, by the way – then, sure, ENI is about 10% of the market. But what does that have to do with how much I, you or anybody else should own of it? The market includes all stocks – the good, the bad and the ugly. If you are able to choose the best stocks, you should comfortably do better than the market. If you can’t, I will look somewhere else.
Oh yeah? Good luck with that – the manager has given up his professional demeanour – hasn’t anybody told you that most funds do worse than the index?
Yes, I am aware of it – says the layman – that’s why I am looking for the few funds that can do better. You’re right, if your peers do what you do, I am not surprised they can’t beat the index. But I’ll keep looking. Good bye.
Well done, sir – someone else approaches the layman – let me introduce myself: I am the indexer. You’re right, all this overweight and underweight business is a complete waste of time and money. The reality is that, sooner or later, most funds underperform the index – and they even want to get paid for it! So let me tell you what I do: in my fund, I hold the stocks in the index at exactly their neutral weight, but I charge a small fraction of the other funds’ fees. This way, my fund does better than most other funds, at a much lower cost. How does that sound?
Pretty awful, I must say – says the layman – I am looking for a fund that invests all my money in good stocks and you are proposing one that does none of that and mindlessly buys index stocks. And you call yourself an investor?
Pardon me, but you’re so naïve – says the indexer – I am telling you I do better than most, at a lower cost. What part of the message don’t you understand?
Well, it’s not true – says the layman – and proceeds to show the indexer a list of funds that have done better than the relevant index and the other funds for each category over several periods after all costs – he may be a layman but he’s done his homework.
Oh, that’s rubbish – retorts the indexer – and performs his well-rehearsed coin-tossing gig. These are just the lucky guys who happen to sit on the right tail of the return distribution for a while. Sooner or later, their performance will revert to the mean. And do you know why? Because markets are efficient. Have you heard of the Efficient Market Theory? – he asks with a smug look. There is tons of academic evidence that proves that consistent market beating is impossible.
Yes, I know the EMT – says the layman – and I think it is wrong. Beating the market is clearly difficult – if it were easy everybody could do it, hence nobody would – but it is not impossible. The numbers I just showed you prove my point, and to dismiss them as a fluke is a miserable argument, fit only for haughty academics in need of a soothing answer to a most nagging question: If you’re so smart, why aren’t you rich? Tell me something – continues the layman – what drives market efficiency? Certainly not you, or the other gentleman with his marginal tweaking. You buy any company in the index regardless of price.
Yes – says the indexer, hiding his discomfort – but we are powerful and responsible shareholders and make sure that our voice gets heard.
Give me a break – the layman laughs – companies don’t care about you. They know you have to hold their shares no matter what. You’re the epitome of an empty threat. You don’t even know or care what these companies do. You are not an investor – you’re a free rider.
Ok then – says the indexer (he knew his was a phony argument but he tried it anyway) – what’s wrong with that? If there are enough active investors busy driving prices to where they should be, my passive fund reaps the benefits, my investors pay less and everyone is happy.
You should be ashamed of yourself, you know – says the layman, ready to end his second conversation.
Aw come on now! – blurts the indexer – who’s worse: me, transparently declaring what I do and charging little for it, or the other guy, pretending to be smart, doing worse than me and charging ten times as much?
You’ve got a point there – says the layman – you’re better than him. But you’re not going to get my money either. Good bye.
As you like, it’s your money – says the indexer, before launching his departing salvo: you know, even Warren Buffett says that index investing is the smart thing to do.
I have seen that – says the layman – what was he thinking?
Yes, what was Warren Buffett thinking when in his 2016 shareholder letter he proposed (p. 24) to erect a statue to John Bogle? Let’s see.
Back in the 2005 letter, Buffett prognosticated that active managers would, in aggregate, underperform the US stock market. He was reiterating the ‘fundamental truth’ of index investing. In the latest words of its inventor and proselytiser:
Before intermediation costs are deducted, the returns earned by equity investors as a group precisely equal the returns of the stock market itself. After costs, therefore, investors earn lower-than-market returns. (p. 2)
In its most general sense, this is an obvious tautology: the aggregate return equals the market return by definition. However, ‘Bogle’s truth’ is usually intended to apply as well to mutual funds, which for US equities represent about 20% of the aggregate (see e.g. Figure 2.3, p. 36 here). As such, there is no logical reason why mutual funds should necessarily perform like the market as a group, and worse than the market after costs. In fact, a layman would be justified in expecting professional investors to do better, before and after costs, compared to e.g. households. Whether mutual funds do better than the market is therefore an empirical rather than a logical matter.
The question has a long history, dating back to Jensen (1968) all the way to the latest S&P SPIVA report. Most of these studies make it particularly hard for outperformance to show up. Rather than squarely comparing fund returns to the market index, they either adjust performance for ‘risk’ (Jensen) using the now abandoned CAPM model, or slice and dice fund returns (SPIVA), box them into a variety of categories and compare them to artificial sub-indices. As a result, the commonly held view – reflected in Buffett’s 2005 prediction – is that ‘most funds underperform the market’. From this, the allure of index investing is a small logical step and a seemingly impregnable conclusion. All you need to say is, as Buffett puts it (p. 24):
There are, of course, some skilled individuals who are highly likely to out-perform the S&P over long stretches. In my lifetime, though, I’ve identified – early on – only ten or so professionals that I expected would accomplish this feat.
There are no doubt many hundreds of people – perhaps thousands – whom I have never met and whose abilities would equal those of the people I’ve identified. The job, after all, is not impossible. The problem simply is that the great majority of managers who attempt to over-perform will fail. The probability is also very high that the person soliciting your funds will not be the exception who does well.
Further complicating the quest for worthy managers – says Buffett – is the fact that outperformance may well be the result of luck over short periods, and that it typically attracts a torrent of money, which the manager gladly accepts to his own benefit, thus making future returns more difficult to sustain.
The bottom line: When trillions of dollars are managed by Wall Streeters charging high fees, it will usually be the managers who reap outsized profits, not the clients. Both large and small investors should stick with low-cost index funds.
It was on this basis that Buffett followed his 2005 prophecy by offering a bet to any investment professional able to select at least five hedge funds that would match the performance of a Vanguard S&P500 index fund over the subsequent ten years. He called for hedge funds, which represent an even smaller portion of the US equity investor universe, as he considers them the most strident example of divergence between bold return promises – reflected in hefty fees – and actual results. Most hedge funds do not set beating the S&P500 as their stated objective, preferring instead to target high returns independent of market conditions. But Buffett’s call was right: what’s the point of charging high fees if you can’t deliver more than index returns? At the same time, presumably he would not have objected to betting against long-only active funds explicitly managed to achieve S&P500 outperformance.
What followed – said Buffett – was the sound of silence. This is indeed surprising. Hedge fund managers’ objectives may be fuzzier, but if you manage a long-only US equity fund with a mandate to outperform the S&P500 and you genuinely believe you can do it, what better promotional opportunity is there than to bet against Warren Buffett and win?
Be that as it may, only one manager took up the challenge. And – bless him – he did not choose five long-only funds, nor five hedge funds, but five funds of hedge funds: he picked five funds that picked more than 100 hedge funds that picked thousands of stocks. Nothing wrong with that, in principle. Presumably, each of the five funds of funds managers believed he could select a portfolio of hedge funds that, at least on average, would do so much better than the S&P500 that, despite the double fee layer, it would itself end up well ahead of the index. They were wrong, very wrong (p. 22). Over the nine years from 2008 to 2016, the S&P500 returned 85.4% (7.1% per annum). Only fund of funds C got somewhat close, with a return of 62.8% (5.6% per annum). The other four funds returned, in order: 28.3%, 8.7%, 7.5% and 2.9% (that is 2.8%, 0.9%, 0.8% and 0.3% per annum).
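The per annum figures follow from the nine-year cumulative returns by simple compounding. A minimal sketch, using only the numbers quoted above (the function name `annualised` is mine):

```python
def annualised(cumulative_return, years):
    """Constant yearly return that compounds to the given cumulative return."""
    return (1 + cumulative_return) ** (1 / years) - 1

YEARS = 9  # 2008 to 2016 inclusive

# Cumulative returns as quoted in the text, in the same order.
cumulative = [
    ("S&P500 index fund", 0.854),
    ("fund of funds C", 0.628),
    ("other fund, 1st", 0.283),
    ("other fund, 2nd", 0.087),
    ("other fund, 3rd", 0.075),
    ("other fund, 4th", 0.029),
]

for name, total in cumulative:
    print(f"{name:18s} {total:6.1%} cumulative = {annualised(total, YEARS):.1%} per annum")
```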
Result: Buffett’s valiant and solitary challenger, Mr. Ted Seides, co-manager, at the time, of Protégé Partners, played a very bad hand and made a fool of himself. But Buffett was lucky: he set out to prove ‘Bogle’s truth’ and observe index-like returns before fees, turning into underperformance after fees, but what he got was abysmal returns. Except perhaps for fund C, the gaping hole between the funds and the S&P500 had very little to do with fees. Buffett estimated that about 60% of all gains achieved by the five funds of funds went into the two fee layers. But even if fund D, returning a whopping 0.3% per year, had charged nothing, to select hedge funds that charged nothing, it would still have ended up well below the index. Same for funds A and E and, likely, for fund B.
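A rough way to see how little fees explain is to add them back. The sketch below rests on a strong simplifying assumption of mine: that Buffett's aggregate estimate (about 60% of all gains absorbed by the two fee layers) applies equally to each fund, so that investors kept 40% of every fund's gross gain. The true split per fund is not given in the text.

```python
FEE_SHARE = 0.60        # Buffett's aggregate estimate, applied uniformly (assumption)
index_return = 0.854    # S&P500 cumulative return, 2008 to 2016

# Net cumulative returns of the five funds of funds, as quoted in the text.
net_returns = [0.628, 0.283, 0.087, 0.075, 0.029]

def gross_of_fees(net_return, fee_share=FEE_SHARE):
    """Gross gain if investors kept only (1 - fee_share) of it."""
    return net_return / (1 - fee_share)

for net in net_returns:
    gross = gross_of_fees(net)
    verdict = "above" if gross > index_return else "below"
    print(f"net {net:6.1%} -> gross {gross:6.1%} ({verdict} the index)")

beaten = [net for net in net_returns if gross_of_fees(net) > index_return]
```

Under this crude assumption only the best fund would have beaten the index before fees. The other four would still have fallen short.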
To recap: when applied to mutual and hedge funds, ‘Bogle’s truth’ is not a logical necessity – as it is often portrayed to be – but an empirical statement. Performance studies make it hard for outperformance to emerge, but beating the index in the long run is certainly no easy task, even for professional investors. Fees make it even harder – the higher the fees, the harder the task. However, while difficult to achieve and therefore rare to observe, long-term outperformance is not impossible – Buffett is the first to acknowledge it: he is living proof!
Why is it then that he interpreted his bet win against Seides as evidence of ‘Bogle’s truth’? Imagine he had called for five value stocks and got five duds. Would he have interpreted this as evidence of the impossibility of value investing? What’s the difference between picking stocks and picking funds? Why does Buffett consider the former a difficult but valiant endeavour and the latter an impossible waste of time?
One of the beauties of maths is that it is the same in every language. So you don’t need to know Italian to read the table on the second page of this article in this week’s Milano Finanza.
The Made in Italy Fund started in May last year and is up 43% since then.
Here are the main points of the article:
The Italian stock market is dominated by the largest 40 stocks included in the FTSE MIB index. The FTSE MIB and the FTSE Italia All Shares indices are virtually overlapping (first graph on page 1).
2/3 of the Italian market is concentrated in 4 sectors.
Small Caps – companies with a market cap of less than 1 billion euro – are 3/4 of the 320 quoted names, but represent only 6% of the value of the market.
Small Caps as a whole have underperformed Large Caps (second graph).
But quality Small Caps – those included in the Star segment of the market – have outclassed the MIB index (third graph).
However, the Star index is itself concentrated (table on page 3): the top 11 stocks in the index with a market cap above 1 billion (not 12: Yoox is no longer there) represent more than 60% of the index value (a company needs to be below 1 billion to get into the Star segment, but it is not necessarily taken out when it goes above).
Therefore, to invest in Italian Small Caps you need to know what you’re doing: you can’t just buy a Mid/Small Cap ETF – which is what a lot of people did in the first quarter of this year, after the launch of PIR accounts (similar to UK ISAs), taking the Lyxor FTSE Italia Mid Cap ETF from 42 to 469 million.
To this I would add: you can’t just buy a fund tracking the Star index either (there are a couple): to own a stock just because it is part of an index makes no sense – more on this in the next post.
Fisher’s Bias – focusing on a low FPR without regard to TPR – is the mirror image of the Confirmation Bias – focusing on a high TPR without regard to FPR. They both neglect the fact that what matters is the ratio of the two – the Likelihood Ratio. As a result, they both give rise to major inferential pitfalls.
The Confirmation Bias explains weird beliefs – the ancient Greeks’ reliance on divination and the Aztecs’ gruesome propitiation rites, as well as present-day lunacies, like psychics and other fake experts, superstitions, conspiracy theories and suicide bombers, alternative medicine and why people drink liquor made by soaking a dried tiger penis, with testicles attached, into a bottle of French cognac.
Fisher’s Bias has no less deleterious consequences. FPR<5% hence PP>95%: ‘We have tested our theory and found it significant at the 5% level. Therefore, there is only a 5% probability that we are wrong.’ This is the source of a deep and far-reaching misunderstanding of the role, scope and goals of what we call science.
‘Science says that…’, ‘Scientific evidence shows that…’, ‘It has been scientifically proven that…’: the view behind these common expressions is of science as a repository of established certainties. Science is seen as the means for the discovery of conclusive evidence or, equivalently, the accumulation of overwhelmingly confirmative evidence that leaves ‘no room for doubt or opposition’. This is a treacherous misconception. While truth is its ultimate goal, science is not the preserve of certainty but quite the opposite: it is the realm of uncertainty, and its ethos is to be entirely comfortable with it.
Fisher’s Bias sparks and propagates the misconception. Evidence can lead to certainty, but it often doesn’t: the tug of war between confirmative and disconfirmative evidence does not always have a winner. By equating ‘significance’ with ‘certainty beyond reasonable doubt’, Fisher’s Bias encourages a naïve trust in the power of science and a credulous attitude towards any claim that manages to be portrayed as ‘scientific’. In addition, once deflated by the reality of scientific controversy, such trust can turn into its opposite: a sceptical view of science as a confusing and unreliable enterprise, propounding similarly ‘significant’ but contrasting claims, all portrayed as highly probable, but in fact – as John Ioannidis crudely puts it – mostly false.
Was Ronald Fisher subject to Fisher’s Bias? Apparently not: he stressed that ‘the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis’, immediately adding that ‘if an experiment can disprove the hypothesis’ it does not mean that it is ‘able to prove the opposite hypothesis.’ (The Design of Experiments, p. 16). However, the reasoning behind such a conclusion is typically awkward. The opposite hypothesis (in our words, the hypothesis of interest) cannot be tested because it is ‘inexact’ – remember that in the tea-tasting experiment the hypothesis is that the lady has some unspecified level of discerning ability. But – says Fisher – even if we were to make it exact, e.g. by testing perfect ability, ‘it is easy to see that this hypothesis could be disproved by a single failure, but could never be proved by any finite amount of experimentation’ (ibid.). Notice the confusion: saying that FPR<5% disproves the null hypothesis but FPR>5% does not prove it, Fisher is using the word ‘prove’ in two different ways. By ‘disproving’ the null he means considering it unlikely enough, but not certainly false. By ‘proving’ it, however, he does not mean considering it likely enough – which would be the correct symmetrical meaning – but he means considering it certainly true. That’s why he says that the null hypothesis as well as the opposite hypothesis are never proved. But this is plainly wrong and misleading. Prove/disprove is the same as accept/reject: it is a binary decision – doing one means not doing the other. So disproving the null hypothesis does mean proving the opposite hypothesis – not in the sense that it is certainly true, but in the correct sense that it is likely enough.
Here then is Fisher’s mistake. If H is the hypothesis of interest and not H the null hypothesis, FPR=P(E|not H) – the probability of the evidence (e.g. a perfect choice in the tea-tasting experiment) given that the hypothesis of interest is false (i.e. the lady has no ability and her perfect choice is a chance event). Then saying that a low FPR disproves the null hypothesis is the same as saying that a low P(E|not H) means a low P(not H|E). But since P(not H|E)=1–P(H|E)=1–PP, then a low FPR means a high PP, as in: FPR<5% hence PP>95%.
Hence yes: Ronald Fisher was subject to Fisher’s Bias. Despite his guarded and ambiguous wording, he did implicitly believe that 5% significance means accepting the hypothesis of interest. We have seen why: prior indifference. Fisher would not contemplate any value of BR other than 50%, i.e. BO=1, hence PO=LR=TPR/FPR. Starting with prior indifference, all that is needed for PP=1-FPR is error symmetry.
Fisher’s Bias gives rise to invalid inferences, misplaced expectations and wrong attitudes. By setting FPR in its proper context, our Power surface brings much needed clarity on the subject, including, as we have seen, Ioannidis’s brash claim. Let’s now take a closer look at it.
Remember Ioannidis’s main point: published research findings are skewed towards acceptance of the hypothesis of interest based on the 5% significance criterion. Fisher’s bias favours the publication of ‘significant’ yet unlikely research findings, while ‘insignificant’ results remain unpublished. As we have seen, however, this happens for a good reason: it is unrealistic to expect a balance, as neither researchers nor editors are interested in publishing rejections of unlikely hypotheses. What makes a research finding interesting is not whether it is true or false, but whether it confirms an unlikely hypothesis or disconfirms a likely one.
Take for instance Table 4 in Ioannidis’s paper (p. 0700), which shows nine examples of research claims as combinations of TPR, BO and PP, given FPR=5%. Remember the match between our and Ioannidis’s notation: FPR=α, TPR=1-β (FNR=β), BO=R and PP=PPV. For the moment, let’s just take the first two columns and leave the rest aside:
So for example the first claim has TPR=80%, hence LR=16 and, under prior indifference (BO=1, BR=50%), PO=16 and therefore PP=94.1%. In the second, we have TPR=95%, hence LR=19, BO=2 and BR=2/3, hence PO=38 and therefore PP=97.4%. And so on. As we can see, four claims have PP>50%: there is at least a preponderance of evidence that they are true. Indeed the first three claims are true even under a higher standard, with claim 2 in particular reaching beyond reasonable doubt, as it starts from an already high prior, which gets further increased by powerful confirmative evidence. In 3, powerful evidence manages to update a sceptical 25% prior to an 84% posterior, and in 6 to update an even more strongly sceptical prior to a posterior above 50%. The other five claims, on the other hand, have PP<50%: they are false even under the lowest standard of proof, with 8 and 9 in particular standing out as extremely unlikely. Notice however that in all nine cases we have LR>1: evidence is, in various degrees, confirmative, i.e. it increases prior odds to a higher level. Even in the last two cases, where evidence is not very powerful and BR is a tiny 1/1000 – just like in our child footballer story – LR=4 quadruples it to 1/250. The posterior is still very small – the claims remain very unlikely – but this is the crucial point: they are a bit less unlikely than before. That’s what makes a research finding interesting: not a high PP but a LR significantly different from 1. All nine claims in the table – true and false – are interesting and, as such, worth publication. This includes claim 2, where further confirmative evidence brings virtual certainty to an already strong consensus. But notice that in this case disconfirmative evidence, reducing prior odds and casting doubt on such consensus, would have attracted even more interest. 
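The arithmetic behind these figures is the odds form of Bayes’ rule: PO=BO×LR, with LR=TPR/FPR and PP=PO/(1+PO). Here is a minimal sketch that reproduces the numbers quoted above. The function and dictionary names are mine, only the claims whose inputs are spelled out in the text are included, and for the last two claims the 1/1000 prior is treated as odds, as the 1/250 figure in the text does.

```python
def posterior_probability(bo, tpr, fpr):
    """Update prior odds BO with LR = TPR/FPR, then convert posterior odds to PP."""
    lr = tpr / fpr
    po = bo * lr
    return po / (1.0 + po)


FPR = 0.05  # the 5% significance level shared by all claims

# (prior odds BO, TPR) for the claims whose inputs are stated in the text
claims = {
    "claim 1": (1.0, 0.80),
    "claim 2": (2.0, 0.95),
    "claim 3": (1 / 3, 0.80),
    "claims 8 and 9": (1 / 1000, 0.20),
}

posteriors = {name: posterior_probability(bo, tpr, FPR)
              for name, (bo, tpr) in claims.items()}

for name, pp in posteriors.items():
    print(f"{name}: PP = {pp:.1%}")
```

The last entry shows the point made above: the posterior stays tiny, yet the odds have moved from 1/1000 to 1/250.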
Just as we should expect to see a preponderance of studies confirming unlikely hypotheses, we should expect to see the same imbalance in favour of studies disconfirming likely hypotheses. It is the scientific enterprise at work.
Let’s now look at Ioannidis’s auxiliary point: the preponderance of ‘significant’ findings is reinforced by a portion of studies where significance is obtained through data manipulation. He defines bias u as ‘the proportion of probed analyses that would not have been “research findings”, but nevertheless end up presented and reported as such, because of bias’ (p. 0700).
How does Ioannidis’s bias modify his main point? This is shown in the following table, where PP* coincides with PPV in his Table 4:
Priors are the same, but now bias u causes a substantial reduction in LR and therefore in PP. For instance, in the first case u=0.10 means that 10% of research findings supporting the claim have been doctored into 5% significance through some form of data tampering. As a result, LR is lowered from 16 to 5.7 and PP from 94.1% to 85%. So in this case the effect of bias is noticeable but not determinant. The same is true in the second case, where a stronger bias causes a big reduction in LR from 19 to 2.9, but again not enough to meaningfully alter the resulting PP. In the third case, however, an even stronger bias does the trick: it reduces LR from 16 to 2 and PP from 84.2% all the way down to 40.6%. While the real PP is below 50%, a 40% bias makes it appear well above: the claim looks true but is in fact false. Same for 6, while the other five claims, which would be false even without bias, are even more so with bias – their LR reduced to near 1 and their PP consequently remaining close to their low BR.
This sounds a bit confusing so let’s restate it, taking case 3 as an example. The claim starts with a 25% prior – it is not a well established claim and would therefore do well with some confirmative evidence. The apparent evidence is quite strong: FPR=5% and TPR=80%, giving LR=16, which elevates PP to 84.2%. But in reality the evidence is not as strong: 40% of the findings accepting the claim have been squeezed into 5% significance through data fiddling. Therefore the real LR – the one that would have emerged without data alterations – is much lower, and so is the real PP resulting from it: the claim appears true but is false. So is claim 6, thus bringing the total of false claims from five to seven – indeed most of them.
How does bias u alter LR? In Ioannidis’s model, it does so mainly by turning FPR into FPR*=FPR+(1-FPR)u – see Table 2 in the paper (p. 0697). FPR* is a positive linear function of u, with intercept FPR and slope 1-FPR, which, since FPR=5%, is a very steep 0.95. In case 3, for example, u produces a large increase of FPR from 5% to 43%. In addition, u turns TPR into TPR*=TPR+(1-TPR)u, which is also a positive linear function of u, with intercept TPR and slope 1-TPR which, since the TPR of confirmative evidence is higher than FPR, is flatter. In case 3 the slope is 0.2, so u increases TPR from 80% to 88%. The combined effect, as we have seen, is a much lower LR*=TPR*/FPR*, going down from 16 to 2.
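To make the mechanics concrete, here is a small sketch of the two linear adjustments and their effect on LR and PP, applied to cases 1 and 3 with the values of u given above. The function names are my own. The formulas are the ones just quoted, not a fuller rendering of Ioannidis’s model.

```python
def apply_bias(tpr, fpr, u):
    """Return (TPR*, FPR*) using TPR* = TPR + (1-TPR)u and FPR* = FPR + (1-FPR)u."""
    return tpr + (1.0 - tpr) * u, fpr + (1.0 - fpr) * u


def posterior_probability(bo, lr):
    """Convert prior odds and a likelihood ratio into a posterior probability."""
    po = bo * lr
    return po / (1.0 + po)


def biased_claim(bo, tpr, fpr, u):
    """Return (TPR*, FPR*, LR*, PP*) for a claim affected by bias u."""
    tpr_star, fpr_star = apply_bias(tpr, fpr, u)
    lr_star = tpr_star / fpr_star
    return tpr_star, fpr_star, lr_star, posterior_probability(bo, lr_star)


case_1 = biased_claim(bo=1.0, tpr=0.80, fpr=0.05, u=0.10)
case_3 = biased_claim(bo=1 / 3, tpr=0.80, fpr=0.05, u=0.40)

for label, (tpr_s, fpr_s, lr_s, pp_s) in (("case 1", case_1), ("case 3", case_3)):
    print(f"{label}: TPR*={tpr_s:.0%} FPR*={fpr_s:.1%} LR*={lr_s:.1f} PP*={pp_s:.1%}")
```

Because FPR starts so much lower than TPR, the same u inflates the denominator of LR far more than the numerator, which is why LR collapses.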
I will post a separate note about this model, but the point here is that, while Ioannidis’s bias increases the proportion of false claims, it is not the main reason why most of them are false. Five of the nine claims in his Table 4 would be false even without bias.
In summary, by confusing significance with virtual certainty, Fisher’s Bias encourages Ioannidis’s bias (I write it with a small b because it has no cognitive value: it is just more or less intentional cheating). But Ioannidis’s bias does not explain ‘Why Most Research Findings Are False’. The main reason is that many of them test unlikely hypotheses, and therefore, unless they manage to present extraordinary or conclusive evidence, their PP turns out to be lower and often much lower than 50%. But this doesn’t make them worthless or unreliable – as the paper’s title obliquely suggests. As long as they are not cheating, researchers are doing their job: trying to confirm unlikely hypotheses. At the same time, however, they have another important responsibility: to warn the reader against Fisher’s Bias, by explicitly clarifying that, no matter how ‘significant’ and impressive their results may appear, they are not ‘scientific revelations’ but tentative discoveries in need of further evidence.
Armed with our Power surface, let’s revisit John Ioannidis’s claim according to which ‘Most Published Research Findings Are False’.
Ioannidis’s target is the immeasurable confusion generated by the widespread mistake of interpreting Fisher’s 5% statistical significance as implying a high probability that the hypothesis of interest is true. FPR<5%, hence PP>95%. As we have seen, this is far from being the general case: TPR=PP if and only if BO=FPR/FNR, which under prior indifference requires error symmetry.
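For readers who want the step spelled out, the condition can be recovered from Bayes’ rule in a few lines. This is a sketch in the notation used here, assuming TPR>0 and 0<BR<1, with BO=BR/(1-BR) and FNR=1-TPR:

```latex
\begin{align*}
PP &= \frac{TPR \cdot BR}{TPR \cdot BR + FPR \cdot (1 - BR)} \\[4pt]
PP = TPR
  &\iff BR = TPR \cdot BR + FPR \cdot (1 - BR) \\
  &\iff BR \cdot (1 - TPR) = FPR \cdot (1 - BR) \\
  &\iff \frac{BR}{1 - BR} = \frac{FPR}{1 - TPR} \\
  &\iff BO = \frac{FPR}{FNR}
\end{align*}
```

With BO=1 the condition becomes FPR=FNR, i.e. error symmetry, and only then does PP=TPR=1-FNR=1-FPR.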
Fisher’s 5% significance is neither a sufficient nor a necessary condition for accepting the hypothesis of interest. Besides FPR, acceptance and rejection depend on BR, PP and TPR. Given FPR=5%, all combinations of the three variables lying on or above the curved surface indicate acceptance. But, contrary to Fisher’s criterion, combinations below the surface indicate rejection. The same is true for values of FPR below 5%, which are even more ‘significant’ according to Fisher’s criterion. These widen the curved surface and shrink the roof, thus enlarging the scope for acceptance, but may still indicate rejection for low priors and high standards of proof, if TPR power is not or cannot be high enough. On the other hand, FPR values above 5%, which according to Fisher’s criterion are ‘not significant’ and therefore imply unqualified rejection, reduce the curved surface and expand the roof, thus enlarging the scope for rejection, but may still indicate acceptance for higher priors and lower standards of proof, provided TPR power is high enough. Here are pictures for FPR=2.5% and 10%:
So let’s go back to where we started. We want to know whether a certain claim is true or false. But now, rather than seeing it from the perspective of a statistician who wants to test the claim, let’s see it from the perspective of a layman who wants to know if the claim has been tested and whether the evidence has converged towards a consensus, one way or the other.
For example: ‘Is it possible to tell whether milk has been poured before or after hot water by just tasting a cup of tea?’ (bear with me please). We google the question and let’s imagine we get ten papers delving into this vital issue. The first and earliest is by none other than the illustrious Ronald Fisher, who performed the test on the algologist Muriel Bristol and, on finding that she made no mistakes with 8 cups – an event that has only 1/70 probability of being the product of chance, i.e. a p-value of 1.4%, much lower than required by his significance criterion – concluded, against his initial scepticism, that ‘Yes, it is possible’. That’s it? Well, no. The second paper describes an identical test performed on the very same Ms Bristol three months later, where she made 1 mistake – an event that has a 17/70 probability of being a chance event, i.e. a p-value of 24.3%, much larger than the 5% limit allowed by Fisher’s criterion. Hence the author rejected Professor Fisher’s earlier claim about the lady’s tea-tasting ability. On to the third paper, where Ms Bristol was given 12 cups and made 1 mistake, an event with a 37/924=4% probability of being random, once again below Fisher’s significance criterion. And so on with the other papers, each one with its own set up, its own numbers and its own conclusions.
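The three probabilities can be checked by counting. Here is a minimal sketch, assuming that half the cups are milk-first, that the lady knows this and picks exactly that many, and that ‘1 mistake’ means one wrong pick, which necessarily swaps one pair of cups. The p-value is then the chance of doing at least that well by pure guessing. The function name is mine.

```python
from fractions import Fraction
from math import comb


def tea_p_value(cups, mistakes):
    """Chance of at most `mistakes` wrong picks when guessing which half had milk first."""
    half = cups // 2
    # Exactly j wrong picks: choose j of the tea-first cups by error and
    # half - j of the milk-first cups correctly, i.e. comb(half, j) ** 2 ways.
    favourable = sum(comb(half, j) ** 2 for j in range(mistakes + 1))
    return Fraction(favourable, comb(cups, half))


papers = [(8, 0), (8, 1), (12, 1)]  # (cups, mistakes) in the three papers above
for cups, mistakes in papers:
    p = tea_p_value(cups, mistakes)
    print(f"{cups} cups, {mistakes} mistake(s): {p} = {float(p):.1%}")
```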
It is tempting at this point for the layman to throw up his arms in despair and execrate the so-called experts for being unable to give a uniform answer to such a simple question. But he would be entirely wrong. The evidential tug of war between confirmative and disconfirmative evidence is the very essence of science. It is up to us to update our prior beliefs through multiplicative accumulation of evidence and to accept or reject a claim according to our standard of proof.
If anything, the problem is the opposite: not too much disagreement but – and this is Ioannidis’s main point – too little. Evidence accumulation presupposes that we are able to collect the full spectrum, or at least a rich, unbiased sample of all the available evidence. But we hardly ever do. The evidence we see is what reaches publication, and publication is naturally skewed towards ‘significant’ findings. A researcher who is trying to prove a point will only seek publication if he thinks he has gathered enough evidence to support it. Who wants to publish a paper about a theory only to announce that he has got inadequate evidence for it? And even if he tried, what academic journal would publish it?
As a result, available evidence tends to be biased towards acceptance. And since acceptance is still widely based on Fisher’s criterion, most published papers present FPR<5%, while those with FPR>5% remain unpublished and unavailable. To add insult to injury, in order to reach publication some studies get squeezed into significance through more or less malevolent data manipulation. It is what Ioannidis calls bias: ‘the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced’. (p. 0697).
This dramatically alters the evidential tug of war. It is as if, when looking into the milk-tea question, we would only find Fisher’s paper and others accepting the lady’s ability – including some conveniently glossing over a mistake or two – and none on the side of rejection. We would then be inclined to conclude that the experts agree and would be tempted to go along with them – perhaps disseminating and reinforcing the bias through our own devices.
How big a problem is this? Ioannidis clearly thinks it is huge – hence the dramatic title of his paper, enough to despair not just about some experts but about the entire academic community and its scientific enterprise. Is it this bad? Are we really swimming in a sea of false claims?
Let’s take a better look. First, we need to specify what we mean by true and false. As we know, it depends on the standard of proof, which in turn depends on utility preferences. What is the required standard of proof for accepting a research finding as true? Going by the wrongful interpretation of Fisher’s 5% significance criterion, it is PP>95%. But this is not only a mistake: it is the premise behind an insidious misrepresentation of the very ethos of scientific research. Obviously, truth and certainty are the ultimate goals of any claim. But the value of a research finding is not in how close it is to the truth, but in how much closer it gets us to the truth. In our framework, it is not in PP but in LR.
The goal of scientific research is finding and presenting evidence that confirms or disconfirms a specific hypothesis. How much more (or less) likely is the evidence if the hypothesis is true than if it is false? The value of evidence is in its distance from the unconfirmative middle point LR=1. A study is informative, hence worth publication, if the evidence it presents has a Likelihood Ratio significantly different from 1, and is therefore a valuable factor in the multiplicative accumulation of knowledge. But a high (or low) LR is not the same as a high (or low) PP. They only coincide under prior indifference, where, more precisely, PO=LR, i.e. PP=LR/(1+LR). So, for example, if LR=19 – the evidence is 19 times more likely if the hypothesis is true than if it is false – then PP=19/20=95%. But, as we know very well, prior indifference is not a given: it is a starting assumption which, depending on the circumstances, may or may not be valid. BO – Ioannidis calls it R, ‘the ratio of “true relationships” to “no relationships”’ (p. 0696) – gives the pre-study odds of the investigated hypothesis being true. It can be high, if the hypothesis has been tested before and is already regarded as likely true, or low, if it is a novel hypothesis that has never been tested and, if true, would be an unexpected discovery. In the first case, LR>1 is a further confirmation that the hypothesis should be accepted as true – a useful but hardly noteworthy exercise that just reinforces what is already known. On the other hand, LR<1 is much more interesting, as it runs against the established consensus. LR could be so low as to convert a high BO into a low PO, thus rejecting a previously accepted hypothesis. But not necessarily: while lower than BO, PO could remain high, thus keeping the hypothesis true, while casting some doubts on it and prodding further investigation. In the second case, LR<1 is a further confirmation that the hypothesis should be rejected as false.
On the other hand, LR>1 increases the probability of an unlikely hypothesis. It could be so high as to convert a low BO into a high PO, thus accepting what was previously an unfounded conjecture. But not necessarily: while higher than BO, PO could remain low, thus keeping the hypothesis false, but at the same time stimulating more research.
Such distinctions get lost in Ioannidis’s sweeping claim. True, SUTCing priors and neglecting a low BO can lead to mistakenly accepting hypotheses on the basis of evidence that, while confirmative, leaves PO well below any acceptance level. The mistake is exacerbated by Fisher’s Bias – confusing significance (a low FPR) with confirmation (a high PP) – and by Ioannidis’s bias – squeezing FPR below 5% through data alteration. FPR<5% does not mean PP>95% or even PP>50%. As shown in our Power surface, for any standard of proof, the lower BO is, the higher the required TPR for any level of FPR. Starting from a low BO, accepting the hypothesis requires very powerful evidence. Without it, acceptance is a false claim. Moreover, published acceptances – rightful or wrongful – are not adequately counterbalanced by rejections, which remain largely unpublished. This however occurs for an entirely legitimate reason: there is little interest in rejecting an already unlikely hypothesis. Interesting research is what runs counter to prior consensus. Starting from a low BO, any confirmative evidence is interesting, even when it is not powerful enough to turn a low BO into a high PO. Making an unlikely hypothesis a bit less unlikely is interesting enough, and is worth publication. But – here Ioannidis is right – it should not be confused with acceptance. Likewise, there is little interest in confirmative evidence when BO is already high. What is interesting in this case is disconfirmative evidence, again even when it is not powerful enough to reject the hypothesis by turning a high BO into a low PO. Making a likely hypothesis a bit less likely is interesting enough. But it should not be confused with rejection.