
Posts Tagged ‘Quant’

One of the most interesting ideas suggested by Ian Ayres’s book Super Crunchers is the role of humans in the implementation of a quantitative investment strategy. As we know from Andrew McAfee’s Harvard Business Review blog post, The Future of Decision Making: Less Intuition, More Evidence, and James Montier’s 2006 research report, Painting By Numbers: An Ode To Quant, in context after context, simple statistical models outperform expert judgements. Further, decision makers who are given the output of a simple statistical model but wave off its predictions tend to make poorer decisions than the model alone. The reason? We are overconfident in our abilities. We tend to think that restraints are useful for the other guy but not for us. Ayres provides a great example in his article, How computers routed the experts:

To cede complete decision-making power over whether to lock up a human to a statistical algorithm is in many ways unthinkable.

The problem is that discretionary escape hatches have costs too. In 1961, the Mercury astronauts insisted on a literal escape hatch. They balked at the idea of being bolted inside a capsule that could only be opened from the outside. They demanded discretion. However, it was discretion that gave Liberty Bell 7 astronaut Gus Grissom the opportunity to panic upon splashdown. In Tom Wolfe’s memorable account, The Right Stuff, Grissom “screwed the pooch” when he prematurely blew the 70 explosive bolts securing the hatch before the Navy SEALs were able to secure floats. The space capsule sank and Grissom nearly drowned.

The natural question, then, is, “If humans can’t even be trusted with a small amount of discretion, what role do they play in the quantitative investment scenario?”

What does all this mean for human endeavour? If we care about getting the best decisions overall, there are many contexts where we need to relegate experts to supporting roles in the decision-making process. We, like the Mercury astronauts, probably can’t tolerate a system that forgoes any possibility of human override, but at a minimum, we should keep track of how experts fare when they wave off the suggestions of the formulas. And we should try to limit our own discretion to places where we do better than machines.

This is in many ways a depressing story for the role of flesh-and-blood people in making decisions. It looks like a world where human discretion is sharply constrained, where humans and their decisions are controlled by the output of machines. What, if anything, in the process of prediction can we humans do better than the machines?

The answer is that we formulate the factors to be tested. We hypothesise. We dream.

The most important thing left to humans is to use our minds and our intuition to guess at what variables should and should not be included in statistical analysis. A statistical regression can tell us the weights to place upon various factors (and simultaneously tell us how precisely it was able to estimate these weights). Humans, however, are crucially needed to generate the hypotheses about what causes what. The regressions can test whether there is a causal effect and estimate the size of the causal impact, but somebody (some body, some human) needs to specify the test itself.

So the machines still need us. Humans are crucial not only in deciding what to test, but also in collecting and, at times, creating the data. Radiologists provide important assessments of tissue anomalies that are then plugged into the statistical formulas. The same goes for parole officials who judge subjectively the rehabilitative success of particular inmates. In the new world of database decision-making, these assessments are merely inputs for a formula, and it is statistics – and not experts – that determine how much weight is placed on the assessments.
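That division of labour translates directly into code. In the minimal sketch below (synthetic data and made-up factor names; nothing here comes from Ayres’s book), the human’s job is to name the candidate factors, and the regression’s job is to estimate the weights and how precisely it could estimate them:

```python
# A human hypothesises the factors; the regression estimates the weights
# and how precisely they are estimated. Everything here is synthetic and
# purely illustrative: 'value' and 'momentum' are made-up factor names.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
value = rng.normal(size=n)       # hypothesised factor 1
momentum = rng.normal(size=n)    # hypothesised factor 2
returns = 0.5 * value + 0.2 * momentum + rng.normal(scale=0.5, size=n)

X = sm.add_constant(np.column_stack([value, momentum]))
fit = sm.OLS(returns, X).fit()
print(fit.params)  # estimated weights on each hypothesised factor
print(fit.bse)     # standard errors: how precisely each weight is pinned down
```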

In investment terms, this means honing the strategy. LSV Asset Management, described by James Montier as a “fairly normal” quantitative fund (as opposed to “rocket scientist uber-geeks”) and founded by the authors of the landmark Contrarian Investment, Extrapolation and Risk paper, describes the ongoing role of humans in its funds as follows (emphasis mine):

A proprietary investment model is used to rank a universe of stocks based on a variety of factors we believe to be predictive of future stock returns. The process is continuously refined and enhanced by our investment team although the basic philosophy has never changed – a combination of value and momentum factors.

The blasphemy about momentum aside, the refinement and enhancement process sounds like fun to me.
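LSV doesn’t disclose its model, but the mechanics of ranking a universe on a combination of value and momentum factors are simple enough to sketch. The factor choices (earnings yield and six-month return) and the equal weighting below are my illustrative assumptions, not LSV’s actual process:

```python
# Hypothetical sketch of a combined value/momentum ranking. The factors
# (earnings yield, six-month return) and the equal weighting are
# illustrative assumptions, not LSV's actual model.
import pandas as pd

universe = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "earnings_yield": [0.12, 0.05, 0.09, 0.02],    # value: higher is cheaper
    "six_month_return": [0.10, 0.25, -0.05, 0.15]  # momentum factor
})

# Rank each factor separately (1 = most attractive), then average the ranks.
universe["value_rank"] = universe["earnings_yield"].rank(ascending=False)
universe["momentum_rank"] = universe["six_month_return"].rank(ascending=False)
universe["combined_rank"] = (universe["value_rank"] + universe["momentum_rank"]) / 2

print(universe.sort_values("combined_rank"))  # most attractive stocks first
```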



I’ve just finished Ian Ayres’s book Super Crunchers, which I found via Andrew McAfee’s Harvard Business Review blog post, The Future of Decision Making: Less Intuition, More Evidence (discussed in Intuition and the quantitative value investor). Super Crunchers is a fuller version of James Montier’s 2006 research report, Painting By Numbers: An Ode To Quant, providing several more anecdotes in support of Montier’s thesis that simple statistical models outperform the best judgements of experts. McAfee discusses one such example in his blog post:

Princeton economist Orley Ashenfelter predicts Bordeaux wine quality (and hence eventual price) using a model he developed that takes into account winter and harvest rainfall and growing season temperature. Massively influential wine critic Robert Parker has called Ashenfelter an “absolute total sham” and his approach “so absurd as to be laughable.” But as Ian Ayres recounts in his great book Super Crunchers, Ashenfelter was right and Parker wrong about the ‘86 vintage, and the way-out-on-a-limb predictions Ashenfelter made about the sublime quality of the ‘89 and ‘90 wines turned out to be spot on.

Ayres provides a number of stories not covered in Montier’s article: Don Berwick’s ‘100,000 lives’ campaign, Epagogix’s hit movie predictor, Offermatica’s automated web ad serving software, Continental Airlines’s complaint process, and a statistical algorithm for predicting the outcome of Supreme Court decisions. While seemingly unrelated, all are prediction engines based on a quantitative analysis of subjective or qualitative factors.

The Supreme Court decision prediction algorithm is particularly interesting to me, not because I am an ex-lawyer, but because the law is expressed in language, often far from plain, and seemingly irreducible to quantitative analysis. (I believe this is also true of value investment, although numbers play a larger role in that realm, and so it lends itself more readily to quantitative analysis.) According to Andrew Martin and Kevin Quinn, the authors of Competing Approaches to Predicting Supreme Court Decision Making, provided with just a few variables concerning the politics of a case, they can predict how the US Supreme Court justices will vote.

Ayres discussed the operation of Martin and Quinn’s Supreme Court decision prediction algorithm in How computers routed the experts:

Analysing historical data from 628 cases previously decided by the nine Supreme Court justices at the time, and taking into account six factors, including the circuit court of origin and the ideological direction of that lower court’s ruling, Martin and Quinn developed simple flowcharts that best predicted the votes of the individual justices. For example, they predicted that if a lower court decision was considered “liberal”, Justice Sandra Day O’Connor would vote to reverse it. If the decision was deemed “conservative”, on the other hand, and came from the 2nd, 3rd or Washington DC circuit courts or the Federal circuit, she would vote to affirm.
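The two branches quoted above translate almost directly into code. The sketch below captures only what Ayres describes; the fall-through for unquoted cases is my placeholder, since the full flowchart has more branches than the two mentioned here:

```python
# The two branches quoted above for Justice O'Connor, expressed as code.
# Only these two rules come from Ayres's description; the 'unclear'
# fall-through is my placeholder for the rest of the flowchart.
AFFIRM_CIRCUITS = {"2nd", "3rd", "DC", "Federal"}

def predict_oconnor_vote(lower_court_direction: str, circuit: str) -> str:
    if lower_court_direction == "liberal":
        return "reverse"
    if lower_court_direction == "conservative" and circuit in AFFIRM_CIRCUITS:
        return "affirm"
    return "unclear"  # branches of the flowchart not quoted by Ayres

print(predict_oconnor_vote("liberal", "9th"))      # reverse
print(predict_oconnor_vote("conservative", "DC"))  # affirm
```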

Ted Ruger, a law professor at the University of Pennsylvania, approached Martin and Quinn at a seminar and suggested that they test the performance of the algorithm against a group of legal experts:

As the men talked, they decided to run a horse race, to create “a friendly interdisciplinary competition” to compare the accuracy of two different ways to predict the outcome of Supreme Court cases. In one corner stood the predictions of the political scientists and their flow charts, and in the other, the opinions of 83 legal experts – esteemed law professors, practitioners and pundits who would be called upon to predict the justices’ votes for cases in their areas of expertise. The assignment was to predict in advance the votes of the individual justices for every case that was argued in the Supreme Court’s 2002 term.

The outcome?

The experts lost. For every argued case during the 2002 term, the model predicted 75 per cent of the court’s affirm/reverse results correctly, while the legal experts collectively got only 59.1 per cent right. The computer was particularly effective at predicting the crucial swing votes of Justices O’Connor and Anthony Kennedy. The model predicted O’Connor’s vote correctly 70 per cent of the time while the experts’ success rate was only 61 per cent.

Ayres provides a copy of the flowchart in Super Crunchers. Its simplicity is astonishing: there are only six decision points, and none of them relates to the content of the matter. Ayres poses the obvious question:

How can it be that an incredibly stripped-down statistical model outpredicted legal experts with access to detailed information about the cases? Is this result just some statistical anomaly? Does it have to do with idiosyncrasies or the arrogance of the legal profession? The short answer is that Ruger’s test is representative of a much wider phenomenon. Since the 1950s, social scientists have been comparing the predictive accuracies of number crunchers and traditional experts – and finding that statistical models consistently outpredict experts. But now that revelation has become a revolution in which companies, investors and policymakers use analysis of huge datasets to discover empirical correlations between seemingly unrelated things.

Perhaps I’m naive, but for me one of the really surprising implications of Martin and Quinn’s model is that the merits of the legal arguments before the court are largely irrelevant to the decision rendered; it is Ayres’s “seemingly unrelated things” that affect the outcome most. Ayres puts his finger on the point at issue:

The test would implicate some of the most basic questions of what law is. In 1881, Justice Oliver Wendell Holmes created the idea of legal positivism by announcing: “The life of the law has not been logic; it has been experience.” For him, the law was nothing more than “a prediction of what judges in fact will do”. He rejected the view of Harvard’s dean at the time, Christopher Columbus Langdell, who said that “law is a science, and … all the available materials of that science are contained in printed books”.

Martin and Quinn’s model shows Justice Oliver Wendell Holmes to be right: law is nothing more than a prediction of what judges will in fact do. How is this relevant to a deep value investing site? Deep value investing is nothing more than a prediction of what companies and stocks will in fact do. If the relationship holds, seemingly unrelated things will affect the performance of stock prices. Part of the raison d’être of this site is to determine what those things are: to quantify the qualitative factors affecting deep value stock price performance.


In his 2006 research report Painting By Numbers: An Ode To Quant (via The Hedge Fund Journal) James Montier presents a compelling argument for a quantitative approach to investing. Montier’s thesis is that simple statistical or quantitative models consistently outperform expert judgements. This phenomenon continues even when the experts are provided with the models’ predictions. Montier argues that the models outperform because humans are overconfident, biased, and unable or unwilling to change.

Montier makes his argument via a series of examples drawn from fields other than investment. The first example he gives, which he describes as a “classic in the field” and which succinctly demonstrates the two important elements of his thesis, is the diagnosis of patients as either neurotic or psychotic. The distinction is as follows: a psychotic patient “has lost touch with the external world”, whereas a neurotic patient “is in touch with the external world but suffering from internal emotional distress, which may be immobilising.” According to Montier, the standard test to distinguish between neurosis and psychosis is the Minnesota Multiphasic Personality Inventory (MMPI):

In 1968, Lewis Goldberg obtained access to more than 1000 patients’ MMPI test responses and final diagnoses as neurotic or psychotic. He developed a simple statistical formula, based on 10 MMPI scores, to predict the final diagnosis. His model was roughly 70% accurate when applied out of sample. Goldberg then gave MMPI scores to experienced and inexperienced clinical psychologists and asked them to diagnose the patient. As Fig.1 shows, the simple quant rule significantly outperformed even the best of the psychologists.

Even when the results of the rules’ predictions were made available to the psychologists, they still underperformed the model. This is a very important point: much as we all like to think we can add something to the quant model output, the truth is that very often quant models represent a ceiling in performance (from which we detract) rather than a floor (to which we can add).
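Montier describes the model only as “a simple statistical formula, based on 10 MMPI scores”. The sketch below has that general shape, on synthetic data, with logistic regression standing in for whatever fitting method Goldberg actually used:

```python
# Sketch of a Goldberg-style rule: a plain linear formula over 10 MMPI
# scale scores, fit once and scored out of sample. The data is synthetic
# and logistic regression is my stand-in, not Goldberg's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000
scores = rng.normal(size=(n, 10))  # 10 MMPI scale scores per patient
true_weights = rng.normal(size=10)
psychotic = (scores @ true_weights + rng.normal(scale=2.0, size=n)) > 0

train_X, test_X, train_y, test_y = train_test_split(
    scores, psychotic, test_size=0.3, random_state=0)

rule = LogisticRegression().fit(train_X, train_y)
print(f"out-of-sample accuracy: {rule.score(test_X, test_y):.0%}")
```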

The MMPI example illustrates the two important points of Montier’s thesis:

  1. The simple statistical model outperforms the judgements of the best experts.
  2. The simple statistical model outperforms the judgements of the best experts, even when those experts are given access to the simple statistical model.

Montier goes on to give diverse examples of the application of his theory, ranging from the detection of brain damage and university admissions interviews to the likelihood of criminals re-offending, the selection of “good” and “bad” vintages of Bordeaux wine, and the buying decisions of purchasing managers. He then discusses some “meta-analysis” of studies to demonstrate that “the range of evidence I’ve presented here is not somehow a biased selection designed to prove my point:”

Grove et al consider an impressive 136 studies of simple quant models versus human judgements. The range of studies covered areas as diverse as criminal recidivism to occupational choice, diagnosis of heart attacks to academic performance. Across these studies 64 clearly favoured the model, 64 showed approximately the same result between the model and human judgement, and a mere 8 studies found in favour of human judgements. All of these eight shared one trait in common; the humans had more information than the quant models. If the quant models had the same information it is highly likely they would have outperformed.

As Paul Meehl (one of the founding fathers of the importance of quant models versus human judgements) wrote: There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one… predicting everything from the outcomes of football games to the diagnosis of liver disease and when you can hardly come up with a half a dozen studies showing even a weak tendency in favour of the clinician, it is time to draw a practical conclusion.
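To get a sense of how lopsided that tally is, one can treat the 72 studies with a clear winner as independent trials (my framing for illustration, not Grove et al’s own analysis):

```python
# Of the 136 studies, 72 had a clear winner: 64 favoured the model and
# 8 favoured the humans. A two-sided binomial test asks how likely such
# a split would be if models and humans were genuinely evenly matched.
from scipy.stats import binomtest

result = binomtest(k=64, n=72, p=0.5)
print(f"p-value: {result.pvalue:.2e}")  # vanishingly small
```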

Why not investing?

Montier says that, within the world of investing, the quantitative approach is “far from common,” and, where it does exist, the practitioners tend to be “rocket scientist uber-geeks,” the implication being that they would not employ a simple model. So why isn’t quantitative investing more common? According to Montier, the “most likely answer is overconfidence.”

We all think that we know better than simple models. The key to the quant model’s performance is that it has a known error rate while our error rates are unknown.

The most common response to these findings is to argue that surely a fund manager should be able to use quant as an input, with the flexibility to override the model when required. However, as mentioned above, the evidence suggests that quant models tend to act as a ceiling rather than a floor for our behaviour. Additionally there is plenty of evidence to suggest that we tend to overweight our own opinions and experiences against statistical evidence.

Montier provides the following example in support of his contention that we tend to prefer our own views to statistical evidence:

For instance, Yaniv and Kleinberger have a clever experiment based on general knowledge questions such as: In which year were the Dead Sea scrolls discovered?

Participants are asked to give a point estimate and a 95% confidence interval. Having done this, they are then presented with an advisor’s suggested answer, and asked for their final best estimate and range of estimates. Fig.7 shows the average mean absolute error in years for the original answer and the final answer. The final answer is more accurate than the initial guess.

The most logical way of combining your view with that of the advisor is to give equal weight to each answer. However, participants were not doing this (they would have been even more accurate if they had done so). Instead they were putting a 71% weight on their own answer. In over half the trials the weight on their own view was actually 90-100%! This represents egocentric discounting – the weighing of one’s own opinions as much more important than another’s view.

Similarly, Simonsohn et al showed that in a series of experiments direct experience is frequently much more heavily weighted than general experience, even if the information is equally relevant and objective. They note, “If people use their direct experience to assess the likelihood of events, they are likely to overweight the importance of unlikely events that have occurred to them, and to underestimate the importance of those that have not”. In fact, in one of their experiments, Simonsohn et al found that personal experience was weighted twice as heavily as vicarious experience! This is an uncannily close estimate to that obtained by Yaniv and Kleinberger in an entirely different setting.
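A quick simulation makes the cost of egocentric discounting concrete. The error distributions below are synthetic and assume your estimate and the advisor’s are equally informative; the 71% weight is the figure Montier cites:

```python
# Simulate equally informative own and advisor estimates of a true value,
# then compare the mean absolute error of the observed 71/29 egocentric
# weighting against the 50/50 weighting. Distributions are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
truth = rng.normal(size=n)
own = truth + rng.normal(scale=1.0, size=n)      # your estimate
advisor = truth + rng.normal(scale=1.0, size=n)  # equally good advice

for w in (0.71, 0.50):
    combined = w * own + (1 - w) * advisor
    mae = np.mean(np.abs(combined - truth))
    print(f"weight on own view {w:.0%}: MAE = {mae:.3f}")  # 50/50 wins
```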

It is worth noting that Montier identifies LSV Asset Management and Fuller & Thaler Asset Management as being “fairly normal” quantitative funds (as opposed to being “rocket scientist uber-geeks”) with “admirable track records in terms of outperformance.” You might recognise the names: “LSV” stands for Lakonishok, Shleifer, and Vishny, authors of the landmark Contrarian Investment, Extrapolation and Risk paper, and the “Thaler” in Fuller & Thaler is Richard H. Thaler, co-author of Further Evidence on Investor Overreaction and Stock Market Seasonality, both papers I’m wont to cite. I’m not entirely sure what strategies LSV and Fuller & Thaler pursue, wrapped as they are in the cloaks of “behavioural finance,” but judging from those two papers, I’d say it’s a fair bet that they are both pursuing value-based strategies.

It might be a while before we see a purely quantitative value fund, or at least a fund that acknowledges that it is one. As Montier notes:

We find it ‘easy’ to understand the idea of analysts searching for value, and fund managers rooting out hidden opportunities. However, selling a quant model will be much harder. The term ‘black box’ will be bandied around in a highly pejorative way. Consultants may question why they are employing you at all, if ‘all’ you do is turn up and run the model and then walk away again.

It is for reasons like these that quant investing is likely to remain a fringe activity, no matter how successful it may be.

Montier’s now at GMO, and has produced a new research report called Ten Lessons (Not?) Learnt (via Trader’s Narrative).

