A fascinating tale, from the book Dance With Chance:
During the 1970s … It bugged the professor greatly that [business] practitioners were making these predictions without recourse to the latest, most theoretically sophisticated methods developed by statisticians like himself. Instead, they preferred simpler techniques which – they said – allowed them to explain their forecasts more easily to senior management. The outraged author … embarked on a research project that would demonstrate the superiority of the latest statistical techniques. …
The professor and his research assistant collected [111] sets of economic and business data over time from a wide range of economic and business sources. … Each series was split into two parts: earlier data and later data. The researchers pretended that the later part hadn’t happened yet and proceeded to fit various statistical techniques, both simple and statistically sophisticated, to the earlier data. Treating this earlier data as “the past,” they then used each of the techniques to predict “the future,” whereupon they sat back and started to compare their “predictions” with what had actually happened. Horror of horrors, the practitioners’ simple, boss-pleasing techniques turned out to be more accurate than the statisticians’ clever, statistically sophisticated methods. … One of the simplest methods, known as “single exponential smoothing,” in fact appeared to be one of the most accurate. …
The professor submitted a paper on his surprising and important findings to a prestigious, learned journal … The paper was rejected on the grounds that the results didn’t square with statistical theory! Fortunately, another journal did decide to publish the paper, but they insisted on including comments from the leading statisticians of the day. The experts were not impressed. Among the many criticisms was a suggestion that the poor performance of the sophisticated methods was due to the inability of the author to apply them properly. …
[Next] time around they collected and made forecasts for [1001] sets of data … from the worlds of business, economics and finance. … But there was a new and cunning plan. Instead of doing all the work himself, the author asked the most renowned experts in their fields — both academics and practitioners — to forecast the 1,001 series. All in all, fourteen experts participated and compared the accuracy of seventeen methods. … The findings were exactly the same as in his previous research. … The only difference was that there were no experts to criticize, as most of the world’s leading authorities had taken part.
That was way back in 1982. Since then, the author has organized two further forecasting “competitions” to keep pace with new developments and eliminate the new criticisms that academics have ingeniously managed to concoct. The latest findings, published in 2000, consisted of 3,003 economic series, an expanding range of statistical methods, and a growing army of experts. However, the basic conclusion — supported by many other academic studies over the past three decades — remains steadfast.
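For readers unfamiliar with it, the "single exponential smoothing" the excerpt singles out is genuinely tiny: each forecast is a weighted blend of the newest observation and the previous smoothed level. Here is a minimal sketch; the smoothing constant `alpha=0.3` and the sample series are illustrative choices of mine, not values from the studies, and in practice `alpha` is tuned on held-out data.

```python
def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast via single exponential smoothing (SES).

    The level starts at the first observation; each new point pulls the
    level toward itself by a fraction alpha. The final level is the
    forecast for the next, unseen period.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level  # blend new data with old level
    return level

print(ses_forecast([10, 12, 11, 13, 12]))
```

Note there is no model of trend, seasonality, or error structure at all, which is exactly why its strong showing against sophisticated methods was so galling to statisticians.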
I can confirm that the models at work in the Netflix Prize are ridiculously simple. Rather than complex Bayesian statistical formulations and multilevel models, you have early stopping with a little ridge regression. I was amused to see the earlier reference to single exponential smoothing; I had just used something similar to great effect.
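To make the comment's "a little ridge regression" concrete, here is a hedged one-feature sketch (my own illustration, not code from the contest). Ridge adds an L2 penalty `lam` to ordinary least squares, which shrinks the fitted weight toward zero; in the single-feature case the closed form is `w = sum(x*y) / (sum(x*x) + lam)`. The data and `lam` values below are made up for demonstration.

```python
def ridge_1d(xs, ys, lam=1.0):
    """Closed-form ridge regression with one feature and no intercept.

    lam = 0 recovers plain least squares; larger lam shrinks the
    weight, trading a little bias for less overfitting.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                 # exact slope of 2 with no penalty
print(ridge_1d(xs, ys, lam=0.0))     # plain least squares: 2.0
print(ridge_1d(xs, ys, lam=1.0))     # shrunk slightly below 2.0
```

The shrinkage is the whole trick: on noisy held-out data, the slightly-too-small weight often predicts better than the exact least-squares fit.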
Though it's not clear to me how competitive the contest was. On multiple occasions, people who were essentially amateurs climbed into the top 10 within a few months of beginning their efforts. I have a model that beats the best published ones, and I don't really have any clue what I'm doing.
Same comment as Phil Goetz. Empirically on the high-stakes ultra-competitive Netflix Prize, the best performance was not put forth by simple models but by combining many models ranging from simple to complex. But conversely, most statisticians who tried their hand at the Netflix Prize did much more poorly than the best performers. We may be looking at inadequate incentives, inadequate controls for overfitting, prestigious folk who are not the best performers, prestigious folk who overuse complex and impressive models with inadequate checking, or it may just be an empirical fact (though it would surprise me and I would have expected the opposite) that the machine learning community has its act together and the statistical learning community doesn't.