Was Nate Silver the Most Accurate 2012 Election Pundit?


Over at the Center for Applied Rationality, Luke, with Gwern Branwen, writes:

Obama may have won the presidency on election night, but pundit Nate Silver won the internet by correctly predicting presidential race outcomes in every state plus the District of Columbia — a perfect 51/51 score.

Now the interwebs are abuzz with Nate Silver praise. Gawker proclaims him “America’s Chief Wizard.” Gizmodo humorously offers 25 Nate Silver Facts (sample: “Nate Silver’s computer has no ‘backspace’ button; Nate Silver doesn’t make mistakes”). IsNateSilverAWitch.com concludes: “Probably.”

Was Silver simply lucky? Probably not. In the 2008 elections he scored 50/51, missing only Indiana, which went to Obama by a mere 1%.

How does he do it? In his CFAR-recommended book The Signal and the Noise: Why So Many Predictions Fail, but Some Don’t, Silver reveals that his “secret” is bothering to obey the laws of probability theory rather than predicting things from his gut.

An understanding of probability can help us see what Silver’s critics got wrong. For example, Brandon Gaylord wrote:

Silver… confuses his polling averages with an opaque weighting process… and the inclusion of all polls no matter how insignificant – or wrong – they may be. For example, the poll that recently helped put Obama ahead in Virginia was an Old Dominion poll that showed Obama up by seven points. The only problem is that the poll was in the field for 28 days – half of which were before the first Presidential debate. Granted, Silver gave it his weakest weighting, but its inclusion in his model is baffling.

Actually, what Silver did is exactly right according to probability theory. Each state poll provided some evidence about who would win that state, but some polls — for example those which had been accurate in the past — provided more evidence than others. Even the Old Dominion poll provided some evidence, just not very much — which is why Silver gave it “his weakest weighting.” Silver’s “opaque weighting process” was really just a straightforward application of probability theory. (In particular, it was an application of Bayes’ Theorem.)
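To make the weighting argument concrete, here is a minimal sketch in Python. It is not Silver's actual model, which the quoted text does not spell out. It assumes each poll is a noisy, normally distributed measurement of the true margin, so that Bayes' Theorem reduces to weighting each poll by the inverse of its variance. The function names, the prior, and every number below are hypothetical and chosen only for illustration.

```python
from math import erf, sqrt


def combine_polls(polls, prior_mean=0.0, prior_sd=10.0):
    """Normal-normal Bayesian update over a list of polls.

    polls: list of (margin, sd) pairs, where margin is the candidate's
    lead in percentage points and sd is how noisy we assume that poll is.
    Returns the posterior mean and standard deviation of the true margin.
    ASSUMPTION: Gaussian errors and independent polls (an illustration only).
    """
    precision = 1.0 / prior_sd ** 2          # start from a vague prior
    weighted_sum = prior_mean * precision
    for margin, sd in polls:
        weight = 1.0 / sd ** 2               # noisier poll -> smaller weight
        precision += weight
        weighted_sum += weight * margin
    return weighted_sum / precision, sqrt(1.0 / precision)


def win_probability(mean, sd):
    """P(true margin > 0) when the margin is Normal(mean, sd)."""
    return 0.5 * (1.0 + erf(mean / (sd * sqrt(2.0))))


# Hypothetical data: three recent, reliable polls and one stale outlier.
reliable_polls = [(1.0, 3.0), (2.0, 3.0), (0.5, 3.0)]
stale_poll = (7.0, 9.0)  # large lead, but assumed three times as noisy

mean_without, sd_without = combine_polls(reliable_polls)
mean_with, sd_with = combine_polls(reliable_polls + [stale_poll])

print(f"without stale poll: {mean_without:.2f} +/- {sd_without:.2f}")
print(f"with stale poll:    {mean_with:.2f} +/- {sd_with:.2f}")
print(f"win probability:    {win_probability(mean_with, sd_with):.2f}")
```

Under these assumptions, the stale poll nudges the estimate toward its own result and slightly narrows the uncertainty, but it cannot dominate the better polls. Throwing it out would discard a small amount of real evidence; counting it equally would overstate what it tells us.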