by Greg Mayer

As both an undergraduate and graduate student, I was fortunate to be taught statistics by some of the best statistical minds in biology: Robert Sokal and Jim Rohlf at Stony Brook, and Dick Lewontin at Harvard. All three have influenced biostatistics enormously, not just through their many students, but also through their textbooks: the former two coauthored the still essential *Biometry* (4th edition, 2012), and the latter joined the great G.G. Simpson and Anne Roe in revising the seminal *Quantitative Zoology* (2nd edition, 1960). In my first year of graduate school, while I was on a two-month field course in Costa Rica, other students, knowing I’d already “done” Sokal and Rohlf, would consult me on statistical questions. Towards the end of the interview that got me the position I currently hold, I was casually asked if I could teach a course in “quantitative biology”, to which I replied “yes” (the position had been advertised for evolution and vertebrate zoology). The course, now entitled biostatistics, has wound up being the only class I have taught every academic year.

I mention these things to establish my cred as, if not a maven, at least an aficionado of statistics. It was thus with some professional interest that I (along with others) noted that towards the end of the recent presidential election campaign, pollsters and poll analysts came in for a lot of flak. Polling is very much a statistical activity, the chief aim being, in the technical jargon, to estimate a proportion (i.e., what percent of people or voters support some candidate or hold some opinion), and also to estimate the *uncertainty* of that estimate (i.e., how good or bad the estimate is, both in the sense of being close to the “truth” now [which is defined as the proportion you would get if you exhaustively surveyed the entire population], and as a prediction of a future proportion). The uncertainty of these estimates can be reduced by increasing the sample size, and thus poll aggregators, such as those at Pollster and Real Clear Politics, which combine the results of many polls, will usually have the best estimates.
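To make the sample-size point concrete, here is a minimal Python sketch of the textbook margin of error for a proportion, assuming simple random sampling and the normal approximation to the binomial. The 51% share and the sample sizes are hypothetical, and treating ten polls as one big sample ignores house effects and timing, which real aggregators do not ignore.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion p from n respondents
    (normal approximation, simple random sampling assumed)."""
    return z * math.sqrt(p * (1 - p) / n)

# One hypothetical poll of 800 voters vs. ten such polls naively pooled together.
single = margin_of_error(0.51, 800)
pooled = margin_of_error(0.51, 8000)
print(f"one poll (n=800):   +/- {single:.1%}")
print(f"ten polls (n=8000): +/- {pooled:.1%}")
```

Because the margin of error shrinks with the square root of n, ten times the sample buys only about a threefold (√10 ≈ 3.16) reduction in uncertainty.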

In the last weeks before the election, a large swath of the punditry declared that polls, and especially the aggregators, were all wrong. Many prominent Republicans predicted a landslide or near-landslide win for Mitt Romney. The polls, it was claimed, had a pro-Obama bias that skewed their results, and a website, UnSkewed Polls, was even created to ‘correct’ the skew. Nate Silver, the sabermetrician turned polling aggregator of 538.com (at the *New York Times*), was the subject of particular opprobrium. Joe Scarborough of MSNBC had this to say:

Nate Silver says this is a 73.6 percent chance that the president is going to win? Nobody in that campaign thinks they have a 73 percent chance — they think they have a 50.1 percent chance of winning. And you talk to the Romney people, it’s the same thing. Both sides understand that it is close, and it could go either way. And anybody that thinks that this race is anything but a tossup right now is such an ideologue, they should be kept away from typewriters, computers, laptops and microphones for the next 10 days, because they’re jokes.

Dylan Byers of Politico mused that Silver might be a “one-term celebrity”, apparently referring to Silver’s accuracy in 2008, but overlooking his accuracy in 2010 as well. The nadir of these attacks, offered up by Dean Chambers, was not just innumerate, but vile; Silver, he wrote, is

a man of very small stature, a thin and effeminate man with a soft-sounding voice that sounds almost exactly like the ‘Mr. New Castrati’ voice used by Rush Limbaugh on his program.

[Chambers has removed this passage from his piece, but many, including Jennifer Ouellette at Cocktail Party Physics and Andrew Sullivan, captured it before it was taken out.] I’ve seen Silver on TV many times, but he’s usually sitting, so I have no clear idea of his size, and I have no idea what Chambers finds effeminate about him (unless this is code for revealing that Silver is gay, something that a follower of Silver’s analyses would never know; I didn’t). But even if Chambers’s physical description were true, what could it possibly have to do with the veracity of Silver’s statistical analyses?

Averages of large numbers of polls have rarely if ever been as far off as these pundits would have had us believe, but in polling, as in science, the proof is in the pudding. As the results came in, Fox News analyst Karl Rove, one of those who had foreseen a Romney victory, seemed to enter a dissociative state, as his inability to assimilate the election results was painfully displayed before the viewing audience. Anchor Megyn Kelly eventually asked him, “Is this just math you do as a Republican to make yourself feel better?” So just as paleontologists can boast “we have the fossils, we win”, poll aggregators can now boast, “we have the election results, we win”.

To me, it seems that there is a class of related, and unfounded, positions taken up primarily by conservatives that have a common source: the determination that when the facts are inconvenient, they can be wished away. As scientists, we’ve seen it mostly in scientific issues: embryology, evolution, the big bang, global warming, the age of the Earth. Some conservatives don’t like the facts, so they create a parallel world of invented facts, or dream up conspiracy theories, and choose to dwell in an alternate reality that, unfortunately for them, isn’t real. In a curious convergence with postmodernism, the very notion of “fact” is disdained. Paul Krugman has noted that the problems are at root epistemological, constituting a “war on objectivity“, and that these conservative pundits have this problem not just with science, but with political reality as well. Andrew Sullivan is also mystified by the divorce from reality.

The poll-based statisticians, all of whom predicted an Obama victory, were broadly correct. Several analyses of *which* analyst did best have already appeared. Nate Silver has compared the pollsters, although Pollster notes it may be too early to tell. The *LA Times* has self-assessments by several pundits and poll aggregators.

Being, as I said, a statistics aficionado, and a few weeks having passed since the election, I thought I’d compare the prognostications myself. I chose to compare the three poll aggregators that I followed during the run-up to the election.

All three did a state-by-state (+ District of Columbia) analysis, which, under the electoral college system, makes the most sense. Electoral-vote.com, run by Andrew Tanenbaum, the Votemaster, has the simplest aggregating algorithm: non-partisan polls from each state are arithmetically averaged over a one-week window counting back from the most recent poll. Each candidate’s electoral vote total is simply the sum of the electoral votes of the states he leads in.
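The Votemaster’s rule is simple enough to sketch in a few lines of Python. This is an illustration, not his code: the poll numbers are invented, and details such as whether the seven-day window includes its boundary, or how states without recent polls are handled, are assumptions.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical non-partisan polls: (state, poll end date, Obama %, Romney %).
polls = [
    ("OH", date(2012, 11, 4), 50, 47),
    ("OH", date(2012, 11, 1), 49, 48),
    ("OH", date(2012, 10, 20), 45, 50),  # more than a week before the latest OH poll: ignored
    ("FL", date(2012, 11, 5), 49, 50),
    ("FL", date(2012, 11, 3), 48, 49),
]
electoral_votes = {"OH": 18, "FL": 29}

def state_average(state_polls, window_days=7):
    """Plain average of every poll dated within window_days of the state's most recent poll."""
    latest = max(p[1] for p in state_polls)
    recent = [p for p in state_polls if latest - p[1] < timedelta(days=window_days)]
    return mean(p[2] for p in recent), mean(p[3] for p in recent)

def tally(polls, electoral_votes):
    """Give each state's electoral votes to whoever leads its average (ties kept separate)."""
    totals = {"Obama": 0, "Romney": 0, "Tie": 0}
    for state, ev in electoral_votes.items():
        obama, romney = state_average([p for p in polls if p[0] == state])
        if obama > romney:
            totals["Obama"] += ev
        elif romney > obama:
            totals["Romney"] += ev
        else:
            totals["Tie"] += ev
    return totals

print(tally(polls, electoral_votes))
```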

The Princeton Election Consortium, run by Sam Wang, takes the median of recent polls in each state, assumes an error distribution to convert that median into a win probability, then calculates the probability of all 2⁵¹ possible combinations of state outcomes, creating a probability distribution over the possible electoral vote totals. Wang prefers to look at the median of this distribution, but it also has a mode.
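Here is a sketch of a Princeton-style calculation in Python. Turning a median margin into a win probability with a normal error of assumed width `SIGMA`, and the three toy states, are my assumptions for illustration; Wang’s actual error model and poll window may differ. The useful trick is that the 2⁵¹ combinations need not be enumerated one by one: if states are treated as independent, folding them in one at a time (a convolution) yields exactly the same distribution, quickly.

```python
import math
from statistics import median

def win_probability(margin, sigma):
    """P(candidate actually leads), assuming the true margin is Normal(margin, sigma)."""
    return 0.5 * (1 + math.erf(margin / (sigma * math.sqrt(2))))

def ev_distribution(states):
    """states: list of (electoral_votes, win_probability), treated as independent.
    Returns dist, where dist[k] is the probability of winning exactly k electoral votes.
    Same result as enumerating all 2**len(states) win/loss combinations."""
    total = sum(ev for ev, _ in states)
    dist = [1.0] + [0.0] * total              # before any state: 0 EV with certainty
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob:
                new[k + ev] += prob * p       # wins this state
                new[k] += prob * (1 - p)      # loses this state
        dist = new
    return dist

def dist_median(dist):
    """Smallest electoral-vote total whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(dist) - 1

# Hypothetical toy race: (electoral votes, Obama-minus-Romney margins of recent polls).
SIGMA = 3.0  # assumed polling error, in points
toy_states = [(29, [1, -1, 0]), (18, [3, 2, 4]), (13, [1, 2, 0])]
states = [(ev, win_probability(median(m), SIGMA)) for ev, m in toy_states]
dist = ev_distribution(states)
mode = max(range(len(dist)), key=dist.__getitem__)
print("median:", dist_median(dist), "mode:", mode)
```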

Finally, Nate Silver at 538 takes a weighted average of state polls, adjusting for the known “house effects” of particular pollsters and for the recency of each poll, and then throws in corrections for various non-polling data (e.g., economics), national polling data, and things that affect polling (e.g., convention bounces). This all leads to a win probability, which again leads to a probability distribution of electoral vote outcomes. Silver emphasizes the mean, but this distribution also has a mode. When polling data is dense, and especially when the election date is near, all three methods should give about the same result. When polling data is sparse, Silver’s method, because it uses other sources of data for predictive inference, might be better.
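Only the first step of Silver’s procedure, the adjusted weighted average, is easy to sketch. In the Python below, the exponential recency decay, its one-week half-life, the pollster names, and the house-effect values are all hypothetical; Silver’s real weights and his economic and national-poll corrections are not reproduced.

```python
# Hypothetical polls for one state: (pollster, days before the election, Obama-minus-Romney margin).
polls = [("Pollster A", 1, 3.0), ("Pollster B", 4, 1.0), ("Pollster C", 10, 5.0)]
# Hypothetical house effects: each pollster's typical lean toward Obama, in points.
house_effect = {"Pollster A": 1.0, "Pollster B": -0.5, "Pollster C": 0.0}

def weighted_margin(polls, house_effect, half_life_days=7.0):
    """Recency-weighted average of house-effect-adjusted margins.
    The exponential decay and its half-life are assumptions, not Silver's published weights."""
    weighted_sum = total_weight = 0.0
    for pollster, age_days, margin in polls:
        adjusted = margin - house_effect.get(pollster, 0.0)  # strip the pollster's usual lean
        weight = 0.5 ** (age_days / half_life_days)          # a week-old poll counts half as much
        weighted_sum += weight * adjusted
        total_weight += weight
    return weighted_sum / total_weight

print(f"adjusted margin: {weighted_margin(polls, house_effect):+.2f}")
```

An exponential decay is just one simple way to let newer polls count more; any scheme that down-weights stale polls would serve the same purpose in this sketch.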

So, how’d they do? We can look at how they did on state calls, electoral vote, and popular vote.

**State (including District of Columbia) Calls.** Nate Silver got all 51 right. Sam Wang got 50 right, missing only Florida, which he called for Romney while noting it was on a knife edge. The Votemaster got 49 right, called North Carolina a tie, and called Florida for Romney. It’s of course easy to call Texas, New York and California correctly, so the real test is how they did in toss-up and leaning states. They all did well, but advantage Nate.

**Electoral Vote.** Obama got 332 electoral votes. Nate Silver’s model prediction was 313, Sam Wang’s prediction was 305, and the Votemaster gave Obama 303.

In addition to their predictions, we can also add up the electoral votes Obama would get based on the state calls—I term this the “add-up” prediction. For this prediction, Nate gave Obama 332 (exactly correct), and Sam gave him 303 (because he got Florida wrong). The Votemaster’s prediction *is* the add-up prediction of his state calls, so it’s again 303. We could perhaps split the tied North Carolina electoral votes for the Votemaster, giving 310.5 for Obama, but this brings him closer to the final result only by counting for Obama votes from a state he lost.
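The add-up prediction is just a sum over state calls. In this Python sketch, the states Obama won other than Florida are lumped into a single hypothetical entry worth 303 electoral votes (332 minus Florida’s 29), which is enough to reproduce the Votemaster figures given above, with and without splitting North Carolina’s 15 votes.

```python
def add_up(calls, electoral_votes, split_ties=False):
    """calls maps state -> 'D', 'R', or 'tie'.
    Returns Obama's electoral votes implied by the calls; ties count half only if split_ties."""
    total = 0.0
    for state, call in calls.items():
        if call == "D":
            total += electoral_votes[state]
        elif call == "tie" and split_ties:
            total += electoral_votes[state] / 2
    return total

# Stand-in entry for all of Obama's other states, plus the two states that matter here.
electoral_votes = {"rest of Obama's states": 303, "FL": 29, "NC": 15}
votemaster_calls = {"rest of Obama's states": "D", "FL": "R", "NC": "tie"}
print(add_up(votemaster_calls, electoral_votes))                    # ties not counted
print(add_up(votemaster_calls, electoral_votes, split_ties=True))   # NC split 7.5 / 7.5
```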

For the two aggregators that showed full distributions of outcomes, we can also look at the mode of the distribution (remember, Nate prefers the mean of this distribution, Sam prefers the median). Nate’s mode is 332, again exactly right, while Sam’s mode is 303, although 332 was only slightly less likely an outcome in his final distribution. All of them did pretty well, each slightly underestimating Obama, but once again a slight advantage goes to Nate.
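Mean, median, and mode can all be read off the same electoral-vote distribution, and they need not agree. The Python sketch below uses a made-up five-outcome distribution (not Silver’s or Wang’s) in which the three summaries give three different answers.

```python
def summarize(dist):
    """dist maps electoral-vote totals to probabilities summing to 1.
    Returns (mean, median, mode) of the distribution."""
    outcomes = sorted(dist)
    mean = sum(k * dist[k] for k in outcomes)
    cumulative, median = 0.0, outcomes[-1]
    for k in outcomes:
        cumulative += dist[k]
        if cumulative >= 0.5:   # first total at which half the probability is used up
            median = k
            break
    mode = max(outcomes, key=dist.get)  # single most likely total
    return mean, median, mode

# Made-up distribution, chosen only to show that the three summaries can disagree.
toy = {290: 0.12, 303: 0.30, 313: 0.15, 332: 0.28, 347: 0.15}
print(summarize(toy))
```

When two totals are nearly equally likely, as 303 and 332 are in this toy, the mode can flip between them on a small change in probabilities, which is one reason a single mode can look better or worse than the forecaster’s preferred summary.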

**Popular Vote**. The Votemaster does not make a prediction of national popular vote, so he can’t be evaluated on this criterion. Sam Wang doesn’t track the popular vote either (stressing, correctly, the individual state effects on the electoral college), but he did give every day what he calls the popular vote meta-margin, which is his estimate of the size of the shift in the national vote necessary to engender an electoral tie. Also, in his final prediction, he did make a popular vote prediction, a Bayesian estimate based on state and national polls.
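The meta-margin can be sketched as a search: shift every state’s poll margin by the same amount toward the opponent until the median electoral outcome drops below 270. The Python below does this by bisection, assuming a uniform national swing, independent states, a normal polling error of assumed width `SIGMA`, and an invented map; it illustrates the idea and is not Wang’s implementation.

```python
import math

SIGMA = 3.0  # assumed polling error in points (hypothetical)

def win_prob(margin, sigma=SIGMA):
    """P(candidate actually leads), assuming the true margin is Normal(margin, sigma)."""
    return 0.5 * (1 + math.erf(margin / (sigma * math.sqrt(2))))

def median_ev(states, shift):
    """Median electoral votes after moving every state's margin `shift` points
    toward the opponent. states: list of (electoral votes, poll margin)."""
    total = sum(ev for ev, _ in states)
    dist = [1.0] + [0.0] * total
    for ev, margin in states:
        p = win_prob(margin - shift)
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob:
                new[k + ev] += prob * p
                new[k] += prob * (1 - p)
        dist = new
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return total

def meta_margin(states, needed=270, lo=-20.0, hi=20.0, tol=0.01):
    """Uniform swing (in points) at which the median drops below `needed` EV.
    Assumes median_ev(states, lo) >= needed > median_ev(states, hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if median_ev(states, mid) >= needed:
            lo = mid   # still at or above the threshold: a bigger swing is needed
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical map: one safe bloc, three swing states, one opposing bloc (538 EV total).
toy = [(237, 8.0), (29, 0.5), (18, 3.0), (13, 1.5), (241, -8.0)]
print(f"meta-margin: {meta_margin(toy):.2f} points")
```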

Even more problematic than the predictions is knowing what the election results are. As Ezra Klein (see video 3) and Nate Silver have both noted in the last few days, there are still many votes to be counted, and most of them will be for Obama.

The compilations of major news sources, such as the New York Times or CBS News, are derived from the Associated Press, and have been stuck at one of two counts (Obama 62,211,250 vs. Romney 59,134,475 or Obama 62,615,406 vs. Romney 59,142,004) for some days now, without the latest results added.

Wikipedia is in general an unreliable source (perhaps more on this later). However, David Wasserman of the *Cook Political Report*, although his article is for subscribers only, has been posting his invaluable collection of state results in Google Docs. The latest results are Obama 64,497,241, Romney 60,296,061, others 2,163,462, or 50.80%, 47.49%, 1.70%. (Rounding to the nearest whole number, this gives Romney 47% of the vote, a delicious irony noted by Ezra Klein in the video linked to above.) We could also calculate the two-party percentages, which are Obama 51.68%, Romney 48.32%.
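The percentages follow directly from Wasserman’s counts; a few lines of Python reproduce both the all-candidate and the two-party figures.

```python
# Vote counts as quoted from Wasserman's compilation.
obama, romney, others = 64_497_241, 60_296_061, 2_163_462

total = obama + romney + others
all_candidates = [100 * v / total for v in (obama, romney, others)]
two_party = [100 * v / (obama + romney) for v in (obama, romney)]  # third parties dropped

print([f"{x:.2f}%" for x in all_candidates])
print([f"{x:.2f}%" for x in two_party])
```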

Nate Silver predicted an all-candidate popular vote distribution of Obama 50.8%, Romney 48.3%, others 0.9%. This is spot on for Obama, and a tad high for Romney. We can, however, convert Nate’s numbers to two-party percentages, and get 51.3 vs. 48.7; this slightly underestimates Obama. Sam Wang gave only a two-party vote prediction, 51.1 vs. 48.9; this is a slightly greater underestimation of Obama’s percentage. Sam’s final popular vote meta-margin was 2.76%, and this is closer to the actual margin (3.31% [all] or 3.36% [two-party]) than the 2.2-point margin implied by his own two-party prediction. So one last time, advantage Nate.
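Converting an all-candidate forecast to two-party shares just rescales the two major candidates so that they sum to 100. A short Python check of the comparison, using only the numbers quoted in the text:

```python
def two_party(dem, rep):
    """Rescale two candidates' shares so they sum to 100, dropping third parties."""
    return 100 * dem / (dem + rep), 100 * rep / (dem + rep)

actual_dem = 51.68                     # Obama's two-party share from the returns
silver_dem, _ = two_party(50.8, 48.3)  # Silver's all-candidate prediction, converted
wang_dem = 51.1                        # Wang predicted two-party shares directly

for name, pred in [("Silver", silver_dem), ("Wang", wang_dem)]:
    print(f"{name}: predicted {pred:.1f}, error {pred - actual_dem:+.2f} points")
```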

(I should note that Nate Silver’s and the Votemaster’s calls are not personal decisions, but entirely algorithmic, with Nate’s algorithm being complex, and the Votemaster’s very simple. Sam Wang’s calls are algorithmic up until election eve, at which point he makes predictions based on additional factors; for example, this year he expanded his usual one week polling window in making his final predictions. In fact, his last algorithmic prediction of the electoral vote, 312, was slightly better than his final prediction.)

More refined analyses of the predictions can be made (political science professors and graduate students are feverishly engaged in these analyses as you read this). We could also do individual state popular votes, and extend the results to Senate races, too. (Quick analysis of the 33 Senate races: Silver 31 right, 2 wrong; the Votemaster 30 right, 0 wrong, 3 ties; Wang a bit harder to say, because he paid less attention to the Senate, but I believe he got all 33 right.) But overall, we can say that the pollsters (on whose work the predictions were based) and the aggregators did quite well. Of the three I followed closely for the presidential election, Nate Silver gets a slight nod.

The critics of statistics and the statisticians got several things wrong.

First, they did not understand that a 51-49 poll division, if based on large samples, doesn’t mean that the second candidate has a 49% chance of winning; rather, that chance is far smaller.

Second, they thought the polls were biased in Obama’s favor, but, if anything, they slightly *underestimated* his support and slightly overstated Romney’s (Obama’s margin will increase a bit further as the last votes are counted).

And finally, they thought that the predictions were manipulated by the biases of the aggregators. But the opinions of the aggregators enter only in setting up the initial algorithms (very simple for the Votemaster, most complex for Nate Silver), and in most cases seem to have been well chosen.
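The first of these points is easy to check with a back-of-the-envelope calculation. The Python sketch below asks how likely it is that a candidate leading 51-49 among n respondents is actually behind, counting only sampling error under a normal approximation (roughly, a flat-prior Bayesian reading). Real polls also carry house effects and late shifts, so this is an idealization, and the sample sizes are hypothetical.

```python
import math

def trailing_win_probability(leader_share: float, n: int) -> float:
    """P(the poll leader's true share is below 50%), normal approximation,
    pure sampling error only (no house effects, no late movement)."""
    se = math.sqrt(leader_share * (1 - leader_share) / n)
    z = (0.5 - leader_share) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

for n in (1_000, 5_000, 10_000):
    print(n, f"{trailing_win_probability(0.51, n):.1%}")
```

Under these assumptions the trailing candidate’s chance is roughly a quarter for a single poll of 1,000, and only a few percent once around 10,000 respondents are pooled: nowhere near 49%.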

Rather, it is the pundits who engaged in what Sam Wang has rightly mocked as “motivated reasoning”; Andrew Sullivan has also noted the bizarre ability of pundits to precisely reverse the evidential meaning of the polls. It was the pundits who were guilty of picking through the polling data to find something that supported their preconceived notions (scientific creationism, anyone?); it was not the aggregators, who, especially in Silver’s case, were generally paragons of proper statistical humility.