As the presidential race draws to a close, there are numerous polls from diverse sources available to the public. However, there is a lack of consistency between many of the polls. Is Hillary up by 3 points in Florida or is Trump up by 2 points? Whose poll is right and whose is wrong? Like many questions in politics, it depends.
All political polls are based upon some assumptions about who is actually going to vote. This is called a model of the electorate. Whether that model is right largely determines how accurately a poll predicts the outcome. Social scientists who argue for a pure random sample can badly mispredict an election if they do not take data collection methodology into account. One example is the recent U.S.C. Dornsife/Los Angeles Times Daybreak poll, which has been getting a lot of flak from pollsters because of its outlying results. Some point to how the data was weighted; others blame a 19-year-old Trump supporter for skewing the results. While both are legitimate points and probably contributed to the skew, sample and data collection probably played an even more significant role.
Dr. Jim Kitchens, a research practitioner with over 30 years of experience in political polling, suggests that “Weighting works as well as setting quotas, within a reasonable limit. If the sampling source (list, panel, etc.) is good, you should be close to your quotas and it may require some weighting.” In other words, neither weighting alone nor that 19-year-old Trump supporter is the issue. Applying quotas to the sample would ensure that enough Republicans and Democrats were represented, minimizing the risk of working with a toxic sample.
The Romney campaign failed to call Ohio (and the entire 2012 election, for that matter) correctly because it depended on telephone-based data collection. Even with cell phones merged in, this methodology skews a sample toward older voters, white voters, and Republicans. The campaign assumed that many of the younger and minority voters who supported President Obama in 2008 were not going to vote because it could not reach them by telephone. This was a mistaken assumption. However, a pure random sample taken from an internet panel may skew toward younger people. This, again, boils down to data collection and sample.
The key is to set quotas for two or three critical groups based upon past elections of a similar nature. The most critical factors for politics are party affiliation, race, age, and gender. According to Dr. Kitchens, “there are two ways to construct a model: (1) quotas during the data collection or (2) mathematical weighting based upon the assumed turnout. Either method is methodologically sound and will work.”
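To make the second method concrete, here is a minimal sketch of weighting by assumed turnout on a single factor (party). It is an illustration only, not Dr. Kitchens's procedure: the function names, the group labels "R" and "D", the candidates "A" and "B", the ten respondents, and the 50/50 turnout assumption are all hypothetical.

```python
from collections import Counter


def turnout_weights(respondents, assumed_turnout):
    """Weight each group so its weighted share equals its assumed turnout share.

    respondents: list of (group, choice) pairs.
    assumed_turnout: dict mapping group -> assumed share of the electorate.
    """
    counts = Counter(group for group, _ in respondents)
    unknown = set(counts) - set(assumed_turnout)
    if unknown:
        raise ValueError(f"no turnout assumption for groups: {sorted(unknown)}")
    n = len(respondents)
    weights = {}
    for group, target_share in assumed_turnout.items():
        if counts[group] == 0:
            # Weighting cannot create respondents the sample never reached.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = target_share / (counts[group] / n)
    return weights


def support(respondents, candidate, weights=None):
    """Share of (optionally weighted) respondents choosing `candidate`."""
    def w(group):
        return 1.0 if weights is None else weights[group]

    total = sum(w(group) for group, _ in respondents)
    backing = sum(w(group) for group, choice in respondents if choice == candidate)
    return backing / total


# Hypothetical sample that over-represents one party: 6 "R" and 4 "D".
sample = (
    [("R", "A")] * 5 + [("R", "B")] * 1
    + [("D", "A")] * 1 + [("D", "B")] * 3
)

# Hypothetical model of the electorate: the two parties turn out equally.
model = {"R": 0.5, "D": 0.5}

weights = turnout_weights(sample, model)
print("unweighted support for A:", support(sample, "A"))
print("weighted support for A:  ", support(sample, "A", weights))
```

In this made-up sample, candidate A drops from 60 percent unweighted to about 54 percent once the sample is weighted to the assumed turnout. Change the assumed turnout and the same interviews produce a different headline number, which is why two polls can disagree without either having made an arithmetic mistake.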
The problem for political polls is that no one knows whose model is right until the election is over. Even Nate Silver, who is now regarded as a god among pollsters because he correctly predicted the winner of the 2012 presidential election in all 50 states and the District of Columbia, has his critics.
This year, several assumptions pollsters have to consider include:
Will minority voters turn out for Hillary Clinton at the same level they turned out for Barack Obama?
Will Donald Trump be perceived in such a negative way by Republican women that they will either vote for Hillary Clinton or stay home?
With both candidates having a majority of voters view them unfavorably, will turnout among all voters decline? Low turnout usually means an older, more conservative electorate.
Will the outrage from Hispanic leaders toward Donald Trump actually drive a significant percentage of new Hispanic voters into the electorate?
Every poll has to be built upon assumed answers to these questions. So it will be Election Day before the argument about whose polls are correct can be settled.
While we may not know whose poll is right or wrong until after the 2016 Presidential Election, I’m sure Mitt Romney would agree with the following statement: Get your sample and data collection methodology right!