January 7, 2016

Can Likely Voter Models Be Improved?

Evidence from the 2014 U.S. House elections

By Scott Keeter and Ruth Igielnik

In recent years, polling has missed the mark in several high-profile elections, drawing particular attention to the difficulties inherent in using surveys to predict election outcomes. These failures typically stem from one or more of three causes: biased samples that include an incorrect proportion of each candidate’s supporters; changes in voter preferences between the time of the poll and the election; or incorrect forecasts about who will vote. While not a new concern, the third of these, the difficulty of identifying likely voters, may be the most serious, and it is the focus of this study. Election polls face a problem unique in survey research: They must model a population that does not yet exist when the poll is conducted, namely the future electorate.

It is well understood that many people who are eligible to vote and who tell pollsters they intend to cast a ballot will not actually do so. Similarly, some people who express little interest in the election or uncertainty about voting will nevertheless turn out. This error is not random, because people who vote regularly differ demographically and politically from those who vote less often. In U.S. elections, experienced pollsters know that supporters of Republican candidates tend to turn out at higher rates than supporters of Democratic candidates, especially in off-year elections. Consequently, identifying who is likely to vote is fundamental to making accurate forecasts from pre-election polls and to correctly characterizing the views of the electorate.
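To make the point about non-random error concrete, here is a small Python sketch with invented numbers (they are not survey results). It assumes registered respondents split evenly between the parties but that one side's supporters turn out at a higher rate; the party division among actual voters then shifts away from the division among all registered respondents. The `Group` class and the turnout rates are hypothetical.

```python
# Hypothetical illustration only: all shares and turnout rates are invented.
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    share_of_registered: float  # fraction of registered respondents in this group
    turnout_rate: float         # assumed probability that a member actually votes


def share_among_voters(groups):
    """Return each group's share of the electorate, weighting registrants by turnout."""
    voters = {g.name: g.share_of_registered * g.turnout_rate for g in groups}
    total = sum(voters.values())
    return {name: v / total for name, v in voters.items()}


groups = [
    Group("Dem supporters", 0.50, 0.40),
    Group("Rep supporters", 0.50, 0.60),
]
result = share_among_voters(groups)
print({name: round(share, 2) for name, share in result.items()})
```

In this toy example a 50/50 split among registered respondents becomes a 40/60 split among those who vote, purely because of the assumed difference in turnout.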

This study examines various methods of determining who is a likely voter. It then compares the relative effectiveness of each approach in describing the electorate and measuring the division of the vote between parties in the 2014 U.S. House of Representatives elections. Pollsters would like to have a crystal ball that would allow them to see who will ultimately turn out to vote. While this study has no crystal ball, it has the next best thing: a survey of people interviewed before and after the 2014 congressional elections that is enhanced with verified turnout data from a national voter file (a database of adults and their publicly available voter turnout records from all states).

In particular, this study makes it possible to assess at least some of the benefits of sampling from lists of registered voters, the method favored by many campaign pollsters. Public pollsters, such as Pew Research Center and the major news organizations that conduct election polls, typically have used random digit dial (RDD) samples to reach a random sample of all Americans and then narrowed that sample down to prospective voters by asking a series of questions that gauge interest in the election, past voting behavior and intention to vote.[1] Campaign pollsters tend to draw samples from databases of registered voters and incorporate the past vote history from those databases into their forecasting models, so that they know whether each respondent has actually voted in the past.[2] The sample employed in this study was originally obtained from an RDD survey and later matched to a voter file so that both the survey questions and the past vote history could be used in the analysis.
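The two approaches can be contrasted with a simple cutoff-style likely voter screen. The Python sketch below is hypothetical: the item names, the one-point-per-item scoring, the use of up to three past elections from the voter file and the turnout target are all assumptions for illustration, not the questions or scoring used by Pew Research Center or by any campaign pollster. Respondents are ranked by score, and the top share matching an assumed turnout rate is classified as likely voters.

```python
# Hypothetical cutoff-style likely voter screen; items, points and target are assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    rid: int
    intends_to_vote: bool                   # survey: says they will definitely vote
    high_interest: bool                     # survey: follows the campaign closely
    says_voted_before: bool                 # survey: self-reported past voting
    file_votes_last3: Optional[int] = None  # voter file: past elections voted in (0-3), if matched


def score(r: Respondent, use_file: bool) -> int:
    """One point per survey item; optionally swap self-report for recorded history."""
    points = int(r.intends_to_vote) + int(r.high_interest)
    if use_file and r.file_votes_last3 is not None:
        points += r.file_votes_last3        # recorded vote history from the voter file
    else:
        points += int(r.says_voted_before)  # self-reported history from the survey
    return points


def likely_voters(sample: List[Respondent], turnout_target: float,
                  use_file: bool = False) -> List[Respondent]:
    """Keep the highest-scoring respondents until the assumed turnout share is reached."""
    # sorted() is stable, so ties at the cutoff are broken by input order.
    ranked = sorted(sample, key=lambda r: score(r, use_file), reverse=True)
    k = round(turnout_target * len(ranked))
    return ranked[:k]


sample = [
    Respondent(1, True, True, True, file_votes_last3=0),
    Respondent(2, True, False, False, file_votes_last3=3),
    Respondent(3, False, False, False, file_votes_last3=1),
    Respondent(4, False, True, True, file_votes_last3=2),
]
print([r.rid for r in likely_voters(sample, 0.5)])                 # → [1, 4]
print([r.rid for r in likely_voters(sample, 0.5, use_file=True)])  # → [2, 4]
```

In this toy sample, respondent 1 says all the right things but has no recorded votes, while respondent 2 is less engaged but has voted consistently; adding voter file history changes who passes the screen.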

All of the methods examined here produce more-accurate forecasts than simply using all respondents who say they are registered to vote or all those who say they intend to vote; both of those groups include far too many people who ultimately will not cast a ballot. But some approaches performed better than others. Nearly all of the methods produced more-accurate forecasts when voter file records of previous voting were incorporated into the models.
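One way to compare filters is to compute the vote margin among the respondents each filter keeps and measure its distance from a benchmark. The Python sketch below assumes, for illustration only, that the benchmark is the margin among respondents whose turnout was confirmed in the voter file; the records are invented, survey weights are ignored, and the helper names `margin` and `filter_error` are hypothetical.

```python
# Hypothetical sketch: scoring likely voter filters against a benchmark margin.
# Records are invented and unweighted; real analyses would apply survey weights.
from typing import Callable, Sequence


def margin(prefs: Sequence[str]) -> float:
    """Democratic minus Republican share (percentage points) among D/R preferences."""
    d = prefs.count("D")
    r = prefs.count("R")
    if d + r == 0:
        raise ValueError("no major-party preferences")
    return 100.0 * (d - r) / (d + r)


def filter_error(respondents, keep: Callable[[dict], bool], benchmark: float) -> float:
    """Absolute error, in points, of the margin among respondents the filter keeps."""
    prefs = [r["pref"] for r in respondents if keep(r)]
    return abs(margin(prefs) - benchmark)


respondents = [
    {"pref": "D", "validated": False},
    {"pref": "D", "validated": True},
    {"pref": "R", "validated": True},
    {"pref": "R", "validated": True},
]
benchmark = margin([r["pref"] for r in respondents if r["validated"]])
print(filter_error(respondents, lambda r: True, benchmark))            # all self-reported registrants
print(filter_error(respondents, lambda r: r["validated"], benchmark))  # validated voters only
```

Keeping every registrant here yields an even race while the benchmark favors Republicans by about 33 points, which mirrors, in miniature, why an unfiltered registered-voter sample overstates support for the party whose backers turn out less.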

How the study was conducted

The analysis is based on pre- and post-election interviews with 2,424 U.S. adults from Pew Research Center’s nationally representative American Trends Panel who reported that they are registered to vote and who could be matched to a national voter file. Panelists were interviewed from Sept. 9 to Oct. 3, 2014, about the upcoming congressional election. The survey included a range of standard questions about intention to vote, interest in the campaign, past voting experience and party preference in the election that Pew Research Center and others use to model the likely electorate. Panelists were re-interviewed from Nov. 17 to Dec. 15, 2014, and asked whether and for whom they voted in the election for the U.S. House of Representatives.

The names and addresses of most panelists were gathered as part of the core American Trends Panel methodology and were used to match respondents from the survey sample to their corresponding records in a national voter file. To preserve panelists’ privacy, their names and addresses are stored securely and kept separate from the survey data and voter file information.

The voter file, gathered from publicly available individual voter lists from each state, contains information on nearly every voter’s turnout history along with a variety of demographic information (the voter file does not indicate the candidates for whom a person voted, only whether he or she turned out in that election). Matching this voter file to our survey data allows us to incorporate past turnout history and to validate whether panel respondents were recorded as having cast a ballot in the 2014 contest.
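The matching and validation step can be sketched in Python as an exact lookup on a normalized name-and-address key. This is a simplified, hypothetical illustration: real voter file matching typically relies on vendor-specific fuzzy procedures, and the field names, the `match_key` normalization and the `"2014_general"` election label are assumptions. For brevity the sketch also keeps identifiers and survey records together, unlike the privacy-preserving separation described above.

```python
# Hypothetical sketch of exact-key matching and turnout validation.
# Field names, normalization and election label are illustrative assumptions.
import re


def match_key(name: str, address: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace to build a crude key."""
    text = f"{name}|{address}".lower()
    text = re.sub(r"[^a-z0-9| ]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def validate_turnout(panel, voter_file, election="2014_general"):
    """Map panelist id -> True/False (matched: voted or not) or None (unmatched)."""
    index = {match_key(v["name"], v["address"]): v for v in voter_file}
    result = {}
    for p in panel:
        record = index.get(match_key(p["name"], p["address"]))
        # The file records only whether someone voted, never for whom.
        result[p["id"]] = None if record is None else election in record["elections_voted"]
    return result


panel = [
    {"id": 1, "name": "Jane Doe", "address": "12 Main St."},
    {"id": 2, "name": "John Roe", "address": "5 Oak Ave"},
    {"id": 3, "name": "Ann Poe", "address": "9 Elm Rd"},
]
voter_file = [
    {"name": "JANE DOE", "address": "12 Main St", "elections_voted": {"2012_general", "2014_general"}},
    {"name": "John  Roe", "address": "5 Oak Ave.", "elections_voted": {"2012_general"}},
]
print(validate_turnout(panel, voter_file))  # → {1: True, 2: False, 3: None}
```

Distinguishing "matched but did not vote" (`False`) from "not matched" (`None`) matters, since only matched respondents can have their turnout validated.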

  1. An exception to this is polling in primaries. Some public polling organizations use voter files as sampling frames in primaries because they are far more efficient for reaching likely primary voters, who typically constitute a very small share of all voters.
  2. The exact methods employed by most public and campaign pollsters are proprietary and so could not be reproduced precisely here. Many campaign pollsters who sample from voter files also construct their samples to match what they expect to be the demographics and political characteristics of the likely electorate, rather than interviewing a broader group of voters and narrowing the sample to match the likely electorate.