January 7, 2016

Can Likely Voter Models Be Improved?

Appendix B: The choice of a turnout measure

There are two indicators of voter turnout available for the type of analysis in this report: (1) each respondent’s self-report in the post-election survey and (2) a voter file record of turnout. Among registered voters, 63% have a voter file record indicating that they voted in 2014 (“verified voters”) and 75% said they voted (“self-reported voters”).

Comparing the two indicators reveals that nearly everyone (99%) recorded in the voter file as having voted also reported that they voted. Conversely, almost no one (2%) who said that he or she did not vote has a record of having voted. Thus, a voter file record of voting almost certainly identifies actual voters. The problem is that a considerable number of respondents who said they voted do not have a corresponding record of voting: 26% of those with no voter file record of turnout say that they voted. These individuals may have voted and been missed by the voter file, or they may be misreporting their vote.
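The comparison described above amounts to cross-tabulating the two indicators and reading off conditional rates. The following is a minimal sketch of that calculation, not the procedure used in the report: the function name, the record layout and the toy records are all hypothetical, and the sketch ignores the survey weights that the report's percentages reflect.

```python
def agreement_rates(records):
    """Compute how often respondents say they voted, split by voter file status.

    `records` is an iterable of (has_vote_record, said_voted) boolean pairs.
    This layout is an assumption made for illustration only.
    """
    with_record = 0
    with_record_said_voted = 0
    without_record = 0
    without_record_said_voted = 0

    for has_vote_record, said_voted in records:
        if has_vote_record:
            with_record += 1
            with_record_said_voted += int(said_voted)
        else:
            without_record += 1
            without_record_said_voted += int(said_voted)

    return {
        # Share of verified voters who also report voting.
        "said_voted_given_record": (
            with_record_said_voted / with_record if with_record else None
        ),
        # Share of respondents with no turnout record who still report voting.
        "said_voted_given_no_record": (
            without_record_said_voted / without_record if without_record else None
        ),
    }


# Hypothetical toy records, NOT the report's data.
toy_records = [
    (True, True),
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (False, False),
]
rates = agreement_rates(toy_records)
```

In the report's terms, the first rate corresponds to the 99% figure and the second to the 26% figure, although those are weighted estimates from the actual panel.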

It is well understood that people over-report socially desirable behaviors such as voting. In our sample, 75% of registered voters said they voted. That would imply a national turnout rate of about 53%, far higher than the actual rate of 36%. Some of this difference could be accounted for if politically engaged people are overrepresented in the panel. But the magnitude of this difference seems particularly large, given that the sample is already limited to self-described registered voters (94% of whom have a registration record on file) and the survey is weighted to match population demographic characteristics that are themselves strongly correlated with voter turnout. Both of these characteristics of the survey should mitigate the effects of nonresponse bias.

The verified voter turnout was 63% of registered voters (and 64% of verified registered voters), which implies a national turnout rate of about 44% – higher than documented by the total ballots counted, but less so than the rate based on self-reported turnout.
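The implied national rates follow from scaling turnout among registered voters by the share of the eligible population that is registered. The report does not state that share, so the sketch below assumes a value of about 0.70, which is simply what the report's paired figures suggest (53/75 and 44/63). The function and constant names are illustrative.

```python
# ASSUMPTION: share of the eligible population that is registered.
# Not stated in the report; back-derived from its paired figures.
ASSUMED_REGISTERED_SHARE = 0.70

# Actual national turnout rate cited in the report.
ACTUAL_NATIONAL_TURNOUT = 0.36


def implied_national_turnout(turnout_among_registered,
                             registered_share=ASSUMED_REGISTERED_SHARE):
    """Scale turnout among registered voters to the full eligible population."""
    for value in (turnout_among_registered, registered_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be proportions between 0 and 1")
    return turnout_among_registered * registered_share


# Self-reported turnout: 75% of registered voters.
implied_self_reported = implied_national_turnout(0.75)  # roughly 0.525
# Verified turnout: 63% of registered voters.
implied_verified = implied_national_turnout(0.63)       # roughly 0.441

# How far each implied rate overshoots the actual rate.
overshoot_self_reported = implied_self_reported - ACTUAL_NATIONAL_TURNOUT
overshoot_verified = implied_verified - ACTUAL_NATIONAL_TURNOUT
```

Under this assumed share, the self-report measure overshoots the actual rate by roughly twice as much as the verified measure, which is the comparison the paragraph above draws.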

Berent, Krosnick and Lupia (2011) argue that much of the discrepancy between self-reports and voter file information is a result of errors in the matches or the voting records. More recently, Jackman and Spahn (2015b) estimate that at least 11% of the adult population is not listed on commercial voter files, and that the characteristics of those who are missing are quite different from those who are listed. Indeed, we find that the kinds of respondents who report voting but have no record of doing so are more mobile and thus more likely to be missed by the company that assembled the voter file. Among these discrepant cases, 48% have lived at their current address for five years or longer, compared with 70% among verified self-reported voters. Of course, length of tenure at an address is itself related to the likelihood of voting, so the shorter tenure of residence of these individuals is both a reason they would be missed by the voter file and a reason they might not have voted.

By contrast, Ansolabehere and Hersh (2012) make the case that technological change and new legal requirements have resulted in significant improvements in the quality of the voter files. They argue that the vast majority of discrepancies result from misreporting rather than voter file errors.

Ultimately, verified vote was chosen as the “ground truth” because the error introduced by over-reporting of turnout was judged to be greater than the error resulting from mismatching. Matching errors are indeed problematic, but likely to afflict general public samples more severely than samples like the one employed in this analysis.