October 9, 2018

Comparing Survey Sampling Strategies: Random-Digit Dial vs. Voter Files

4. The RBS poll comports well with data from the full voter file about the partisanship of registered voters

The finding that the registration-based poll tilted slightly less conservative than the random-digit-dial poll raises the possibility that the RBS poll suffered from differential partisan nonresponse, with Republicans called for the RBS poll being less likely to participate than Democrats. That would yield an RBS poll that was too Democratic relative to the U.S. as a whole. But a comparison of those sampled with those responding shows that Republicans were actually more likely to respond than Democrats. Furthermore, routine weighting brought the RBS poll largely in line with registered voters on the full national voter file.

[Figure: Nonresponse tilted RBS sample more Republican; largely fixed by weighting]

Unlike with RDD, RBS polling makes it possible to examine the political partisanship of the survey respondents, the nonrespondents and the U.S. as a whole. For those individuals identified as registered to vote by state governments, the voter file has a score (ranging from 0 to 100) describing their likelihood of voting for Democrats or Republicans, with higher numbers indicating a preference for Democrats. These data are modeled estimates of political partisanship by the vendor, and they account for a person’s party registration, race, where they live and other characteristics appended from other publicly available databases. Analysis conducted as part of this study (and others) confirmed that the modeled partisanship data correspond closely with people’s own responses about their political leanings.

Normally, an RBS pollster would just have access to the modeled partisanship of the people sampled for the poll. For this study, Pew Research Center supplemented that with a file consisting of a representative 1% of the total registered voter records in the commercial voter file vendor’s database. In this section, “full voter file” is used as shorthand for estimates computed using the 1% sample of registered voters.17

According to the full voter file, 55% of all registered voters in the U.S. are likely Democrats and 33% are likely Republicans. This is based on a scale of 0 to 100, where higher scores (60 to 100) indicate that someone is more likely to be Democratic and lower scores (0 to 40) indicate someone is more likely to be Republican (scores closer to the middle indicate weaker partisanship/no partisan leaning). These figures are almost exactly the same for the subset of records with telephone numbers on the file. However, among the registered voters who responded to the RBS survey, 51% were likely Democrats and 38% were likely Republicans. This indicates that Democratic-leaning adults were not more inclined to take the survey than Republican-leaning ones. Furthermore, weighting, which did not adjust on partisanship, brought the composition back in line with the complete file. This suggests that any partisan imbalance was not the result of telephone coverage or nonresponse among the registered portion of the RBS sample.

[Note: The sample drawn for the RBS poll was designed to be representative of the country. As expected, the modeled partisanship for all the registered voter records sampled for the poll, which includes respondents and nonrespondents, was 55% likely Democrat and 33% likely Republican. This is exactly the same as the distribution for the entire national 1% file, which rules out sampling error as a potential reason for differences between the RBS poll estimates and the full voter file.]
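The score cutoffs described above can be illustrated with a short Python sketch. It is not the vendor's model or Pew's code: the function names, the label strings and the treatment of the boundary values 40 and 60 as inclusive are assumptions made for illustration only.

```python
from collections import Counter

LABELS = ("likely Democrat", "likely Republican", "no clear lean")


def classify_partisanship(score):
    """Bucket a 0-100 modeled partisanship score.

    Assumes (hypothetically) that the cutoffs are inclusive:
    60-100 is likely Democrat, 0-40 is likely Republican.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 60:
        return "likely Democrat"
    if score <= 40:
        return "likely Republican"
    return "no clear lean"


def partisan_shares(scores):
    """Return the share of records falling in each bucket."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(classify_partisanship(s) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}
```

Applying `partisan_shares` separately to all sampled records and to respondents only would give the kind of comparison reported in this section (55%/33% for the sample versus 51%/38% for respondents).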

Telephone coverage and nonresponse for registered voters

Analysis of how the RBS poll respondents compared with the full voter file provided many additional insights beyond just partisan nonresponse. To unpack this, it is useful to note that there are two main problems that can lead to unrepresentative samples: noncoverage and nonresponse. Noncoverage means that some individuals in the population are not listed in the frame used to select the sample. Nonresponse occurs when some portion of those who were sampled ultimately do not complete the survey, typically because they could not be reached or declined to participate. Both noncoverage and nonresponse mean that some portion of the population is not included in the final set of survey respondents. If that missing portion is sufficiently large and systematically different from the rest of the population, the result can be biased survey estimates. All surveys suffer from noncoverage and nonresponse to some extent, and pollsters use weighting adjustments to try to correct any imbalances and reduce bias.

Since most public opinion telephone surveys have response rates in the single digits, nonresponse receives the lion’s share of attention from pollsters and survey methodologists. What’s more, the landline and cellphone frames used for RDD telephone surveys together cover about 97% of the adult population in the U.S., making noncoverage much less of a concern.18

For RBS surveys based on voter files, the story is very different, at least with respect to coverage. While nearly all registered voters are listed in state voter files, many of those records do not have an associated telephone number. For this study, the sample of registered voters was drawn from a representative 1% sample of the vendor’s voter file. Only 60% of the records in that file had any kind of telephone number. Importantly, the coverage rate varied considerably across states, ranging from a low of 30% in Alaska to a high of 84% in Indiana. While technically listed on the sampling frame, there is no practical way for individuals without phone numbers to be included in a telephone survey, making them effectively uncovered. For this survey, records without telephone numbers were sent to Survey Sampling International, which performed an additional search. This brought the telephone coverage rate for registered voters up to 73%.

When it comes to nonresponse, RBS and RDD surveys likely have similar dynamics – after all, it is improbable that one’s decision to answer the phone and participate in a survey hinges on the pollster’s choice of database. That said, there may still be important differences from RDD. This could be the case if the kinds of people who have telephone numbers on the file are also easier to contact or more willing to participate, or if asking for respondents by name influences the decision to participate.19

Fortunately, commercial voter files also contain a great deal of information about everyone on the file, not just those who have a phone number or responded to the survey. This makes it possible to see how telephone coverage and nonresponse influence the composition of the survey respondents and how well weighting works to correct any imbalance.

Although the file has hundreds of variables, this analysis was limited to some basic demographics that are known to be accurate in the voter files (age, sex, and race), modeled partisanship, and voting in the 2012, 2014 and 2016 general elections. With the exception of age and vote history, which come directly from state records, these variables must be appended from other sources or estimated using statistical models.

Prior to weighting, there were substantial differences between the national file and the survey respondents confirmed to be registered voters. With few exceptions, these differences were driven by nonresponse rather than telephone coverage. Weighting adjustments were largely successful in correcting demographic and partisan differences, but only partially effective at reducing the overrepresentation of more engaged voters.

[Figure: Unweighted, RBS respondents skew older than full voter file]

In terms of demographics, the complete file and the portion with telephone numbers were virtually identical. For sex, age and race, none of the categories differed by more than a percentage point. Nonresponse, on the other hand, produced much larger shifts.

The largest differences occurred with age. Individuals ages 65 and older made up 25% of the complete file and 26% of those with telephone numbers. Among respondents, the share was nearly twice as large, at 46%. The shares in the 18-29 and 30-64 age groups were lower than for the complete file by 8 and 13 percentage points respectively.

The racial composition of the unweighted sample was also significantly biased due to nonresponse. Whites made up 75% of confirmed respondents – 11 points higher than the rate on the full voter file and the portion with telephone numbers (both 64% white). The balance of men to women was only minimally affected, with men making up 46% of the full file and 49% of confirmed respondents.

After weighting respondents to match the demographic profile of registered voters in the 2016 CPS Voting and Registration Supplement, the poll’s age and sex distributions both fell back in line with the figures for the full national file. Weighting brought the share of white respondents down to 71%. While this is fully 7 points higher than for the complete file, it closely aligns with the CPS estimate (72%) used for weighting.
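To make the idea of a weighting adjustment concrete, here is a minimal Python sketch of one-variable post-stratification on age. It is a simplification under stated assumptions, not the procedure used in this study: the actual weighting targeted CPS estimates on several demographic variables, whereas this sketch uses the full-file age shares reported above (25% ages 65 and older) as a stand-in target and collapses age into two groups. The function name is hypothetical.

```python
def poststratification_weights(sample_shares, target_shares):
    """Weight for each group = target share / unweighted sample share."""
    weights = {}
    for group, sample_share in sample_shares.items():
        if sample_share <= 0:
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = target_shares[group] / sample_share
    return weights


# Unweighted respondent shares from the text: 46% were ages 65 and older.
sample = {"65+": 0.46, "under 65": 0.54}
# Stand-in target (an assumption): full voter file shares, not the CPS targets.
target = {"65+": 0.25, "under 65": 0.75}

weights = poststratification_weights(sample, target)
# Weighted share of each group after applying the weights.
weighted = {group: sample[group] * weights[group] for group in sample}
```

Under these assumptions, respondents ages 65 and older get a weight of roughly 0.54 and younger respondents roughly 1.39, so the weighted age distribution matches the target exactly. Variables that are not weighting targets, such as turnout, are corrected only to the extent that they are related to the variables that are.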

[Figure: Missing phone numbers, nonresponse lead to overestimates of voting]

Voter turnout is the one area where telephone coverage appears to be a meaningful source of error (after weighting) in RBS polling. Across the entire file, 70% of registered voters are recorded as voting in the 2016 presidential election, 41% in the 2014 midterm and 60% in 2012. The turnout rate for all three elections is 3 to 4 percentage points higher among records with a telephone number. After nonresponse, this relatively modest coverage bias increased by an additional 13 to 14 points for the 2012 and 2016 presidential elections and an even larger 21 points for the 2014 midterm.

The overrepresentation of more politically engaged voters was only partially corrected by weighting. At 82%, the weighted share of respondents who voted in 2016 was still 12 points higher than the full voter file. For 2014, weighting brought the share from 65% down to 54%, still 13 points above the full-file figure of 41%. For 2012, weighting brought the share who voted from 78% to 69%, for a final error of 9 percentage points.
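The arithmetic behind these turnout comparisons can be laid out in a brief Python sketch. The percentages are the ones reported in this section; the unweighted 2016 respondent figure is not stated in the text, so it is left as `None` rather than guessed. The dictionary layout and function names are illustrative assumptions.

```python
# Turnout rates (percent) as reported in the text.
turnout = {
    2016: {"full_file": 70, "unweighted": None, "weighted": 82},
    2014: {"full_file": 41, "unweighted": 65, "weighted": 54},
    2012: {"full_file": 60, "unweighted": 78, "weighted": 69},
}


def remaining_error(row):
    """Percentage points by which weighted respondents exceed the full file."""
    return row["weighted"] - row["full_file"]


def share_of_error_removed(row):
    """Fraction of the unweighted error that weighting eliminated."""
    if row["unweighted"] is None:
        return None  # unweighted figure not reported for this election
    initial_error = row["unweighted"] - row["full_file"]
    return (row["unweighted"] - row["weighted"]) / initial_error


for year, row in turnout.items():
    print(year, remaining_error(row), share_of_error_removed(row))
```

With these inputs, the remaining errors are 12, 13 and 9 points for 2016, 2014 and 2012, and weighting removed roughly 46% of the unweighted error for 2014 and 50% for 2012.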

  17. Analysis in this section is restricted to the RBS sample vendor’s registered voter records and does not include records for the unregistered because several key variables (e.g., modeled party affiliation, election turnout) are not available for the latter.
  18. The telephone coverage rate comes from Stephen J. Blumberg and Julian V. Luke. “Wireless substitution: Early release of estimates from the National Health Interview Survey, January–June 2017.” National Center for Health Statistics. December 2017.
  19. For this survey, individuals who were called on a landline were asked for by name at the start of the survey. For cellphone numbers, the person who answered the phone was assumed to be the correct person and their identity was confirmed at the end of the survey.