Comparing random-digit dial and voter file surveys

By Courtney Kennedy, Nick Hatley, Scott Keeter, Andrew Mercer, Ruth Igielnik and Frederic Traylor

A new telephone survey experiment finds that, despite major structural differences, an opinion poll drawn from a commercial voter file can produce results similar to those from a sample based on random-digit-dialing (RDD). The study intentionally pushed the boundaries of current polling practices by employing a voter file and a registration-based sampling (RBS) approach as the basis of a full national sample. While voter files are widely used for election surveys at the state and local level, relatively few pollsters have employed them for national surveys. As a result, there are few settled best practices for how to draw national samples from voter files and how to handle missing phone numbers.

The study also tackles the question of how well voter files represent Americans as a whole, including those who are not registered to vote. This research was possible because voter file vendors are increasingly trying to cover all U.S. adults, registered or not, by combining state voter rolls with other commercially available databases.

On the large majority of survey questions compared (56 of 65), RBS and RDD polls produced estimates that were statistically indistinguishable.¹ Where the polls differed, the RBS results tilted somewhat more Democratic than the RDD results.
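The chance baseline behind this comparison can be sketched in a few lines. This is a rough illustration only: the 5% significance level is an assumption (the report says only that about three of 65 questions would differ by chance), and the binomial tail treats the 65 questions as independent, which survey items generally are not.

```python
from math import comb

N_QUESTIONS = 65                # questions compared in the report
N_DIFFERENT = N_QUESTIONS - 56  # questions where RBS and RDD differed
ALPHA = 0.05                    # ASSUMED significance level, not stated in the report

# Expected number of "significant" differences if the two polls truly agreed.
expected_by_chance = N_QUESTIONS * ALPHA

def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of seeing at least the observed number of differences,
# under the (strong) assumption that questions are independent.
tail_probability = prob_at_least(N_DIFFERENT, N_QUESTIONS, ALPHA)

print(f"Expected by chance: {expected_by_chance:.2f} of {N_QUESTIONS}")
print(f"Observed differences: {N_DIFFERENT}")
print(f"P(at least {N_DIFFERENT} by chance, if independent): {tail_probability:.4f}")
```

Because many survey questions are correlated with one another, the tail probability should be read as a loose guide rather than a formal test.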

An analysis of survey participation among registered voters in the RBS sample found that any partisan differences between RDD and RBS surveys are unlikely to be the result of too many Democrats responding. In fact, the set of confirmed registered voters who participated in the RBS survey was somewhat more Republican than the national voter file as a whole in terms of modeled partisanship (38% vs. 33%, respectively).² The routine demographic weighting applied to the sample corrected most of this overrepresentation.

Despite missing phone numbers on the RBS sampling frame, estimates were on par with RDD

Viewed comparatively, the study found several notable advantages to national sampling using the voter file. One such advantage of RBS is the ability to compare the partisan leanings of people who respond to a poll to those who do not – giving researchers some sense as to whether the nonresponders are significantly different from those who are answering. By comparison, little is known about those who do not respond to RDD surveys. RBS is also less expensive to conduct because the phone numbers that are available are more likely to be in service. Two-thirds (66%) of the numbers dialed in the RBS survey were working and residential, versus fewer than half (44%) of those dialed in the RDD survey.
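To make the efficiency gap concrete, the working-number rates above can be turned into the average number of sampled numbers needed to reach one working residential number. This is a minimal sketch that looks only at the working-number rate; it ignores other cost drivers such as cooperation rates and interview length, and the function name is hypothetical.

```python
# Share of dialed numbers that were working and residential (from the report).
working_rate = {"RBS": 0.66, "RDD": 0.44}

def numbers_per_working(rate):
    """Average numbers dialed to reach one working residential number."""
    return 1 / rate

per_working = {name: numbers_per_working(r) for name, r in working_rate.items()}

# How many times more numbers RDD needs than RBS for the same yield.
rdd_to_rbs_ratio = per_working["RDD"] / per_working["RBS"]

for name, value in per_working.items():
    print(f"{name}: about {value:.2f} numbers dialed per working number")
print(f"RDD needs about {rdd_to_rbs_ratio:.1f}x as many numbers as RBS")
```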

Nonresponse tilted RBS sample more Republican; largely fixed by weighting

The major limitation of RBS for telephone polling is the absence of a phone number for wide swaths of the public. Unlike RDD samples, which are based on telephone numbers, RBS samples are based on lists of people who may or may not have an associated telephone number on the file. In the national voter file used in this study, a phone number was available for 60% of registered voter records and 54% of the nonregistered adult records. A key finding is that this low coverage rate did not translate into inferior estimates, relative to RDD. On 15 questions where benchmark data were available from government surveys, the RBS and RDD polls showed similar levels of accuracy on estimates for all U.S. adults and also in a companion analysis that examined five benchmark questions for registered voters. When the RBS and RDD estimates differed from the benchmarks, they both tended to overrepresent adults who are struggling financially. For example, the American Community Survey finds that about one-in-ten U.S. adults (10%) do not have health insurance, but this rate was 13% in the RDD survey and 14% in the RBS.

The RDD survey was conducted according to Pew Research Center’s standard protocol for telephone surveys. Interviewing occurred from April 25 to May 1, 2018, with 1,503 adults living in the U.S., including 376 respondents on a landline telephone (25% of the total) and 1,127 on a cellphone (75%). The parallel RBS survey interviewed 1,800 adults, with 884 interviewed on a landline (49%) and 916 interviewed on a cellphone (51%) using a seven-call protocol, which was also used for the RDD survey. Interviewing began April 25 and concluded on May 17, 2018. Both surveys included interviews in English and Spanish.

Other key findings:

- Whites reached by RBS were more Democratic than those reached by RDD. Among non-Hispanic whites, partisanship was evenly split in the RBS survey (46% identified with or leaned to the Republican Party, 46% identified with or leaned to the Democratic Party), while in the RDD there was a 16-point Republican advantage (53% Republican, 37% Democrat). The pattern was reversed for Hispanics.
- Presence of phone numbers on the RBS frame varies substantially by state. In the national registered voter file used for this study, the share of records with a phone number ranged from a low of 30% in Alaska to a high of 84% in Indiana. This phenomenon has long been discussed by survey researchers and has greater implications for state and local surveys than national ones.³
- Both RBS and RDD surveys recorded a low response rate. One of the purported advantages of RBS surveys is their efficiency. Unlike RDD surveys, which rely on lists of potentially working telephone numbers, RBS surveys leverage lists of actual Americans. In addition, RBS surveys typically focus on registered voters, a population that tends to be more cooperative with survey requests than those who are unregistered. Even so, the overall response rate was only 8% for the RBS survey versus 6% for the RDD survey.
- The RBS survey required more weighting than the RDD survey. While the pool of adults responding to both the RDD and RBS surveys contained proportionally too many college graduates, non-Hispanic whites and older adults, the severity of these imbalances was more acute for the RBS survey. For example, while 19% of U.S. adults are ages 65 and older, this rate was 42% in the RDD sample and 49% in the RBS sample, prior to weighting. Consequently, despite its larger sample size, the margin of error for the RBS survey was larger than that of the RDD survey (3.4 and 3.0 percentage points, respectively).
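The link between heavier weighting and a wider margin of error can be illustrated by backing out the design effect implied by the reported figures. This is a sketch under stated assumptions: a 95% confidence level, a proportion of 50%, and margins of error that are rounded as published, so the implied values are approximate and are not figures from the report. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # ASSUMED 95% confidence level

def srs_moe(n, p=0.5, z=Z_95):
    """Margin of error (percentage points) for a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def implied_design_effect(reported_moe, n):
    """Design effect implied by a reported margin of error and sample size."""
    return (reported_moe / srs_moe(n)) ** 2

# (sample size, reported margin of error in points), both from the report.
surveys = {"RDD": (1503, 3.0), "RBS": (1800, 3.4)}

results = {}
for name, (n, moe) in surveys.items():
    deff = implied_design_effect(moe, n)
    results[name] = {"deff": deff, "effective_n": n / deff}
    print(f"{name}: SRS margin {srs_moe(n):.2f}, "
          f"implied design effect {deff:.2f}, "
          f"effective sample size about {n / deff:.0f}")
```

Under these assumptions the RBS survey's larger design effect more than offsets its larger nominal sample size, which is consistent with the wider margin of error reported above.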

Overview of study methodology

As part of a multi-year examination of commercial voter files – lists of U.S. adults that combine state voter registries with other public and commercial databases – Pew Research Center conducted parallel national telephone surveys to compare voter files with random-digit-dialing as a sample source. A comparison of results from the two sources is the subject of this report. One goal of the study is to determine whether samples drawn from commercial voter files (RBS) can provide data of comparable or better quality than RDD at similar or lower cost. The parallel surveys employed nearly identical questionnaires and were conducted in roughly the same time period (April and May of 2018). The questionnaires included content typical of Pew Research Center political surveys, along with several measures of economic, demographic and lifestyle characteristics for which government statistics are available as a benchmark.

Despite their name, commercial voter files are not limited to registered voters. As research and targeting using these voter files have become more widespread, voter file vendors are increasingly trying to provide coverage of all U.S. adults, including those who are not registered to vote. Accordingly, assessing their suitability as a source for producing a representative sample of the entire U.S. adult population is a key objective of this study.

To obtain the RBS samples for this study, Pew Research Center purchased samples consisting of 1% of the total number of records separately in the registered voter and nonregistered adult databases from L2, a nonpartisan commercial voter file vendor. From these two 1% files, smaller samples were drawn for survey administration. An effort was made to locate a telephone number for all records that did not already have one attached. Telephone numbers were ultimately available or located for 73% of individuals in the RBS registered voter sample and for 55% of those in the RBS nonregistered sample.

Linking named individuals in the voter files to the obtained survey respondent makes it possible to take advantage of important information on the files, most notably an individual’s history of turnout in previous elections. For those reached on a landline, the survey asked for the sampled person by name before proceeding with the interview. If the named person was not living in the household, the interview ended. Due to greater effort and expense involved in obtaining cellphone respondents, researchers took a different approach with the cellphone respondents. Respondents reached on a cellphone were administered the entire interview and asked to confirm their name at the end. More than six-in-ten cellphone respondents (62%) confirmed being the person named on the sampled record. Following the interview, an effort was made to locate those who did not confirm their name (N=351, or 38% of all cellphone respondents) in the L2 databases. In total, 36 of these 351 respondents were located under a different telephone number. Including the 884 landline respondents, a total of 1,485 of the 1,800 respondents have an associated record in either the registered voter or nonregistered database.
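The matching counts in the paragraph above can be reconciled with a short calculation. All figures come from the report; the variable names are illustrative, and the sketch follows the report in counting every landline respondent as matched, since landline interviews proceeded only with the named person.

```python
# Counts reported for the RBS survey.
landline_completes = 884
cell_completes = 916
cell_unconfirmed = 351  # cellphone respondents who did not confirm their name
cell_relocated = 36     # unconfirmed respondents later found in the L2 databases

# Cellphone respondents who confirmed being the sampled person.
cell_confirmed = cell_completes - cell_unconfirmed

# Respondents with an associated voter file record.
matched_total = landline_completes + cell_confirmed + cell_relocated

confirmed_pct = round(100 * cell_confirmed / cell_completes)
unconfirmed_pct = round(100 * cell_unconfirmed / cell_completes)

print(f"Confirmed on cellphone: {cell_confirmed} ({confirmed_pct}%)")
print(f"Not confirmed: {cell_unconfirmed} ({unconfirmed_pct}%)")
print(f"Matched respondents: {matched_total} of {landline_completes + cell_completes}")
```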

The RDD and RBS samples were weighted to match national population parameters for sex, age, race, Hispanic origin, region, population density, telephone usage and self-reported voter registration status. Voter registration is not typically used by Pew Research Center as a weighting variable for its RDD surveys but was employed here in order to ensure that the RDD and RBS samples were identical with respect to this important indicator of political engagement.⁴
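One common way to weight a sample to several population margins at once is raking (iterative proportional fitting). The paragraph above does not say which algorithm was used, so the following is only a sketch of that general technique, not the Center's procedure. The toy respondents and the sex targets are hypothetical; the 19% target for adults ages 65 and older and the 49% unweighted share echo figures cited elsewhere in this report.

```python
from collections import defaultdict

def rake(rows, targets, iterations=50):
    """Return one weight per row so weighted margins match the target shares."""
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, shares in targets.items():
            total = sum(weights)
            current = defaultdict(float)
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Scale each category so its weighted share equals the target.
            for i, row in enumerate(rows):
                cat = row[var]
                weights[i] *= shares[cat] * total / current[cat]
    return weights

def weighted_share(rows, weights, var, category):
    """Weighted share of rows falling in the given category of var."""
    total = sum(weights)
    return sum(w for r, w in zip(rows, weights) if r[var] == category) / total

# HYPOTHETICAL toy sample: 49% are 65+, far above the 19% population share.
rows = (
    [{"age": "65+", "sex": "F"}] * 27
    + [{"age": "65+", "sex": "M"}] * 22
    + [{"age": "<65", "sex": "F"}] * 25
    + [{"age": "<65", "sex": "M"}] * 26
)
targets = {
    "age": {"65+": 0.19, "<65": 0.81},
    "sex": {"F": 0.51, "M": 0.49},  # HYPOTHETICAL targets for illustration
}

weights = rake(rows, targets)
print(f"Weighted 65+ share: {weighted_share(rows, weights, 'age', '65+'):.3f}")
print(f"Weighted female share: {weighted_share(rows, weights, 'sex', 'F'):.3f}")
```

The further the unweighted sample sits from the targets, the more the weights vary across respondents, which is what drives up the margin of error.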

Limitations and caveats

[Note: A recent Center study explored differences between five voter file vendors in the accuracy of data they were able to match to a national sample of adults (specifically the 3,985 adults active in the American Trends Panel).]

While RBS samples are widely used for election polling in individual states and localities, there have been relatively few national RBS surveys like the one conducted here.⁵ As a consequence, there are few widely accepted best practices for national surveys among practitioners. Pew Research Center researchers made a number of choices in designing the RBS study that might differ from what other researchers would choose to do. For example, RBS pollsters typically sample only records that have a phone number on file, but this RBS sample was selected without regard to presence of a phone number. This enabled us to test whether there would be a material benefit from sampling records that could be matched to a phone number with greater effort. This RBS survey also sampled 21% of its respondents from the vendor’s national database of unregistered adults. We are not aware of any other RBS polls that have sampled nonregistered cases.

Despite efforts to ensure that the RBS and RDD surveys were identical in all respects other than the samples used, some differences occurred. The field period for the RBS study was 16 days longer than for the RDD survey, due mainly to limits on availability of interviewer labor. In addition, the ratio of cellphone to landline respondents was 75%-to-25% in the RDD survey and roughly 50%-to-50% in the RBS survey, as the majority of telephone numbers available in voter files are landlines.
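Both differences can be checked against the fieldwork details reported earlier. The sketch below uses only dates and counts from the report; the variable names are illustrative.

```python
from datetime import date

# Field periods reported for each survey.
rdd_start, rdd_end = date(2018, 4, 25), date(2018, 5, 1)
rbs_start, rbs_end = date(2018, 4, 25), date(2018, 5, 17)

# Extra days the RBS survey stayed in the field.
extra_days = (rbs_end - rbs_start).days - (rdd_end - rdd_start).days

# Completed interviews by phone type: (cellphone, landline).
completes = {"RDD": (1127, 376), "RBS": (916, 884)}

cell_pct = {
    name: round(100 * cell / (cell + landline))
    for name, (cell, landline) in completes.items()
}

print(f"RBS field period was {extra_days} days longer")
for name, pct in cell_pct.items():
    print(f"{name}: {pct}% cellphone, {100 - pct}% landline")
```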

1. By chance alone, the RDD and RBS polls would be expected to differ on roughly three of 65 questions. The statistical test applied here compares the RDD and RBS estimates for the first answer category on each question, excluding demographics and split-formed items.

2. The same pattern is observed if party registration rather than modeled party affiliation is used.

3. Although not a focus of this report, voter files also vary by state in the quality of information available about the voters’ partisan leanings.

4. Because the relative shares of the RBS sample drawn from the registered and nonregistered databases are arbitrary, the resulting level of self-reported voter registration is not necessarily representative of the actual level in the population.

5. While Pew Research Center is not aware of other studies comparing RBS and RDD for national polling, a number of comparative studies have been done at the state level. These include: Donald P. Green and Alan S. Gerber. 2006. “Can Registration-Based Sampling Improve the Accuracy of Midterm Election Forecasts?” Public Opinion Quarterly 70:197-223; Warren Mitofsky, Joel Bloom, Joseph Lenski, Scott Dingman, and Jennifer Agiesta. 2005. “A Dual Frame RDD/Registration-Based Sample Design: Lessons from Oregon’s 2004 National Election Pool Survey.” Proceedings of the Survey Research Methods Section of the American Statistical Association, Alexandria, VA: 3929-3936; Joseph Shipman and Jay H. Leve. 2006. “Gunfight at the Cleveland Mayoral Primary Corral: RBS Vs RDD in a Head-to-Head Test.” Proceedings of the American Association for Public Opinion Research, 2006 Annual Conference: 1-17.