For decades the gold standard for public opinion surveys has been the probability poll, in which a sample of randomly selected adults is interviewed and the results are used to measure public opinion across an entire population. But the cost of these traditional polls is growing each year, leading many pollsters to turn to online nonprobability surveys, which do not rely on random sampling and instead recruit respondents through ads, pop-up solicitations and other approaches.

The advantages of these online surveys are obvious – they are fast and relatively inexpensive, and the technology for them is pervasive. But are they accurate? Pew Research Center undertook a study to answer this question. Courtney Kennedy, director of survey research, explains what we found.

Courtney Kennedy, Pew Research Center director of survey research

What was the Center’s goal in undertaking this study?

We recognize that the use of online nonprobability surveys is already widespread and that their popularity is likely to increase in the years ahead, due in large part to the growing costs associated with traditional, probability-based surveys. So we conducted this study as a first step in trying to understand what the landscape of online nonprobability polls looks like and how these types of surveys perform – without elaborate adjustments – using several metrics.

What did the study look like?

We constructed a 56-item questionnaire that was administered to nine different samples from eight commercial vendors. The questionnaire included 20 items that have population benchmarks (such as smoking, health care coverage, household composition and more) from high-quality federal surveys so we could estimate the accuracy of each vendor’s results and better understand the makeup of each sample’s participants.

Nearly all the same 56 questions were also asked on waves of the Center’s American Trends Panel (ATP), which is conducted predominantly online but also allows for mail response for adults without internet access. Unlike the other samples in the study, the ATP is probability-based because its members were recruited from our 2014 random-digit-dial telephone survey of 10,000 U.S. adults.

In a nutshell, what did the study find?

We found that not all online surveys perform equally well. A key reason seems to be the different methods employed by vendors. One of the nine nonprobability samples clearly performed better than the others, and that appears to be related to the vendor's use of a more sophisticated set of statistical adjustments, both in selecting the sample and in weighting the results. Our conclusions about why that sample performed the best are preliminary, though, because we have just one survey from that vendor and the relevant design features were not experimentally tested within that survey.

One of the other major findings was that, in general, subgroup estimates from these samples for blacks and Hispanics were very inaccurate. Almost all the samples were off by double digits – more than 10 percentage points – on average in their estimates for blacks and Hispanics. That is quite concerning for researchers like us who study the experiences of different groups within the U.S. population.
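For readers curious how that kind of accuracy summary can be computed, here is a minimal sketch in Python. The benchmark items and numbers are hypothetical placeholders, not the study's actual data; the idea is simply to average the absolute differences, in percentage points, between a sample's estimates and the benchmark values.

```python
# Minimal sketch with hypothetical numbers: average absolute error, in
# percentage points, between one sample's estimates and benchmark values.

benchmarks = {            # "ground truth" values from high-quality federal surveys
    "daily_smoker": 12.9,
    "has_health_coverage": 87.0,
    "lives_with_child": 31.5,
}

sample_estimates = {      # the same items as estimated from one online sample
    "daily_smoker": 18.0,
    "has_health_coverage": 80.5,
    "lives_with_child": 36.0,
}

errors = [abs(sample_estimates[item] - benchmarks[item]) for item in benchmarks]
avg_abs_error = sum(errors) / len(errors)
print(f"Average absolute error: {avg_abs_error:.1f} percentage points")
```

An analogous average computed within a subgroup, such as blacks or Hispanics, is one way to see where a given sample is furthest off.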

What about the Center’s American Trends Panel? How did it do? And are there any changes in store based on the results?

The ATP performed quite well on most of the benchmark items we studied, such as having health coverage, trust in neighbors and daily smoking. But it also showed one weakness: it overrepresents people who are more civically engaged. Fortunately, follow-up analysis that we did indicated that this weakness does not spill over to other topics that we study, such as political and social attitudes.

While this issue looks to be very contained, we are working to fix it. In the short term we are exploring a weighting solution to correct for it. For a more permanent solution we are exploring recruiting the panel through the mail using an address-based sample instead of recruiting from people who had just completed a survey about politics (and who therefore may be more likely to be civic-minded).

So does this report suggest that nonprobability samples do just as well as probability-based samples?

This report focuses on variation within the online nonprobability space; the study wasn’t intended to answer the question of which is more accurate, nonprobability or probability surveys. While we did include the probability-based ATP in the study, the ATP is not typical in its design. A rigorous comparison of nonprobability and probability samples would need to encompass a wider range of probability-based polls, including random-digit-dial telephone surveys.

Pew Research Center has already used the American Trends Panel for reports it has published. Is it still confident in its data?

We are still very confident in our findings. We have known since well before this study was conducted that civically minded people are more willing to participate in surveys. For example, we documented this same pattern in our 2012 random-digit-dial nonresponse study, and we have often noted that people who take surveys about politics tend to be a bit more interested in and engaged with politics than the public as a whole.

While the overrepresentation of civically engaged adults is concerning, we found that it is basically limited to that one topic and does not carry over into other topics that we study. Both in 2012 and again following this study, we performed an additional analysis to see what happens when we correct for the volunteerism bias through weighting. Then, as now, the vast majority of our survey estimates moved by just 1 or 2 percentage points at most. We do not see evidence of large biases on other survey responses. In fact, in the latest study, the ATP was the most accurate of the tested samples when it came to benchmarks other than civic engagement – so we still have high confidence in the ATP data.

Why did some of the vendors you studied perform better than others?

The panel sources that performed better tended to be in the field longer and to adjust their data on a wider variety of characteristics. By contrast, the samples that did relatively poorly tended to have shorter field periods and to adjust only basic attributes like age, gender and region of residence. However, our conclusions on these points are tentative. We have just the nine samples, and we did not experimentally test the effects of the sampling procedures or field lengths.

Is nonprobability polling the future of survey research? Is this where the field (and Pew Research Center) is headed?

We are planning more work on this topic, but for now it remains experimental. We are still using telephone random-digit dialing as our primary design, though we employ nonprobability samples for questionnaire testing and for certain specialized studies.

This report represents the first step in our effort to better understand online nonprobability surveys. We learned a lot from this study, even about our probability-based methodologies, and we have planned subsequent stages of research to answer further questions. In particular, we're very interested in the reliability of nonprobability samples. If you take multiple measurements close together in time, are the results stable, or do they move around a lot? Historically, reliability has been one of the great benefits of our telephone random-digit-dial work.

Also, for this study we used relatively simple weighting adjustments. We used a process called “raking” to align the sample to population benchmarks on a number of demographics. Exploring more sophisticated weighting and modeling approaches to try to improve these surveys is the cutting edge of this line of research and another question we’re going to be pursuing.
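To make the raking step concrete, here is a minimal sketch in Python of the general mechanism (iterative proportional fitting): respondent weights are repeatedly rescaled so that the weighted shares match population targets on one variable at a time, cycling through the variables until the margins settle. The variables, target shares and number of passes are hypothetical placeholders, not the adjustment specification used in the study.

```python
# Minimal raking (iterative proportional fitting) sketch with hypothetical data.
# Each respondent starts with a weight of 1; the weights are rescaled so that
# the weighted shares match the population targets, one variable at a time.

respondents = [
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "50+"},
    {"sex": "M", "age": "18-49"},
    {"sex": "M", "age": "18-49"},
    {"sex": "M", "age": "50+"},
]

# Hypothetical population targets; shares within each variable sum to 1.
targets = {
    "sex": {"F": 0.52, "M": 0.48},
    "age": {"18-49": 0.55, "50+": 0.45},
}

weights = [1.0] * len(respondents)

for _ in range(25):  # a fixed number of passes is plenty for this toy example
    for var, target in targets.items():
        total = sum(weights)
        # current weighted share of each category of this variable
        shares = {
            cat: sum(w for r, w in zip(respondents, weights) if r[var] == cat) / total
            for cat in target
        }
        # rescale each respondent's weight toward the target share
        weights = [
            w * target[r[var]] / shares[r[var]]
            for r, w in zip(respondents, weights)
        ]

for r, w in zip(respondents, weights):
    print(r, round(w, 3))
```

Real raking adjustments work the same way but over many more variables, and the study's findings suggest that the breadth of those adjustments is part of what separated the better-performing samples from the rest.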

A number of studies over the past 15 or so years have indicated that nonprobability surveys do not accurately measure public opinion. Is that still true?

This study was a first step for us in trying to address that, and I would not say that we have a firm answer yet. One of the samples performed quite well and that is really intriguing to us. We also are open to the possibility that, with the right advanced statistical modeling and weighting, some of these online samples might support fairly accurate and reliable estimates about public opinion.

It’s also important to consider accuracy in light of one’s goals for the research. If the goal is to get a general sense of how opinion is shaped, some nonprobability samples may be accurate enough – especially given their very low cost. If one needs very precise estimates, then our results suggest the field is not there yet.

What do you hope the survey world will take away from this study?

Our results suggest that more elaborate adjustments help to improve accuracy and that not weighting your data – a common practice among online sample vendors – is not a great idea. To advance this survey approach, our study suggests that you need adjustments across a large range of variables, rather than the narrow set often used when weighting traditional, probability samples.

We also hope that this study will encourage others to conduct similar evaluations and contribute to the growing body of empirical work on the accuracy of nonprobability samples.

Courtney Kennedy is Vice President of Methods and Innovation at Pew Research Center.
Andrea Caumont is an associate director at Pew Research Center.