Pew Research Center
Fact Tank Blog

Q/A: What the New York Times’ polling decision means

The New York Times and CBS News made big news in the polling world this weekend when they announced that they will begin using online survey panels from YouGov as part of their election coverage. YouGov, a U.K.-based research firm founded in 2000, uses such panels rather than traditional telephone surveys; the panel the Times and CBS are using has more than 100,000 members. The Times, citing concerns about the dearth of high-quality, non-partisan survey data, particularly at the state level, says it plans to include YouGov results as part of “a diverse suite of surveys employing diverse methodologies.”

While panels have long been used by market researchers, they’re relatively new in the opinion-research field, and views on them are sharply divided. We asked Scott Keeter, the Pew Research Center’s director of survey research, to explain the issues at stake and give us his preliminary thoughts.

What’s different about the YouGov panel surveys from the surveys previously used by the Times, CBS and Pew Research?

There are two big differences. One is that these are conducted entirely online, among internet users. People who don’t use the internet aren’t included (more on this below). The other, and arguably the bigger, difference is that the samples for these surveys are selected using so-called non-probability sampling methods. For decades, only probability – or random – sampling has been generally accepted as a scientific way to produce accurate, representative samples for surveys.

Explain the difference between a probability and a non-probability sample.

The American Association for Public Opinion Research, the leading association of survey research professionals, has explained it well. Here’s how their Task Force on Non-Probability Sampling put it in their major report last year: “In a probability sample, everyone in the population of interest (e.g., all registered voters in a political poll) has a chance of being selected for an interview. Knowing those chances is critical to creating valid statistical estimates.” Non-probability samples, in contrast, “are those in which the participants are chosen or choose themselves so that the chance of being selected is not known.”

Here’s how the Times’ Nate Cohn explained how YouGov puts its online panel together: The firm “attempts to build a large, diverse panel and then match[es] its panelists to demographically similar respondents from the American Community Survey….This step is intended to mimic probability sampling. But it can require significant assumptions about the composition of the electorate, including partisanship.”
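To make the matching idea concrete, here is a minimal Python sketch of nearest-neighbor sample matching. It is an illustration under stated assumptions, not YouGov’s actual procedure: the `Person` fields, the `distance` function and the tiny data sets are all hypothetical, and a real implementation would use many more variables and a far larger panel.

```python
from dataclasses import dataclass

# Hypothetical record type; real matching would use many more variables.
@dataclass(frozen=True)
class Person:
    age: int
    college: bool
    female: bool

def distance(a, b):
    # Assumed distance: age gap in decades plus one point per mismatched trait.
    return abs(a.age - b.age) / 10 + (a.college != b.college) + (a.female != b.female)

def match_sample(target, panel):
    """For each target record, pick the closest panelist not already used.

    Assumes the panel has at least as many members as the target list.
    """
    available = list(panel)
    matched = []
    for t in target:
        best = min(available, key=lambda p: distance(t, p))
        available.remove(best)
        matched.append(best)
    return matched

# Made-up "target" records standing in for a reference survey such as the ACS.
target = [Person(70, False, True), Person(30, True, False)]
# Made-up volunteer panel.
panel = [
    Person(25, True, False),
    Person(28, True, True),
    Person(65, False, True),
    Person(40, False, False),
]

matched = match_sample(target, panel)
for t, m in zip(target, matched):
    print(t, "->", m)
```

In this toy example, the 70-year-old target record is matched to the 65-year-old panelist and the 30-year-old to the 25-year-old. The quality of the result depends entirely on which variables go into the distance, which is where the “significant assumptions” come in.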

Probability sampling has two main benefits: (1) researchers select the respondents, rather than allowing respondents to volunteer to participate, and (2) knowing the chances of selecting any individual allows us to estimate how much a given sample is likely to differ from what we would have found had we interviewed everyone in the population. That estimate is reported as a survey’s “margin of error.” There is no comparable margin of error for non-probability samples.
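As a rough illustration of where a margin of error comes from, here is a minimal Python sketch. It assumes a simple random sample and the standard 95% confidence formula for a proportion; real polls also account for weighting and design effects, which this ignores. The function name and the sample size of 1,000 are illustrative assumptions, not figures from any poll discussed here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a hypothetical sample of 1,000 respondents.
moe = margin_of_error(0.5, 1000)
print(f"{moe * 100:.1f} percentage points")
```

For these inputs the result is about 3.1 percentage points. The formula works only because every respondent’s chance of selection is known, which is exactly what a non-probability sample lacks.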

So why would the Times and CBS look to non-probability methods? Do probability-based methods such as random-digit dialing (RDD) telephone surveys have their own problems?

Indeed they do. As our previous research on survey non-response has shown, it’s increasingly difficult to reach and interview people. But there are two important points to make. First, telephone polls, including ours, did a generally good job predicting elections in 2012 and in previous years. Our samples continue to look very much like the population on most indicators for which we have reliable data from the Census and other high-quality sources.

Second, the troubles faced by one method of research cannot, by themselves, be used to justify the adoption of an alternative. The alternative has to prove itself to be accurate enough and precise enough for the purposes to which we currently apply RDD telephone surveys. The field knows a lot about the biases and problems with our current methods. There is a much less complete understanding of the biases and problems with non-probability methods, but I’m hopeful that we’ll gain that understanding through a process of careful experimentation.

What does it mean for journalism, and for the survey world more generally, that the Times and CBS, of all news organizations, have decided to use YouGov’s panel? Would you expect other news organizations to follow now that those two have broken the ice?

This is a very big deal in the survey world. Until now, no major news organization has put its brand on surveys based on non-probability methods. The move has set off a very lively debate on Twitter among journalists and pollsters, and opinions about non-probability samples are strong. Washington Post pollster Scott Clement offered one view, and Amy Walter, national editor of the Cook Political Report, offered a different one.

I can’t predict what other organizations are going to do, but I do expect this to spur more experimentation – and that’s a good thing for the field. Because the Times and CBS News have good reputations for transparency, I fully expect that we will learn a lot more about the YouGov methodology in the coming weeks. That’s a good thing as well.

How can a survey based only on internet users provide an accurate representation of the entire public?

It’s worth remembering that not everyone is online: According to our most recent estimates, 89% of U.S. adults use the internet. The good news from a polling perspective is that this figure is steadily increasing – it was 79% just five years ago – so the potential bias from excluding non-internet users is getting smaller and smaller.

It’s still the case that people who don’t use the internet are different in many ways from those who do – in particular, they are older, poorer and less educated. But their dwindling numbers mean that their absence from a survey won’t make a huge difference in the findings on most questions. Still, we think it’s important to be able to describe our samples as “nationally representative” and try to make sure that they are whenever possible.
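The arithmetic behind this shrinking bias can be sketched in a few lines of Python. This is a simplified illustration: the 89% and 79% coverage figures come from the estimates above, but the 60% and 40% agreement rates are made-up values for a hypothetical question, and the function name is an assumption of this sketch.

```python
def coverage_bias(coverage, mean_covered, mean_excluded):
    """Bias of an estimate based only on the covered group.

    The true population mean is
        coverage * mean_covered + (1 - coverage) * mean_excluded,
    so an estimate using only the covered group is off by
        (1 - coverage) * (mean_covered - mean_excluded).
    """
    return (1 - coverage) * (mean_covered - mean_excluded)

# Hypothetical question: 60% of internet users agree vs. 40% of non-users.
for coverage in (0.79, 0.89):
    bias = coverage_bias(coverage, 0.60, 0.40)
    print(f"coverage {coverage:.0%}: bias {bias * 100:.1f} points")
```

With these hypothetical inputs, the bias falls from about 4.2 points at 79% coverage to about 2.2 points at 89% coverage. The bias depends on both the size of the excluded group and how different it is, so it can still matter on questions where non-users differ sharply from users.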

If this proves to be a successful endeavor for the Times and CBS News, does that mean other pollsters will embrace non-probability sampling?

Not necessarily. It’s important to keep in mind that online non-probability panels vary in quality, just as probability-based surveys do. One of the most important points in the AAPOR Task Force report is that there’s no single consensus method for conducting “non-probability sampling.” There are many different approaches, and most of them don’t have the public record of performance that YouGov has. YouGov has been conducting public polls in elections for many years. As a result, they have a track record that can be compared with probability-based polls. Until we have more organizations conducting polls in advance of elections and explaining their methods in detail, I believe that adoption of non-probability sampling for political polling will proceed slowly.

Isn’t Pew Research using an online panel right now?

We do have a panel, called “The American Trends Panel,” but it’s very different from the one that the Times and CBS are using. It’s based on a probability sample, and while most of the interviews are conducted online, we also have panelists who don’t use the internet. We interview those individuals by mail or phone.

Is Pew Research ever going to use the kind of online non-probability panel that the Times and CBS are using?

Yes, we will – but the real question is what we will use it for. Our current standards permit the use of non-probability samples for certain purposes, such as conducting experiments or doing in-depth interviews. In addition, we have embarked on a program of research to help us better understand the conditions under which non-probability samples can provide scientifically valid data. We also are exploring how to utilize non-survey data sources, which by their very nature tend to come from “samples” that are not random. But until we understand the pros and cons of those methods a lot better, we’re going to be very cautious about incorporating them into our research.