9. Conclusions

By Courtney Kennedy, Nick Hatley, Arnold Lau, Andrew Mercer, Scott Keeter, Joshua Ferno and Dorene Asare-Marfo

While the growth of online interviewing is a prominent trend in polling, there is variation within that trend in how researchers recruit respondents. This study evaluated three respondent sourcing approaches (using workers from a crowdsourcing website, opt-in survey panels, and address-recruited survey panels) and found that sourcing affects data quality. Specifically, crowdsourced and opt-in survey panel respondents were more likely than those recruited via random sampling of addresses to give bogus data. Bogus data came in several forms including duplicate interviews, answers that had no bearing on the question, answers that were uniformly positive regardless of what was asked and interviews filled out by people who are probably not Americans. Two common data quality checks (one for speeding and another for attention) failed to detect most respondents flagged as bogus.
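To make the kinds of bogus data described above concrete, here is a minimal sketch of how a researcher might flag three of them in their own data: duplicate interviews, uniformly positive answers and likely non-U.S. respondents. It is an illustration under stated assumptions, not the procedure used in this study. The field names (`id`, `fingerprint`, `ip_country`, `answers`), the `"favorable"` answer label and the function `flag_bogus` are all hypothetical.

```python
from collections import Counter


def flag_bogus(responses, opinion_items, positive_label="favorable"):
    """Return {respondent id: set of flag names}.

    Assumed (hypothetical) record layout for each response:
      id          - unique interview identifier
      fingerprint - device/browser identifier used to spot repeat interviews
      ip_country  - two-letter country code derived from the IP address
      answers     - dict mapping question name to the answer given
    """
    # Count how many interviews share each device fingerprint.
    seen = Counter(r["fingerprint"] for r in responses)

    flags = {}
    for r in responses:
        found = set()

        # Duplicate interviews: the same fingerprint appears more than once.
        if seen[r["fingerprint"]] > 1:
            found.add("duplicate")

        # Uniformly positive answers regardless of what was asked.
        if opinion_items and all(
            r["answers"].get(q) == positive_label for q in opinion_items
        ):
            found.add("always_positive")

        # Interview appears to originate outside the United States.
        if r["ip_country"] != "US":
            found.add("non_us_ip")

        flags[r["id"]] = found
    return flags
```

A check like this is only as good as its inputs. For example, the always-positive flag is meaningful only if `opinion_items` includes questions that a sincere respondent would be unlikely to answer favorably across the board.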

The study results raise concerns about how secure some public opinion polls are from fraudulent interviews. Consistent with other research, the data from this study suggest that fraudulent respondents in the crowdsourced sample are often foreign residents posing as Americans. For opt-in survey panels, there is little if any evidence of that, and it appears that widely used opt-in panels manage to keep out internet users with foreign IP addresses. With the opt-in survey panels and the crowdsourced sample, however, the study found a small but measurable segment of respondents who seem to operate on the assumption that the survey is market research and therefore give pleasing (not genuine) answers.

Researchers using crowdsourced marketplaces for social science experiments say it is best practice to restrict participation to workers with a task completion or approval rate of at least 95%. While that may be sound advice for conducting randomized experiments with crowdsourced subjects, it is a dubious constraint when the goal is to obtain a representative sample of Americans for purposes of estimating public opinion. This is tantamount to a pollster paying one company (albeit a very decentralized one) to interview some of its better-performing employees and then describing the results as information about American public opinion. Such a process can be expected to systematically exclude those experiencing hardships affecting their work or those with lower cognitive abilities, not to mention the 99% of the public that does not work on the platform.

The data in this study were collected on the premise that the panels/marketplaces selling online interviews are responsible for ensuring data quality and that additional checking by the researcher is unnecessary. Given that many public polls are described simply as being “conducted online,” it seems likely that at least some researchers operate on this assumption. Other researchers using opt-in data presumably have their own checks in place to try to address the issues raised in this report. To help the public better differentiate trustworthy and untrustworthy polls, it would be helpful if poll methodology statements mentioned what checking, if any, was performed.