Pew Research Center’s Data Labs team uses computational methods to complement and expand on the Center’s existing research agenda. The team collects text, audiovisual and behavioral datasets; uses innovative computational techniques and empirical strategies for analysis; and generates original research. Data Labs also explores the limitations of these data and methods and works toward establishing standards for use and analysis.

The Data Labs project both produces its own reports and collaborates with other research groups at the Center, applying new computational approaches to existing research questions. Past research has explored congressional communication, looked at the ways Americans use social media, and analyzed everything from videos and images to algorithmic bias and religious rhetoric. The Data Labs team also writes about the process of computational social science research on Decoded, the Center’s behind-the-scenes blog about research methods.

In addition, Data Labs manages the Center’s computing infrastructure. That includes building high-performance computing systems and databases that facilitate web data collection and processing; deploying platforms that facilitate collaborative, replicable analysis in R and Python; and developing systems to automate research tasks such as content classification for machine learning.

As is true for Pew Research Center as a whole, Data Labs is nonpartisan and nonadvocacy. The team values independence, objectivity, accuracy, rigor, humility, transparency and innovation.


Why did Pew Research Center create Data Labs?

Data Labs was created as a response to the changing nature of data on human behaviors and attitudes. The public is expressing views online and leaving behind electronic trails of behavior in unprecedented ways. We can now learn about whom people connect with on social networks, what they search for, and what content they post. At the same time, institutions and groups are using the internet to convey information to diverse audiences, inviting researchers to observe what they post and how people react.

While some of these digital traces of communication and behavior are unstructured and not amenable to analysis in raw form, a number of new technologies are making it easier to collect and process these data. These technologies include:

  • Internet data collection: This includes harvesting web page content and parsing out fields (e.g., dates, names, links and tables) for analysis as well as querying APIs online to obtain formatted data.
  • Natural language processing (NLP): This includes processing text to measure concepts and extract patterns.
  • Machine vision: This refers to analyzing images using computational models that estimate what the images depict.
  • Online distributed labor platforms: These platforms allow major data collection efforts to be divided into a series of small tasks that can then be completed by external individuals. This is sometimes referred to as “crowdsourcing.”
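
As a rough illustration of the first item in the list above, the following Python sketch parses fields out of a web page using only the standard library. It is a minimal example under stated assumptions, not a description of the tools Data Labs actually uses: the sample HTML, the `FieldExtractor` class and the `extract_fields` function are all hypothetical names invented here, and the date pattern assumes dates written as YYYY-MM-DD.

```python
import re
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects link targets and visible text from an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty runs of visible text.
        if data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page content, invented for illustration only.
SAMPLE_HTML = """
<html><body>
<h1>Press release</h1>
<p>Posted 2019-03-14 by the office of a hypothetical member.</p>
<a href="https://example.org/statement">Full statement</a>
<a href="https://example.org/contact">Contact</a>
</body></html>
"""


def extract_fields(html):
    """Return the links, ISO-style dates and visible text found in an HTML string."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Assumption: dates appear as YYYY-MM-DD; real pages vary widely.
    dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return {"links": parser.links, "dates": dates, "text": text}


fields = extract_fields(SAMPLE_HTML)
print(fields["links"])
print(fields["dates"])
```

In practice, pages are rarely this tidy, so a real collection pipeline would need more tolerant parsing and validation of each extracted field before analysis.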

Data Labs is a testing ground for these data sources and the different approaches to analyzing them, with the goal of extracting meaning from the data through creative design, innovative methods, thoughtful measurement and sound deployment.

The Data Labs team also employs methodologies honed across the Center, such as content analysis, survey experiments, and the analysis of open-ended survey responses.