Data Labs uses data science methods to complement and expand on the Center’s research agenda. We generate original research by collecting data from a wide range of sources and analyzing it using computational techniques.

Some of our major publications have explored topics such as Americans’ use of social media; the content of #BlackLivesMatter tweets; what school districts across the country promote in their mission statements; popular accounts on “alternative” social media platforms; and the content of sermons delivered in churches across the country.

We also explore the tools and methods behind our work on the Center’s Decoded blog and manage the research computing infrastructure of the organization.

The Center created Data Labs to take advantage of the explosion of new data sources that can shed light on public behaviors and attitudes – and of new types of tools and methods that can convert this diffuse data into unique findings. Our goal is to extract meaning from data through creative design, innovative methods, thoughtful measurement and sound deployment.

Find the latest research from Data Labs here.

Why did Pew Research Center create Data Labs?

Data Labs was created as a response to the changing nature of data on human behaviors and attitudes. The public is expressing views online and leaving behind electronic trails of behavior in unprecedented ways. We can now learn about whom people connect with on social networks, what they search for, and what content they post. At the same time, institutions and groups are using the internet to convey information to diverse audiences, inviting researchers to observe what they post and how people react.

While some of these digital traces of communication and behavior are unstructured and not amenable to analysis in raw form, a number of new technologies are making it easier to collect and process these data. These technologies include:

  • Internet data collection: This includes harvesting web page content and parsing out fields (e.g., dates, names, links and tables) for analysis as well as querying APIs online to obtain formatted data.
  • Natural language processing (NLP): This includes processing text to measure concepts and extract patterns.
  • Machine vision: This refers to analyzing images using computational models that estimate what the images depict.
  • Online distributed labor platforms: These platforms allow major data collection efforts to be divided into a series of small tasks that can then be completed by external individuals. This is sometimes referred to as “crowdsourcing.”
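
As a rough illustration of the first item in the list above, the sketch below parses two kinds of fields (links and dates) out of a web page using only Python's standard library. It is not Data Labs' actual pipeline: the class name FieldParser, the helper extract_fields, the sample page, and the assumption that dates appear in YYYY-MM-DD form are all hypothetical choices made for the example.

```python
import re
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collects link targets and YYYY-MM-DD dates from an HTML document."""

    # Assumption: dates on the page are written as YYYY-MM-DD.
    DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

    def __init__(self):
        super().__init__()
        self.links = []
        self.dates = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Scan each run of visible text for date-like strings.
        self.dates.extend(self.DATE_PATTERN.findall(data))


def extract_fields(html):
    """Return the links and dates found in an HTML string."""
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "dates": parser.dates}


# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_PAGE = """
<html><body>
  <h1>District news</h1>
  <p>Posted 2021-03-04 by the communications office.</p>
  <a href="https://example.org/mission">Mission statement</a>
  <a href="/archive">Archive (updated 2021-02-28)</a>
</body></html>
"""

fields = extract_fields(SAMPLE_PAGE)
print(fields["links"])
print(fields["dates"])
```

In practice the page would be fetched over the network rather than hard-coded, and real pages usually need more tolerant date handling than a single pattern.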

Data Labs is a testing ground for these data sources and the different approaches to analyzing them, with the goal of extracting meaning from the data through creative design, innovative methods, thoughtful measurement and sound deployment.

The Data Labs team also employs methodologies honed across the Center, such as content analysis, survey experiments, and the analysis of open-ended survey responses.