Pew Research Center’s team of demographers, economists and other analysts uses a wide variety of publicly available data sources to conduct its population research. Many of these data are collected and made available by governments both at home and abroad, or by quasi-governmental or cross-national organizations such as the World Bank or the United Nations. Some data have been repackaged and distributed by academic entities such as the Minnesota Population Center at the University of Minnesota, which produces the Integrated Public Use Microdata Series (IPUMS). Below, we list and describe some of our most frequently used data sources.
When researchers sit down to craft a research project, they first need to decide whether they want to look at an entire population or a subgroup such as Hispanics, employed Americans, immigrants, mothers with children at home or Asian-American Hindus. We mainly report U.S. or international data at the country level, but sometimes we look at U.S. states, metropolitan areas or counties/municipalities.
The choice of which group to look at – the “universe” of interest – to some extent dictates which dataset we use, and each dataset in turn has limits that dictate what we can do with it. A dataset with a small sample size may not allow us to look at population subgroups or local-level trends because the margin of error would be too large to say anything meaningful. For example, in our 2013 report on Asian Americans, we could only report characteristics of the 15 subgroups for which the Census Bureau makes data available (though this may change going forward). In addition, a dataset may not include a variable that we would like; for example, the U.S. Census does not ask about religion, so we have developed our own estimates based on a large national telephone survey. Researchers also must accept the way the surveys’ original designers operationalized various concepts.
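To illustrate why small samples limit subgroup analysis, the sketch below computes an approximate 95% margin of error for an estimated proportion. It assumes simple random sampling, which is a simplification: the surveys described here use complex sample designs, so their actual margins of error are generally larger. The function name and the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n respondents.

    Assumes simple random sampling (no design effect), which is an
    assumption made for this sketch, not a property of any survey named here.
    z=1.96 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes: a large national sample and a small subgroup.
full_sample = margin_of_error(0.5, 10000)
small_subgroup = margin_of_error(0.5, 100)
```

Under these assumptions, an estimate of 50% based on 10,000 respondents carries a margin of error of about 1 percentage point, while the same estimate based on 100 respondents carries a margin of roughly 10 points, which is often too wide to support a meaningful comparison.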
U.S. Census: The United States’ decennial census, which enumerates the entire resident population, has been conducted since 1790, but most Pew Research reports highlight trends over the previous several decades. The decennial census supplies population counts and, through 2000, was also the source of detailed characteristics such as marital status, education levels and country of birth. The 2010 census asked only basic questions about age, gender, race, ethnicity, household relationship and homeownership; researchers now obtain data about detailed characteristics from the U.S. Census Bureau’s American Community Survey.
American Community Survey: The U.S. Census Bureau’s American Community Survey (ACS) collects detailed information on a broad range of topics that include marital status, births, education, immigration, migration, income, occupation, commuting and disability, as well as housing costs, type and value. The ACS has a continuous collection design with about 2.4 million households included in the 2012 final sample.
The ACS began full-scale operation in 2005, covering only the household population; since 2006 it has also included group quarters such as college dormitories and prisons, so it now covers the entire U.S. population. ACS data are released by the Census Bureau each September for the previous year.
Many of our reports are based on tabulations from public-use samples of individual survey records (with personal identifiers removed) from the ACS. The public-use files, which we obtain from the Integrated Public Use Microdata Series (IPUMS), include a representative 1% one-year sample of the entire U.S. population, a 3% three-year sample and a 5% five-year sample.
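Public-use microdata records typically carry a weight indicating how many people in the population each sampled record represents, so tabulations sum weights rather than count rows. The sketch below shows that idea in its simplest form. The records are made up for illustration and are not ACS data, and the field names (`marital_status`, `weight`) and the function name are hypothetical rather than actual ACS or IPUMS variable names.

```python
from collections import defaultdict

# Made-up illustrative records, NOT real ACS data. Each record stands in
# for `weight` people in the population. Field names are hypothetical.
records = [
    {"marital_status": "married", "weight": 120},
    {"marital_status": "never married", "weight": 95},
    {"marital_status": "married", "weight": 80},
    {"marital_status": "divorced", "weight": 105},
]


def weighted_shares(rows, group_key, weight_key="weight"):
    """Return each group's share of the weighted population total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("weights sum to zero")
    return {group: total / grand_total for group, total in totals.items()}


shares = weighted_shares(records, "marital_status")
```

In this toy example, the two "married" records account for 200 of the 400 weighted people, so the weighted share is 50% even though they make up half the rows only by coincidence. With real microdata the weighted and unweighted shares usually differ.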
Current Population Survey: The Current Population Survey (CPS) is a monthly survey, currently of about 55,000 households, conducted jointly by the U.S. Bureau of Labor Statistics and the Census Bureau. It is the source of the nation’s official statistics on the U.S. labor market: employment, unemployment and wages. The CPS universe covers the civilian “noninstitutional” population, that is, people not living in institutions such as correctional facilities or nursing homes. The CPS is especially useful to us not only because of its rich data on labor force participation, but also because (unlike the ACS) it asks respondents where their parents were born, enabling us to report on immigrant generations.
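As a rough illustration of how parental birthplace makes generational analysis possible, the sketch below classifies a respondent using one common convention: foreign-born respondents are first generation, U.S.-born respondents with at least one foreign-born parent are second generation, and U.S.-born respondents with two U.S.-born parents are third generation or higher. This convention is an assumption of the sketch. It ignores edge cases such as people born abroad to U.S. citizen parents, and the function and argument names are hypothetical rather than CPS variable names.

```python
def immigrant_generation(born_in_us, mother_born_in_us, father_born_in_us):
    """Classify a respondent's immigrant generation from three birthplace flags.

    Simplified convention assumed for this sketch:
      - first: respondent was born outside the U.S.
      - second: respondent was born in the U.S. and at least one parent was not
      - third or higher: respondent and both parents were born in the U.S.
    """
    if not born_in_us:
        return "first"
    if not (mother_born_in_us and father_born_in_us):
        return "second"
    return "third or higher"


# Hypothetical respondents, for illustration only.
example_first = immigrant_generation(False, False, False)
example_second = immigrant_generation(True, False, True)
example_third = immigrant_generation(True, True, True)
```

A survey that records only the respondent’s own birthplace cannot separate the second group from the third, which is why the parental birthplace question matters for this kind of analysis.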
At Pew Research Center, we often rely on the CPS Annual Social and Economic Supplements (ASEC) conducted each March, featuring a larger sample size (about 75,000 households in recent years) and an expanded set of detailed demographic questions. ASEC data serve as the basis for the well-known annual Census Bureau report on income, poverty and health insurance in the United States (DeNavas-Walt, Proctor and Smith, 2013).
But we use other CPS products as well, including data on voting and turnout from the November Voting and Registration Supplement and data on fertility from the June supplements.
As with the ACS, the CPS microdata used in many of our reports are from the Integrated Public Use Microdata Series (IPUMS) provided by the University of Minnesota. The IPUMS assigns uniform codes, to the extent possible, to data collected in the CPS over the years. More information about the IPUMS, including variable definitions and sampling error, is available at http://cps.ipums.org/cps/documentation.shtml.
Survey of Income and Program Participation: The U.S. Census Bureau’s Survey of Income and Program Participation (SIPP) is a longitudinal survey designed to follow a given panel of households over several years. The panels are nationally representative samples of the civilian, non-institutionalized population of the United States and are selected anew every few years. Each SIPP panel is divided into four interview groups. A complete cycle of interviews lasts for four months, one interview group per month, and is referred to as a “wave.” Each wave typically features a different set of questions, referred to as “topical modules.” Although SIPP collects information about a variety of variables, Pew Research Center has used it mainly to analyze household wealth. SIPP has collected detailed wealth data at intervals since 1984, and the U.S. Census Bureau periodically issues a major report on the nation’s wealth based on SIPP. Detailed information on SIPP data is available at the U.S. Census Bureau’s SIPP website, http://www.census.gov/sipp/index.html.
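The wave structure described above can be sketched as a simple schedule: four interview groups, one interviewed per month, so that each wave spans four months. The sketch assumes that waves run back to back without gaps and that groups are interviewed in numeric order. Both are simplifying assumptions for illustration, and the function and constant names are hypothetical.

```python
GROUPS_PER_WAVE = 4  # four interview groups, one interviewed per month


def interview_month(wave, group):
    """Return the 1-based panel month in which a group is interviewed.

    wave and group are both 1-based. Assumes consecutive waves with no
    gaps and groups interviewed in numeric order (assumptions of this sketch).
    """
    if wave < 1:
        raise ValueError("wave must be 1 or greater")
    if not 1 <= group <= GROUPS_PER_WAVE:
        raise ValueError("group must be between 1 and 4")
    return (wave - 1) * GROUPS_PER_WAVE + group


# Schedule for the first three waves of a hypothetical panel.
schedule = {
    (wave, group): interview_month(wave, group)
    for wave in (1, 2, 3)
    for group in range(1, GROUPS_PER_WAVE + 1)
}
```

Under these assumptions, each group is interviewed once every four months, so group 1 is interviewed in months 1, 5 and 9 of the panel and group 4 in months 4, 8 and 12.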
Other sources: Pew Research Center uses a wide variety of other data in carrying out its demographic research. For the report “Changing Patterns of Global Migration and Remittances,” researchers used migration data from the United Nations and remittance inflow data from the World Bank. For a report on young adults in the wake of the recession, we analyzed Federal Reserve Board data in the form of the Survey of Consumer Finances; the American Housing Survey (conducted by the Census Bureau and sponsored by the U.S. Department of Housing and Urban Development); and the Consumer Expenditure Survey (also conducted by the Census Bureau, under the auspices of the Bureau of Labor Statistics). We also have used Bureau of Labor Statistics data on employment and unemployment for reports on labor trends. Our work on immigration has benefitted from Department of Homeland Security statistics and from census data produced by the government of Mexico.