Appendix A: Methodology

Overview of Methods

The estimates of the unauthorized immigrant population presented in this report are derived with a residual methodology that compares the size of the total foreign-born population of the U.S. (legal and undocumented) with an independent, demographic estimate of the legally resident foreign-born population. The difference between the two is the estimated unauthorized population. Variants of the residual method have been used as a basis for measuring the unauthorized immigrant population since 1980 by various analysts, most recently by the Department of Homeland Security (Hoefer et al. 2008). (See Passel 2007 for a review of methods and estimates.) This appendix includes a brief description of the estimation methods and highlights critical assumptions and parameters.

Data on the total foreign-born population for the estimates presented are based on the March Supplements to the Current Population Survey (CPS) for 2001–2008 and on the 2000 Census. The March CPS data have been modified from the official data in several ways to produce a consistent time series that is usable for these estimates and comparisons over time. Two specific modifications are discussed here. The Census Bureau occasionally changes the methods it uses to produce population estimates used as control totals for the CPS. The changes introduced for 2008 had potentially large effects on the foreign-born population, so revised weights were developed for the historical data series to make the annual estimates comparable. The other modification involves allocating to specific countries those immigrants in the CPS who had not been assigned a country of birth or who had been assigned a broad generic code (e.g., born in Central America). The revised weights had a notable impact, especially on the estimate for 2007. The country-allocation changes affect the estimated unauthorized immigrant numbers for countries and regions of birth but have essentially no impact on the U.S. totals.

This report presents annual estimates of the unauthorized population for 2000–2008, but caution should be exercised in interpreting differences from one year to the next as measures of annual change. Sampling error in the survey and nonsampling errors in both the survey and the demographic estimate may be as large as or larger than the measured change. This appendix includes a discussion of estimated sampling variability in the CPS and its potential impact on measuring change in the unauthorized immigrant population. Traditionally, change has been measured over time intervals of at least four to five years (e.g., Passel 2006).

Residual Methodology

The residual methodology relies on the identity that the total number of unauthorized migrants residing in the country equals the total number of all immigrants less the total number of legal immigrants residing in the country, or:

Unauthorized immigrants = All immigrants (A_total) − Legal immigrants     (1)

In the Pew Hispanic Center’s application of the residual method, the legal immigrant population consists of two main groups: legal permanent residents (by far the larger) and legal temporary immigrants. The much smaller number of legal temporary immigrants, which includes groups such as foreign students in the U.S. and persons on long-term temporary work visas (H-1B or L-1 visas), is estimated by identifying individual respondents in the CPS whose characteristics align with the visa requirements. This group is then removed from the CPS population (A_total in equation 1) so the remaining comparisons are for permanent immigrants only.

Legal Immigrant Populations

The residual estimates are calculated for a number of detailed population groups subdivided by gender, age (16 groups), country or region of birth (35 areas), date of entry to the U.S., and state (California, New York, Texas, Florida, Illinois, New Jersey and the balance of the U.S.). The following components are summed to estimate the legally resident immigrant population:

a. Persons arriving in the U.S. before 1980—all are assumed to be legal by 2000 or later. The data for this group are from the March CPS (or 2000 Census), corrected for undercount.

b. Refugees—counted in the year they arrive in the U.S., not when they obtain green cards. Data are from the Office of Refugee Resettlement (ORR) or the Office of Immigration Statistics (OIS).

c. Asylum approvals—included as legal when asylum status is approved. These, too, are counted as arriving in the year of physical arrival in the U.S., if known, or otherwise in the year of approval. Data are provided by OIS.

d. Cuban-Haitian and other entrants, Amerasians, and various groups of parolees—treated similarly to asylum approvals and refugees. They are also included as legal when approved, not when they obtain green cards; for many, these dates are the same. Data are from ORR and OIS.

e. Persons acquiring legal status under the Immigration Reform and Control Act of 1986—included as legal when they obtain their green cards, based on the Yearbooks of Immigration Statistics published by what was then the Immigration and Naturalization Service (INS). Almost all of these 2.6 million formerly undocumented immigrants obtained green cards between 1989 and the late 1990s. They are assigned to years of arrival (many before 1980) based on survey and other data for this group.

f. New legal permanent residents (or persons getting “green cards”). Information on this group comes from OIS and its predecessor offices in INS. Two groups of green card recipients are treated differently in the estimation process:

(1) “New Arrivals”—i.e., persons getting green cards as they enter the U.S.—are counted in the year they arrive. Those already counted in groups b–e are excluded to avoid double counting.

(2) Persons “adjusting” to LPR status—i.e., persons getting green cards who are already in another legal status in the U.S. These people are counted as legal in the year they obtain their green card but are assigned to years of arrival based on date of nonimmigrant visa. Persons adjusting from statuses in groups b–e are excluded to avoid double counting.

Other Demographic Components

These legal immigrant population groups are combined using demographic techniques to estimate the legally resident immigrant population for each year and then carried forward one year at a time by adding new immigrants, subtracting deaths and subtracting emigrants. The data elements required for the demographic estimation process are:

a. Mortality rates to estimate deaths. The mortality rates come from official U.S. Life Tables (NCHS) applied to each age-sex-country of birth group.

b. Emigration rates to estimate movement out of the U.S. Age-sex-country-specific rates have been developed using information from Ahmed and Robinson (1994) and Van Hook et al. (2006).

c. Interstate mobility rates to estimate state-to-state movement. These rates are developed from the March CPS, which includes a question on residence one year before the survey.
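The one-year carry-forward described above can be sketched for a single age-sex-country cell as follows. This is a minimal illustration, not the Pew Hispanic Center's actual procedure: the function name, the choice to apply rates to the start-of-year population, and all numeric inputs are assumptions, and interstate mobility is omitted.

```python
def carry_forward(population, new_immigrants, mortality_rate, emigration_rate):
    """Advance one cell of the legal immigrant population by one year.

    Assumption: deaths and emigration are computed from the start-of-year
    population only; new arrivals are added without attrition.
    """
    deaths = population * mortality_rate
    emigrants = population * emigration_rate
    return population + new_immigrants - deaths - emigrants


# Hypothetical cell: 100,000 legal residents, 5,000 new legal immigrants,
# 0.5% annual mortality and 1% annual emigration (illustrative rates only).
next_year = carry_forward(100_000, 5_000, 0.005, 0.01)
print(f"{next_year:,.0f}")
```

Repeating this step for every cell and every year would yield the legally resident population series that is later compared with the CPS.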

CPS Coverage

Assumptions about coverage of immigrants in the CPS enter into the estimates at two different points. To compute the initial residual, the CPS data on the total foreign-born population are compared with an estimate of legal foreign-born residents. Because some immigrants are missed in the CPS, the estimate of legal immigrants is “deflated” with assumptions about coverage to develop an estimate of legal immigrants actually counted in the CPS. There are no direct measures of immigrant coverage in the CPS, but the Pew Hispanic Center has developed estimated undercount rates for legal immigrants that vary by age, sex, race, and duration of residence; these are derived from race-sex-age-specific estimates of undercount in Census 2000 (Hogan 2001; Mule 2002). For 2008, application of these rates results in an overall CPS undercount rate of 2.0% for legally resident immigrants and of 2.6% for legal immigrants who entered after 1980.

This initial residual estimate is actually an estimate of unauthorized immigrants counted in the CPS. To arrive at the overall total, it is necessary to inflate the numbers by the undercount rate of unauthorized immigrants. Again, there is limited information on census undercount of this group. A study of Mexicans in Los Angeles at the time of the 2000 Census found that unauthorized migrants had undercoverage rates that were several times higher than those of legal immigrants and that averaged 10–15% (Marcelli and Ong 2002). The Pew Hispanic Center has developed a set of assumptions consistent with the available information from the census-based studies and with historical demographic data from Mexico. The undercount rates are higher for countries where the population is largely Latino, for young adult males and for recent arrivals. Overall, in 2008, these assumptions resulted in an estimated undercount of 12.5% for unauthorized immigrants in the March CPS. This assumption is slightly higher than the undercount rate of 10% assumed in OIS estimates (Hoefer et al., 2008, 2007, 2006); however, the OIS estimates use the Census Bureau’s American Community Survey (ACS), not the CPS.
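The two coverage adjustments can be illustrated with a short sketch. It assumes that an undercount rate is the share of the true population missed by the survey, so that "deflating" multiplies by (1 − rate) and "inflating" divides by (1 − rate). The function name and the population figures are hypothetical; only the 2.6% and 12.5% rates come from the text.

```python
def residual_unauthorized(cps_foreign_born, legal_estimate,
                          legal_undercount, unauthorized_undercount):
    """Residual estimate with the two coverage adjustments.

    Assumption: each undercount rate is the fraction of the true
    population that the CPS misses.
    """
    # Deflate the demographic estimate to the legal immigrants counted in the CPS.
    legal_counted = legal_estimate * (1 - legal_undercount)
    # Initial residual: unauthorized immigrants actually counted in the CPS.
    unauthorized_counted = cps_foreign_born - legal_counted
    # Inflate the residual for unauthorized immigrants missed by the CPS.
    return unauthorized_counted / (1 - unauthorized_undercount)


# Hypothetical cell with 1,000,000 foreign-born persons in the CPS and a
# demographic estimate of 700,000 legal residents (illustrative numbers only).
total = residual_unauthorized(1_000_000, 700_000, 0.026, 0.125)
print(f"{total:,.0f}")
```

In practice the calculation would be carried out separately for each detailed population group, with undercount rates that vary by group.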

Sampling Error and Interval Estimates

The residual estimate, as computed from equation (1), is subject to sampling error because the CPS component is based on a sample. It is also subject to various nonsampling errors due to the nature of the demographic estimate and the development of the CPS estimate. While the nonsampling errors are difficult to quantify, there are established methods for estimating sampling error, in general. Because the demographic estimate is not sample-based, the sampling error estimate of the undocumented immigrant population is equal to the sampling error for the CPS estimate of the foreign-born population that entered the U.S. since 1980.

The March Supplement to the CPS contains about 80,000 households with roughly 55,000 from the regular March CPS sample and additional households from the previous November as well as some from February and April samples. The survey is not a simple random sample but consists of clusters drawn at different sampling rates to represent states and other sampling strata. As a result, computing sampling errors is not straightforward. The Census Bureau does, however, provide guidance on computing standard errors (U.S. Census Bureau 2008, 2006, for example).

For the estimates shown in this report, the Pew Hispanic Center estimated the standard errors for several different population groupings—including the total foreign-born population and the population subdivided by period of arrival. Several different sets of parameters from the Census Bureau documentation were tested in computing the sampling errors—those for Asian and Hispanic populations, those for measuring income groups, those for employment groups, those for some household members and those for all household members. Each gave slightly different estimates of the standard error for the foreign-born population.

Combining the various estimates produced an approximate standard error of 300,000 for the estimate of unauthorized immigrants in 2008; for Mexico, the standard error is about 175,000; for other Latin America, 150,000; and for nations other than Latin America, 225,000. With these standard errors, the 90% confidence interval in 2008 is ±495,000 for the total unauthorized immigrant population (Table 1); ±290,000 for Mexicans; ±250,000 for other Latin Americans; and ±370,000 for non-Latin Americans. Note that the standard error for non-Latin American unauthorized immigrants is larger than for either of the Latin American groups even though the estimated undocumented population is smaller. This pattern results from the fact that the relative size of the standard errors is not a function of the relative size of the undocumented population, but of the relative sizes of the total foreign-born population entering after 1980.
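The interval half-widths follow from the standard errors by the 1.645 multiplier that this appendix states for 90% confidence intervals. The sketch below reproduces the arithmetic; the variable and function names are illustrative.

```python
# Multiplier for a two-sided 90% interval under a normal approximation.
Z_90 = 1.645

# Approximate 2008 standard errors as quoted in the text.
standard_errors = {
    "Total": 300_000,
    "Mexico": 175_000,
    "Other Latin America": 150_000,
    "Non-Latin America": 225_000,
}


def half_width(standard_error, z=Z_90):
    """Half-width of the confidence interval around a point estimate."""
    return z * standard_error


half_widths = {group: half_width(se) for group, se in standard_errors.items()}
for group, width in half_widths.items():
    print(f"{group}: +/-{width:,.0f}")
```

The computed values fall within a few thousand of the published ±495,000, ±290,000, ±250,000 and ±370,000, which suggests the published intervals are rounded.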

The CPS has undergone a number of changes this decade. In addition, the foreign-born population has increased steadily. As a result, the standard errors of the estimates of unauthorized immigrants are smaller for years earlier in the decade than for 2008. In comparing estimates from different years, the sampling error of both years’ estimates must be taken into account. Thus, the standard error of the difference, or change, in the undocumented population between two years is roughly 1.4 times the standard error of the estimate for one year. When comparing consecutive years, the overlapping sample design of the CPS must be taken into account (U.S. Census Bureau 2006). In this case, the standard error of the change is about 1.2 times the standard error for the population in a single year. The 90% confidence intervals shown in the report are ±1.645 times the standard error of the estimate.
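The 1.4 and 1.2 multipliers are consistent with the standard formula for the standard error of a difference between two estimates. The sketch below assumes equal standard errors in both years. The correlation of 0.28 for consecutive years is back-calculated here to reproduce the 1.2 factor; it is not a value taken from the Census Bureau documentation.

```python
import math


def se_of_change(se_a, se_b, correlation=0.0):
    """Standard error of the difference between two estimates.

    With independent samples the correlation is 0; overlapping samples
    give a positive correlation, which shrinks the result.
    """
    return math.sqrt(se_a**2 + se_b**2 - 2 * correlation * se_a * se_b)


single_year_se = 300_000  # approximate 2008 standard error from the text

# Non-consecutive years, treated as independent samples: about 1.4 times.
independent = se_of_change(single_year_se, single_year_se)

# Consecutive years with overlapping samples: about 1.2 times when the
# correlation is 0.28 (a back-calculated assumption).
overlapping = se_of_change(single_year_se, single_year_se, correlation=0.28)

print(round(independent / single_year_se, 2), round(overlapping / single_year_se, 2))
```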

Weighting and Editing the CPS

CPS Weights

The Current Population Survey is weighted to agree with a set of population estimates, called “population controls.” These controls include national estimates by age-sex-race/Hispanic origin, a different set of national totals by age-sex-race and age-sex-Hispanic origin, and two sets of totals for states by age-sex-race (U.S. Census Bureau 2006; Killion 2007).

The population estimates used as control totals for the CPS are supplied each year by the Census Bureau. For most years, the population controls are consistent with those from previous years, but always incorporate new data for the most recent years. Each new series of estimates goes back to 2000 and is labeled with a “vintage” corresponding to the year in which it was introduced. (The March population controls for each year are based on the previous year’s vintage.)

In some years, the changes in the population estimates are larger as a result of new methods and/or data. Such a substantial revision occurred for the “Vintage 2007” estimates when the Census Bureau revised its method for measuring immigration. The revisions lowered the measured level of immigration for every year since 2000. As a result, the vintage 2007 population estimate for March 2008 was about 800,000 less than what it would have been if the vintage 2006 methods had continued; the change reduced the Hispanic population by about 400,000. While the Census Bureau releases the entire series of population estimates, it does not go back in time and revise the previous March CPS supplements.

The vintage 2007 revisions clearly had the potential to affect the measured size of the foreign-born population and thus the Pew Hispanic Center’s measures of undocumented immigration (U.S. Bureau of Labor Statistics 2008). The CPS estimate of the foreign-born population is obtained by summing the individual weights for foreign-born respondents and not directly from the population estimates. But revisions that affect weights of Hispanics and Asians can have a sizable impact on the measured foreign-born population.

Because this report includes the time series of undocumented population estimates for 2000–2008, it is important that the estimates be computed with consistent data. To correct the measures for changes in weighting and estimation methods, we reweighted the March CPS data for 2003–2007 using the vintage 2007 population estimates (U.S. Census Bureau 2008a) according to the weighting specifications used by the Census Bureau (2006 and Killion 2007). For 2003–2006, the impact of the changed population controls was negligible, affecting the estimate of undocumented immigrants by less than 100,000. However, for the March 2007 CPS, the introduction of new controls lowered the estimate by 300,000 relative to what it would have been using the originally published March CPS weights.
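To show why new population controls change the measured foreign-born population, the sketch below applies a single ratio adjustment within each control cell and then sums the weights of foreign-born respondents. This is a deliberate simplification: the Census Bureau's actual specifications involve several sets of controls applied in multiple steps, and the record layout, cell labels and numbers here are hypothetical.

```python
from collections import defaultdict


def reweight(records, new_controls):
    """Scale weights so each cell's weighted total matches its new control."""
    old_totals = defaultdict(float)
    for record in records:
        old_totals[record["cell"]] += record["weight"]
    return [
        {**record,
         "weight": record["weight"] * new_controls[record["cell"]]
         / old_totals[record["cell"]]}
        for record in records
    ]


def foreign_born_total(records):
    """CPS foreign-born estimate: the sum of weights of foreign-born respondents."""
    return sum(r["weight"] for r in records if r["foreign_born"])


# Hypothetical respondents in two control cells.
records = [
    {"cell": "Hispanic", "weight": 1000.0, "foreign_born": True},
    {"cell": "Hispanic", "weight": 3000.0, "foreign_born": False},
    {"cell": "Other", "weight": 2000.0, "foreign_born": True},
]
# Hypothetical new controls that lower the Hispanic total from 4,000 to 3,600.
revised = reweight(records, {"Hispanic": 3600.0, "Other": 2000.0})

print(foreign_born_total(records), foreign_born_total(revised))
```

Although the controls are not defined by nativity, lowering the control for a cell with many foreign-born respondents lowers the foreign-born total as well.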

The published estimates and specifications did not permit full reweighting of the March 2000–2002 CPSs because of changes in the collection of race data. We anticipate revising the estimates for these years after vintage 2007 data that use the old race definitions are developed.

Country of Birth

The estimates of the unauthorized population shown in this report divide the world into a number of regions. “Latin America” is defined to include Mexico, Central America, Caribbean countries and South America. “Europe” includes Russia and all of the newly independent countries that were part of the former Soviet Union, even though some of the countries are geographically in Asia. This grouping is designed to maintain maximum consistency over time and with the administrative data series used. While all of these countries are separately identified in immigration statistics since their independence, they do not appear in immigration statistics of the 1980s nor are most identified as countries of birth in the CPS. “Asia,” as used in this report, includes the Middle Eastern countries of southwest Asia, but not the states that were part of the former Soviet Union. “Africa and Other” consists of all African countries, Oceania, and the small number of respondents not assigned a specific country of birth code.

The published CPS data assign specific countries of birth to almost the entire foreign-born population. However, several hundred thousand (weighted) cases each year are assigned as foreign born, but with their country of birth unknown. In addition, there are a number of “generic” categories used for each region of the world to encompass individuals reporting countries with too few respondents to be identified separately or individuals not giving a specific country response (e.g., Other Europe, Central America, North America). For previously published estimates (e.g., Passel 2006), many individuals with an unknown country of birth were assigned to specific countries or regions on the basis of Hispanic origin (e.g., Mexican origin and unknown country of birth to Mexico), race (e.g., Asian race to Other Asia), and country of birth of mother, father or other close relatives. However, a significant number of respondents remained in the generic categories.

For the estimates presented here, the editing process was extended to assign essentially all individuals with an unknown country of birth to a specific country. Those assigned by the previous method were assigned in the same manner; the allocation process was extended to encompass a wider range of relatives and to use reports from nearby households together with the respondent’s race and Hispanic origin. In addition to assigning individuals with an unknown country of birth, the new allocation process was expanded to include some of the generic regional groupings (if all or almost all of the immigrant-sending countries in the region could be identified). For example, for 2000–2006, the CPS included a category “born in Central America” even though all Central American countries were coded individually. Thus, in the recoded data for 2000–2006, individuals are no longer coded as “born in Central America” but more individuals are assigned to each of the specific countries. In contrast, there is a category labeled “Other African Country” but so few African countries are coded individually that the generic code could not be reliably reassigned.
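An allocation of this kind can be thought of as a chain of fallback rules. The sketch below is only a schematic illustration: the field names are hypothetical, and the order in which the rules are tried is an assumption, since the text lists the information used but not its priority.

```python
def allocate_country(person):
    """Assign a country of birth using a chain of fallback rules.

    Assumptions: the field names and the rule order are illustrative and
    are not taken from the report.
    """
    # Keep a specific country of birth when one was reported.
    if person.get("country_of_birth"):
        return person["country_of_birth"]
    # Hispanic origin, e.g. Mexican origin with unknown birthplace to Mexico.
    if person.get("hispanic_origin") == "Mexican":
        return "Mexico"
    # Country of birth of parents, other relatives, then nearby households.
    for source in ("mother_country", "father_country",
                   "relative_country", "nearby_household_country"):
        if person.get(source):
            return person[source]
    # No usable information: leave the case unassigned.
    return "Unknown Country"


# Hypothetical respondents.
a = allocate_country({"country_of_birth": None, "hispanic_origin": "Mexican"})
b = allocate_country({"country_of_birth": None, "mother_country": "El Salvador"})
c = allocate_country({"country_of_birth": "India"})
print(a, b, c)
```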

The groups affected by the reassignment of country of birth differed between 2000–2006 and 2007–2008 because the Census Bureau expanded and changed the country of birth codes beginning with the January 2007 CPS (U.S. Census Bureau 2008b). The groups affected by the reassignment for 2000–2006 were North America, Central America and Unknown Country. For 2007 and later, the revised coding of countries eliminated the North America and Central America codes and expanded the number of specific countries identified. As a result, a broader set of codes was reassigned: Europe not specified, Asia not specified, South America not specified, the Americas, and Unknown Country. While the reassignment of country codes affects the estimates for individual countries and smaller regions, the impact on the total number of undocumented immigrants estimated for each year is negligible and only slightly larger for the three broad groups reported here.