How a coding error provided a rare glimpse into Latino identity among Brazilians in the U.S.

Many Brazilians say they are Latino, but the exact number has long been a mystery because surveys rarely explore this question. An error in how the U.S. Census Bureau processed data from a recent national survey provided a rare window into how Brazilians living in the United States view their identity.

In 2020, at least 416,000 Brazilians described themselves as Hispanic or Latino on the Census Bureau’s annual American Community Survey (ACS) and were counted that way, representing more than two-thirds of Brazilians in the U.S. By contrast, only 14,000 Brazilians were counted as Hispanic in 2019 and 16,000 in 2021. The large number counted as Hispanic in 2020 was due to an error in the bureau’s processing of the ACS data.

How we did this

This analysis focuses on people in the United States who identify as Hispanic or Latino but do not fit the federal government’s definition of Hispanic or Latino. It uses data from the 2019-2021 American Community Survey (ACS) of the U.S. Census Bureau and the 1980 and 1990 decennial censuses. ACS microdata was accessed through the Integrated Public Use Microdata Series (IPUMS) from the University of Minnesota.

The ACS, unlike decennial censuses since 2010, has additional data items that can be used to pinpoint some aspects of identity beyond the questions on Hispanic origin and race. For this analysis, we focus on three population groups for whom we can infer a possible identity:

  • Immigrants by country of birth
  • People born in U.S. territories by place of birth
  • People born in the 50 states and the District of Columbia by ancestry, limited to those who gave only one ancestry response. Only single-ancestry responses are used because some people with multiple ancestries have both Hispanic and non-Hispanic origins.
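The three rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not the code used for the analysis. The record keys (`born_in`, `birthplace`, `ancestries`) are hypothetical and are not IPUMS variable names.

```python
# Hypothetical record layout; these keys are illustrative, not IPUMS variables.
def infer_origin(person):
    """Return an inferred origin label, or None when none can be inferred."""
    if person["born_in"] in ("foreign_country", "us_territory"):
        # Immigrants and people born in U.S. territories: use place of birth.
        return person["birthplace"]
    # Born in the 50 states or D.C.: use ancestry, but only single responses.
    ancestries = person["ancestries"]
    if len(ancestries) == 1:
        return ancestries[0]
    # Multiple (or no) ancestry responses: no origin is inferred.
    return None


immigrant = {"born_in": "foreign_country", "birthplace": "Brazil", "ancestries": []}
single = {"born_in": "us_state", "birthplace": "Texas", "ancestries": ["Belizean"]}
mixed = {"born_in": "us_state", "birthplace": "Ohio", "ancestries": ["Brazilian", "Mexican"]}

print(infer_origin(immigrant), infer_origin(single), infer_origin(mixed))
```

In this sketch, the U.S.-born respondent with two ancestries is left unclassified, which mirrors the decision to use only single-ancestry responses.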

We focus our analysis on five origin groups from which many respondents seem to have identified as Hispanic or Latino but do not meet the federal government’s official criteria of origins linked to Spanish culture.

Brazil: Though it is the largest country in Latin America, with a population of more than 200 million, its culture and history are linked principally to Portugal, not Spain.

Belize: Formerly British Honduras, Belize is the only Central American country that does not have a “Spanish culture” and is not Spanish speaking.

Philippines: Many Filipinos have Hispanic first names and surnames, a legacy of the Philippines’ history as a Spanish colony for three centuries. But the country, located in Southeast Asia, is not considered to have a Spanish culture.

Non-Hispanic Caribbean: Though sometimes treated as part of Latin America, many countries in this area do not have a close link to Spanish culture. For this analysis, the non-Hispanic Caribbean excludes Puerto Rico, Cuba, the Dominican Republic and Dominica (the latter because of apparent misreporting or miscoding of Dominica as Dominican Republic) as well as “generic” areas coded in the ACS (e.g., “Other West Indies”). It includes the remaining specific Caribbean countries (for example, Haiti and Jamaica) and other U.S. territories, as well as Guyana (plus Suriname and French Guiana in the 1980-2000 decennial censuses).

Portugal: A European country that shares a border with Spain, its people primarily speak Portuguese and not Spanish. For this analysis, Portugal includes the related areas of Cabo Verde and the Azores.

A chart showing that the number of Brazilians counted as Latino in the U.S. surged in 2020 due to a coding error.

Officially, Brazilians are not considered to be Hispanic or Latino because the federal government’s definition of the term – last revised in 1997 – applies only to those of “Spanish culture or origin” such as Mexican, Puerto Rican, Cuban, South or Central American or other origins, regardless of race. In practice, this means that in most cases people who report their Hispanic or Latino ethnicity as Brazilian in Census Bureau surveys are later recategorized – or “back coded” – as not Hispanic or Latino. The same is true for people from other groups as well, such as those with origins in Belize, the Philippines and Portugal.

During the data editing process for the 2020 ACS, the Census Bureau inadvertently left Brazilians and some other groups out of its back-coding procedures. That error resulted in large increases in the number of people counted as Hispanic or Latino within these groups.

In particular, the large number of Brazilians who self-identified as Hispanic or Latino highlights how their view of their own identity does not necessarily align with official government definitions. It also underscores that being Hispanic or Latino means different things to different people. Some draw distinctions between the definition of Hispanic, which for some excludes Brazilians, and the definition of Latino, which for some includes Brazilians but excludes people from Spain. All this makes measuring Hispanic or Latino identity in surveys a complex and delicate undertaking.

Why federal rules allow some Brazilians to be counted as Hispanic or Latino

Even though the federal government does not count most Brazilians as Hispanic or Latino, some U.S. residents of Brazilian origin are still counted as Hispanic or Latino after the Census Bureau recodes responses to the following question:

This is the Hispanic origin question used on the 2020 American Community Survey.

Responses to the question usually undergo only minimal edits from the Census Bureau. Respondents who check one of the “Yes” boxes are coded as Hispanic, with one exception. If a respondent checks the “Yes, another Hispanic, Latino, or Spanish origin” box and writes in only origins that are not Hispanic (such as “Brazilian” or “Irish”), the response is changed to “No, not of Hispanic, Latino, or Spanish origin.” This edit has been part of Census Bureau procedures for every decennial census since 2000 and every ACS since it began in 2001.

Note that a write-in response is encouraged but not required, so if nothing is written in, no edit is done. This may result in some Brazilians or members of other non-Hispanic groups being coded as Hispanic if they do not write in anything. The “other” Hispanic groups are assigned specific codes (such as Venezuelan) for some tabulations, but for the groups not recognized as Hispanic, we can only determine their origin from other data items.
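The edit rule described above can be expressed as a short Python sketch. It is a simplified illustration, not Census Bureau code. The box labels and the list of non-Hispanic write-ins are assumptions made for this example, and the bureau’s actual code lists are far longer.

```python
# Illustrative sketch of the write-in edit; not the Census Bureau's code.
# The write-in list and box labels below are assumptions for this example.
NON_HISPANIC_WRITE_INS = {"brazilian", "irish", "belizean", "filipino", "portuguese"}

OTHER_BOX = "yes_other"  # "Yes, another Hispanic, Latino, or Spanish origin"
NO_BOX = "no"            # "No, not of Hispanic, Latino, or Spanish origin"


def back_code(checkbox, write_ins):
    """Return the edited checkbox value for one response."""
    if checkbox != OTHER_BOX:
        return checkbox  # all other boxes are kept as reported
    cleaned = [w.strip().lower() for w in write_ins if w.strip()]
    if not cleaned:
        return checkbox  # nothing written in, so no edit is done
    if all(w in NON_HISPANIC_WRITE_INS for w in cleaned):
        return NO_BOX  # only non-Hispanic origins were written in
    return checkbox  # at least one write-in is not on the non-Hispanic list


print(back_code("yes_other", ["Brazilian"]))  # recoded to "no"
print(back_code("yes_other", []))             # stays "yes_other"
```

The second call shows why some Brazilians can still be counted as Hispanic: with no write-in, there is nothing for the edit to act on.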

Other origin groups affected, too

Brazilians were not the only origin group affected by the Census Bureau’s data error. Several other groups also saw larger numbers identifying as Hispanic or Latino in 2020 than in 2021 or 2019. (In this analysis, “nonstandard” Hispanic origin groups, like Brazilians, include immigrants from the specific country or region and people born in the U.S. who report only that ancestry.)

A chart showing the estimated population of several origin groups in the U.S. increased in 2020 due to a coding error.

The number of self-identified Hispanic or Latino people with Filipino origin was 30,000 higher in 2020 than in 2021, while the number with origins in the non-Hispanic countries of the Caribbean – including Haiti, Jamaica, Guyana and the Virgin Islands – was 28,000 higher. The number from Belize was almost 12,000 larger than in 2021, but the number with Portuguese origin was similar to that reported in other recent years.

In total, the U.S. Hispanic or Latino population in these four “non-Hispanic” groups (Brazilians, Belizeans, Filipinos and non-Hispanic Caribbean) in 2020 exceeded that of 2021 by about 471,000 people.

The increase in the Hispanic population among Brazilians was far higher than for the other groups because such a large share of Brazilians (70%) considered themselves to be Hispanic or Latino in 2020. In comparison, 41% of Belizeans said they are Hispanic or Latino, as did 3% of Filipinos and 3% of those with origins in non-Hispanic Caribbean countries.

In fact, enough Brazilians identified as Latino in 2020 that they would fall in the middle of rankings of U.S. Hispanic or Latino origin groups by size, if they were officially counted as one. In 2020, Brazil would have been the 14th-largest Latino origin group with 416,000 who identified as Latino, ahead of Nicaragua (395,000) and below Venezuela (619,000).

The U.S. Hispanic population is large enough that the coding mistake in the 2020 data amounted to little more than a rounding error in the overall count. The roughly 470,000 people erroneously counted as Hispanic or Latino in 2020 represent less than 1% of the 61.1 million total Hispanics living in the U.S.
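The arithmetic behind these figures can be checked with a few lines of Python. The numbers are the rounded values quoted in this analysis. Treating the Brazilian excess as the difference between the 2020 and 2021 counts is an assumption made for this sketch.

```python
# Rounded figures quoted in this analysis.
excess_all_groups = 470_000       # erroneously counted as Hispanic in 2020
total_hispanics = 61_100_000      # total U.S. Hispanic population
brazil_2020 = 416_000             # Brazilians counted as Hispanic in 2020
brazil_2021 = 16_000              # Brazilians counted as Hispanic in 2021

# Share of the total Hispanic population attributable to the error.
error_share = excess_all_groups / total_hispanics

# Assumption for this sketch: Brazilian excess = 2020 count minus 2021 count.
brazil_excess = brazil_2020 - brazil_2021
brazil_share_of_error = brazil_excess / excess_all_groups

print(f"{error_share:.2%}")            # 0.77%
print(f"{brazil_share_of_error:.1%}")  # 85.1%
```

Under these rounded inputs, the error accounts for well under 1% of the total, and Brazilians make up most of it.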

‘Back coding’ at the Census Bureau

The Census Bureau didn’t always back code its data this way. In fact, when national data on Hispanic origins was first collected in the 1980 and 1990 decennial censuses, the bureau did not edit the non-Hispanic origins supplied by respondents. As a result, many individuals with origins in Brazil, Belize and the Philippines who indicated they were Hispanic were counted as such.

In the 1980 census, for example, 18% of Brazilians were counted as Hispanic, a share that rose to 33% in 1990. The back-coding process then started in 2000. Since the American Community Survey was fully implemented in 2006, the share has been around 4% or less in most years, with the exception of 2020.

Similarly, the Census Bureau counted 27% of Belizeans as Hispanic in 1980 and 32% in 1990. But with back coding in place, it has counted about 8% of Belizeans as Hispanic in the years before 2020.

Among Filipinos, the Census Bureau counted 12% as Hispanic in 1980 and 6% in 1990. But it has counted fewer than 2% as Hispanic since 2000, except in 2020, when the coding error occurred.

The data on Portuguese respondents tells a similar story. In 1980 and 1990, the bureau counted between 8% and 9% of Portuguese respondents as Hispanic. But since 2000, it has coded only 1% to 2% of Portuguese respondents that way every year. The number of Portuguese respondents coded as Hispanic in the 2020 ACS did not differ significantly from the number in 2019 or 2021. This could indicate that Portuguese-origin respondents are generally choosing to self-identify as non-Hispanic or that the data was correctly back coded for this group.