
How U.S. Lawmakers Have Discussed Asian Americans on Social Media

Methodology

This analysis examines a complete set of Facebook posts and tweets created on any account managed by any voting member of the U.S. Senate and House of Representatives between Jan. 1, 2016, and April 5, 2021. Researchers used the Facebook Graph API, CrowdTangle API [1] and Twitter API to download the posts. The resulting dataset contains more than 1.5 million Facebook posts from 772 different members of Congress who used a total of 1,517 Facebook accounts, and more than 3.5 million tweets from 770 different members of Congress who used a total of 1,462 Twitter accounts.

This analysis includes all text from these Facebook and Twitter posts, including image captions and emojis. Photo and video posts were not included in this analysis unless the post also contained meaningful text, such as a caption. Text that appeared only within images was not included in the analysis. Posts by nonvoting representatives were also excluded.

The broader data collection process is described in more detail here.

Identification of posts mentioning Asian Americans

Researchers from Pew Research Center identified all posts over the entire time frame that mentioned Asian Americans using a case-insensitive regular expression (a pattern of keywords and text formatting). To develop the list of keywords, researchers started with a set of terms that included common acronyms, origin group names and hashtags that were clearly related to Asian Americans, then used pointwise mutual information to find related terms that were positively associated with two or more of the initial keywords. [2] Researchers then examined these terms, and those that were related to the concept of interest were added to the list of keywords. This process was repeated several times until researchers could not find any additional related terms. The identification process included common variants of each term, such as singular and plural expressions. The final set of keywords used to identify these posts is listed below. Mentions of Asian American individuals by name, outside the context of other relevant terms and keywords, were not coded for this analysis.
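The pointwise-mutual-information step described above can be illustrated with a short sketch that ranks candidate terms by how strongly they co-occur with a seed keyword across a corpus of posts. The function name `pmi_related_terms` and the toy corpus are hypothetical; the researchers used the `find_related_keywords` function from the pewanalytics package rather than this code.

```python
import math
from collections import Counter

def pmi_related_terms(docs, seed, min_count=2):
    """Rank candidate terms by pointwise mutual information with a seed
    keyword, based on document-level co-occurrence. A simplified,
    hypothetical stand-in for pewanalytics' find_related_keywords."""
    n_docs = len(docs)
    tokenized = [set(doc.lower().split()) for doc in docs]
    term_counts = Counter(t for doc in tokenized for t in doc)
    seed_docs = [doc for doc in tokenized if seed in doc]
    co_counts = Counter(t for doc in seed_docs for t in doc if t != seed)
    p_seed = len(seed_docs) / n_docs
    scores = {}
    for term, co in co_counts.items():
        if term_counts[term] < min_count:
            continue
        p_joint = co / n_docs
        p_term = term_counts[term] / n_docs
        # PMI = log( p(term, seed) / (p(term) * p(seed)) )
        scores[term] = math.log(p_joint / (p_seed * p_term))
    # Positive scores mark terms that co-occur with the seed more often
    # than chance would predict; these are the expansion candidates.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Terms with positive scores would then be reviewed by hand before being added to the keyword list, as described above.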

Terms and acronyms commonly used to describe Asian Americans, the Asian American community or organizations that focus on the Asian American community:

  • Asian American
  • AAPI, APIA, APAICS, CAPAC, APAHM

Names of the most common Asian American origin groups as reported in the U.S. Census Bureau’s American Community Survey:

  • Bangladeshi American
  • Bhutanese American
  • Burmese American
  • Cambodian American
  • Chinese American
  • Filipino American
  • Hmong American
  • Indian American
  • Indonesian American
  • Japanese American
  • Korean American
  • Laotian American
  • Malaysian American
  • Mongolian American
  • Nepalese American
  • Pakistani American
  • Sri Lankan American
  • Taiwanese American
  • Thai American
  • Vietnamese American

Common hashtags relevant to the Asian American community:

  • #stopasianhate, #aapiequalpay, #notyourmodelminority, #aapi2016, #aapi2017, #aapi2018, #aapi2019, #aapi2020, #APAHM16, #APAHM17, #APAHM18, #APAHM19, #APAHM20, #AANHPI, #AANHPIs

Mentions of specific Twitter accounts relevant to the Asian American community:

  • @NBCAsianAmerica, @NCAPAtweets, @AAAJ_AAJC
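Keyword matching of this kind can be approximated with a single case-insensitive regular expression. The pattern below is a hypothetical sketch covering only a small subset of the terms listed above; the actual expression used in the study covered every listed term, acronym, hashtag and account, along with their common variants.

```python
import re

# Illustrative subset of the keyword list; the researchers' actual
# pattern was much more extensive.
KEYWORD_PATTERN = re.compile(
    r"\b(asian[ -]americans?"
    r"|aapis?|apia|apaics|capac|apahm"
    r"|(korean|hmong|vietnamese)[ -]americans?)\b"
    r"|#stopasianhate|#notyourmodelminority",
    re.IGNORECASE,
)

def mentions_asian_americans(text):
    """True if the post text matches any keyword, ignoring case."""
    return bool(KEYWORD_PATTERN.search(text))
```

The `\b` word boundaries keep short acronyms like "APIA" from matching inside unrelated words, and the optional `s?` handles the singular and plural variants mentioned above.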

To evaluate the performance of the regular expression, researchers examined a random sample of 200 posts for each time period specified in the study: posts from Jan. 1, 2016, to April 5, 2021, and posts from Jan. 1, 2020, to April 5, 2021. As new keywords were identified and added to the regular expression using the method described above, researchers pulled supplementary samples that included posts that matched the updated keyword set but not the original, then hand-coded them for validation. These supplementary samples were then combined with the original validation sample. Overall, the human decisions agreed with the keyword method 99.5% of the time. Cohen’s Kappa – a statistic that examines agreement while adjusting for chance agreement – was 0.99.
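Cohen's kappa, the agreement statistic reported above, compares the observed rate of agreement between two sets of labels against the rate expected by chance. The function below is a generic sketch of that calculation, not the Center's validation code.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters on the same items,
    adjusted for the agreement expected by chance alone."""
    n = len(labels_a)
    # Observed agreement: share of items where the two raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over all categories.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)
```

A kappa of 0.99, as reported above, indicates near-perfect agreement even after discounting matches that would occur by chance.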

In the interest of completeness, researchers also identified an additional set of posts that did not contain any of the above terms but did mention South Asian, East Asian and/or Southeast Asian. Two researchers manually evaluated each post in this set to eliminate those that were not referencing Asian Americans. Those that did reference this group were added to the analysis sample.

In total, 13,279 posts from the entire study period were flagged as mentioning one or more of the keywords listed above and form the basis of this analysis.

Coding a subset of posts mentioning concerns about discrimination, racism or xenophobia against Asian Americans

In addition to identifying posts that mentioned Asian Americans across the entirety of the study period, researchers from the Center conducted a detailed content analysis of a subset of these posts from the time period Jan. 1, 2020–April 5, 2021. Using the validated posts about Asian Americans from this time period, researchers hand-coded each post based on the following criteria:

Does the post express concern about racism, xenophobia, discrimination, inequality, violence or lack of representation in the context of Asian Americans?

Coders were instructed not to guess at the speaker’s intent when coding these posts, or to make assumptions about whether the concern expressed was being offered in good faith. However, posts that explicitly downplayed these concerns or suggested they were overblown were not counted.

Posts that merely recognized the existence of differences between Asian Americans and other groups were not coded as meeting this criterion, unless the post also included language that suggested these differences were disproportionate or related to a broader inequality. For example, a post referencing differences in health outcomes between Asian Americans and other groups as being “disproportionate” would be included. But a post simply noting the factual existence of these differences (with no additional commentary) would not be counted as meeting this criterion.

Four researchers separately coded a random sample of 100 posts and achieved high levels of intercoder reliability on the subset they all reviewed (Fleiss’ kappa = 0.89). The remaining posts were then manually coded by five researchers from the Center, including the four who coded the original test sample.
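Fleiss' kappa extends Cohen's kappa to more than two raters. The sketch below shows the standard computation, assuming each post's ratings are summarized as per-category counts; it is a generic illustration, not the Center's code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among several raters.

    `ratings` is a list of per-item category counts: a row [3, 1]
    means three raters chose category 0 and one chose category 1."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Mean per-item agreement across all pairs of raters.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(len(ratings[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

As with Cohen's kappa, a value of 0.89 indicates strong agreement among the four coders after adjusting for chance.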

© Pew Research Center, 2021

  1. CrowdTangle is a public insights tool owned by Facebook.
  2. Researchers used the find_related_keywords function from Pew Research Center’s open-source Python package pewanalytics for this step; see documentation here.
