Twitter Reaction to Events Often at Odds with Overall Public Opinion

Data regarding the tone of conversation on Twitter were derived by the Pew Research Center’s Project for Excellence in Journalism (PEJ) from a combination of traditional media research methods, based on long-standing rules regarding content analysis, and computer coding software developed by Crimson Hexagon. That software can analyze the textual content of millions of posts on social media platforms. Crimson Hexagon (CH) classifies online content by identifying statistical patterns in words. The parallel opinion surveys were conducted by the Pew Research Center for the People & the Press during the same general time period in which the Twitter data were aggregated.

The Twitter data come from an analysis of all publicly available tweets. The time period for each event varied, but none included more than three days’ worth of reaction. For each subject, multiple search terms were used to identify appropriate tweets. For example, to find messages commenting on President Obama’s 2013 State of the Union speech, tweets were included if they appeared in the four hours following the start of his speech and used the words “state” and “union,” or “Obama,” or “SOTU.”

Unlike most human coding, CH does not measure each post as a unit, but examines the entire discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than dividing the text by tweet, paragraph, sentence or word, CH treats the “assertion” as the unit of measurement. If 40% of a post fits into one category, and 60% fits into another, the software will divide the text accordingly. Consequently, the results are expressed not as a percentage of tweets, but as the percentage of assertions out of the entire body of posts identified by the original Boolean search terms.
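The search rule and the aggregate arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Crimson Hexagon's software: the function names, the start time, the category labels and the sample weights are all hypothetical, and the weighting step simply mirrors the 40%/60% example rather than reproducing how CH identifies assertions.

```python
import re
from datetime import datetime, timedelta

# Hypothetical start time and window, used only for illustration.
SPEECH_START = datetime(2013, 2, 12, 21, 0)
WINDOW = timedelta(hours=4)


def words_in(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_query(text):
    """Boolean rule from the text: ("state" AND "union") OR "obama" OR "sotu"."""
    words = words_in(text)
    return ("state" in words and "union" in words) or "obama" in words or "sotu" in words


def select_tweets(tweets, start=SPEECH_START, window=WINDOW):
    """Keep (posted_at, text) pairs inside the time window that match the query."""
    return [
        (posted_at, text)
        for posted_at, text in tweets
        if start <= posted_at <= start + window and matches_query(text)
    ]


def aggregate_shares(posts):
    """Sum fractional category weights over all posts and normalize to shares.

    Each post is a dict such as {"positive": 0.4, "negative": 0.6}, so a post
    can contribute to more than one category (assumed labels, for illustration).
    """
    totals = {}
    for weights in posts:
        for category, weight in weights.items():
            totals[category] = totals.get(category, 0.0) + weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: weight / grand_total for category, weight in totals.items()}
```

With two made-up posts, one split 40%/60% between two categories and one falling entirely in the first, `aggregate_shares` reports shares of the combined text rather than counts of posts, which is why the results are read as a percentage of assertions and not a percentage of tweets.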

Extensive testing by Crimson Hexagon has demonstrated that the tool is more than 90% reliable; that is, in more than 90% of cases analyzed, the technology’s coding has been shown to match human coding. Pew Research spent more than 12 months testing CH, and its own tests comparing coding by humans with coding by the software produced similar results.
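The reliability figure above is described as the share of cases in which the software's coding matches human coding. A minimal sketch of that calculation follows, assuming simple percent agreement; the source does not say exactly how agreement was computed, and the function name and sample labels are hypothetical. Simple percent agreement does not correct for matches that would occur by chance.

```python
def percent_agreement(human_codes, software_codes):
    """Return the percentage of cases where the two sets of codes match."""
    if len(human_codes) != len(software_codes):
        raise ValueError("both codings must cover the same cases")
    if not human_codes:
        raise ValueError("at least one coded case is required")
    matches = sum(h == s for h, s in zip(human_codes, software_codes))
    return 100.0 * matches / len(human_codes)


# Made-up codes for ten cases; the two codings differ on one case only.
human = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neg", "pos", "neu"]
software = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neg", "neg", "neu"]
agreement = percent_agreement(human, software)
```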

In addition to validity tests of the platform itself, PEJ conducted separate examinations of human intercoder reliability to show that the training process for complex concepts is replicable.