About This Study
A number of people at the Pew Research Center’s Project for Excellence in Journalism worked on PEJ’s "Winning the Media Campaign 2012." Director Tom Rosenstiel and Associate Director Mark Jurkowitz wrote the report along with senior researcher Paul Hitlin and researcher Nancy Vogt. Paul Hitlin supervised the content analysis component. Additional coding and data analysis were done by researchers Steve Adams, Monica Anderson, Heather Brown and Sovini Tan. Nancy Vogt worked on the computer coding. Katerina Matsa created the charts. Jesse Holcomb copy edited the report. Dana Page handles communications for the project.
This special report by the Pew Research Center’s Project for Excellence in Journalism on media coverage of the 2012 presidential campaign uses data derived from two different methodologies. Data regarding the tone of coverage in the mainstream press were derived from the Project for Excellence in Journalism’s in-house coding operation. (Click here for details on how that project, also known as PEJ’s News Coverage Index, is conducted.)
Data regarding the tone of conversation on social media (Twitter, Facebook and blogs) were derived from a combination of PEJ’s traditional media research methods, based on long-standing rules regarding content analysis, along with computer coding software developed by Crimson Hexagon. That software is able to analyze the textual content from millions of posts on social media platforms. Crimson Hexagon (CH) classifies online content by identifying statistical patterns in words.
The study is based on the aggregated data collected from August 27 through October 21, 2012. This timeframe runs from the week of the Republican National Convention through five days after the second presidential debate.
Human Coding of Mainstream Media
The mainstream media content was based on coverage originally captured as part of PEJ’s weekly News Coverage Index (NCI).
Each week, the NCI examines the coverage from 52 outlets in five media sectors, including newspapers, online news, network TV, cable TV, and radio. Following a system of rotation, between 25 and 28 outlets each weekday are studied as well as 3 newspapers each Sunday.
For this particular study of campaign coverage, three commercial talk radio programs were excluded. Broadcast stories of 30 seconds or less were also excluded.
In total, the 49 media outlets examined for this campaign study were as follows:
Newspapers (Eleven in all)
Coded two out of these four every weekday; one on Sunday
The New York Times
Los Angeles Times
The Wall Street Journal
Coded two out of these four every weekday; one on Sunday
The Washington Post
The Denver Post
Coded one out of these three every weekday and Sunday
Traverse City Record-Eagle (MI)
The Daily Herald (WA)
The Eagle-Tribune (MA)
Web sites (Coded six of twelve each weekday)
Wall Street Journal Online
Network TV (Seven in all, Monday-Friday)
Morning shows – coded one or two every weekday
ABC – Good Morning America
CBS – Early Show
NBC – Today
Evening news – coded two of three every weekday
ABC – World News Tonight
CBS – CBS Evening News
NBC – NBC Nightly News
Coded two consecutive days, then skip one
PBS – NewsHour
Cable TV (Fifteen in all, Monday-Friday)
Daytime (2:00 to 2:30 pm) coded two out of three every weekday
Nighttime CNN – coded one or two out of the four every day
Situation Room (5 pm)
Situation Room (6 pm)
Erin Burnett OutFront
Anderson Cooper 360
Nighttime Fox News – coded two out of the four every day
Special Report w/ Bret Baier
Fox Report w/ Shepard Smith
Nighttime MSNBC – coded one or two out of the four every day
Hardball (7 pm)
The Rachel Maddow Show
The Ed Show
Radio (Seven in all, Monday-Friday)
NPR – Coded one of the two every weekday
All Things Considered
From that sample, the study included all campaign-related stories:
- On the front page of newspapers
- In the entirety of commercial network evening newscasts and radio headline segments
- The first 30 minutes of network morning news and all cable programs
- A 30 minute segment of NPR’s broadcasts or PBS’ NewsHour (rotated between the first and second half of the programs)
- The top 5 stories on each website at the time of capture
Click here for the full methodology regarding the News Coverage Index and the justification for the choices of outlets studied.
To arrive at the sample for this particular study of campaign coverage, we gathered all relevant stories from August 27-October 21, 2012, that were either coded as campaign stories, meaning that 50% or more of the story was devoted to discussion of the ongoing presidential campaign, or included President Obama, Governor Romney, Vice President Biden or Congressman Paul Ryan in at least 25% of the story.
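The inclusion rule above can be sketched in a few lines of Python. This is only an illustration of the two thresholds stated in the text; the `Story` class, its field names and the idea of storing shares as fractions are assumptions made for the example, not a description of PEJ's actual coding database.

```python
from dataclasses import dataclass, field

# The four figures named in the study.
FIGURES = ("Obama", "Romney", "Biden", "Ryan")


@dataclass
class Story:
    # Hypothetical fields: fraction of the story (0.0 to 1.0) devoted to
    # the campaign, and the fraction in which each figure appears.
    campaign_share: float
    figure_share: dict = field(default_factory=dict)


def in_sample(story: Story) -> bool:
    """Return True if the story meets either inclusion threshold."""
    # Threshold 1: 50% or more of the story is about the campaign.
    is_campaign_story = story.campaign_share >= 0.50
    # Threshold 2: one of the four figures appears in at least 25% of it.
    has_significant_figure = any(
        story.figure_share.get(name, 0.0) >= 0.25 for name in FIGURES
    )
    return is_campaign_story or has_significant_figure
```

Under these assumptions, a story about Libya with Obama present in 30% of the text would be included, while a story that is 40% campaign-related with no figure above 25% would not.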
This process resulted in a sample of 2,457 stories. Of those, 2,257 stories focused on the presidential election while 220 focused on another topic, such as the events in Libya or the economy, but included one of the figures as a significant presence.
Note: The sample of 2,457 stories was used for all data regarding the tone of coverage for each candidate. For the few sections where the overall framing of campaign coverage is discussed in terms of newshole, the sample was made up of 2,903 stories and included talk radio stories and those 30 seconds or less.
Coding of Campaign Stories for Tone
The data in this study derived from PEJ’s regular Index coding was created by a team of seven experienced coders. We have tested all of the variables derived from the regular weekly Index coding and all the variables reached a level of agreement of 80% or higher. For specific information about those tests, see the methodology section for the NCI.
The method of measuring tone was the same as that used in previous PEJ studies, including the 2008 studies, in order to provide accurate longitudinal comparisons.
Unit of Analysis
The unit of analysis for this study was the story. Each story was coded for tone for each of the four candidates. If a candidate did not appear in at least 25% of the story, they were not considered a significant figure in the story and were therefore coded as "n/a" for not having a significant presence.
The tone variable measures whether a story’s tone is constructed in a way, via use of quotes, assertions, or innuendo, which results in positive, neutral, or negative coverage for the primary figure as it relates to the topic of the story. While reading or listening to a story, coders tallied up all the comments that have either a negative or positive tone to the reporting. Direct and indirect quotes were counted along with assertions made by journalists themselves.
In order for a story to be coded as either "positive" or "negative," it must have at least 1.5 times as many positive comments as negative comments, or at least 1.5 times as many negative comments as positive comments. If the headline or lead had a positive or negative tone, it was counted twice toward the total. Also counted twice for tone were the first three paragraphs or first four sentences, whichever came first.
Any story where the ratio of positive to negative comments was less than 1.5 to 1 was considered a "neutral" or "mixed" story.
In some previous studies, PEJ used a ratio of 2 to 1 instead of 1.5 to 1 in determining the overall tone of news stories.
The 2:1 ratio sets the bar even higher for a story to be coded as either positive or negative overall. Prior to the 2008 election campaign, PEJ reviewed and retested both the 2:1 ratio and the 1.5 to 1 ratio. We also consulted with academics of content analysis. First, we found only minor shifts in the overall outcome of stories. Indeed, in past content studies where we coded using both ratios, the overall relationship of positive to negative stories changed very little. The bigger difference was in an increase in mixed or neutral stories. In our pre-tests in 2007, the Project felt that the 1.5 to 1 ratio more precisely represented the overall tone of the stories. The academics consulted concurred.
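The tone rule described above can be illustrated with a minimal Python sketch. The function name, the `(tone, prominent)` representation of a tallied comment and the handling of stories with no toned comments are assumptions made for this example; human coders applied the rule by hand, not with software.

```python
def story_tone(comments, ratio=1.5):
    """Classify a story for one candidate from its tallied comments.

    comments: iterable of (tone, prominent) pairs, where tone is
    "positive" or "negative" and prominent is True when the comment
    sits in the headline, lead or opening passage (counted twice).
    """
    positive = negative = 0
    for tone, prominent in comments:
        weight = 2 if prominent else 1  # prominent comments count double
        if tone == "positive":
            positive += weight
        elif tone == "negative":
            negative += weight
    # Assumption: a story with no toned comments at all is neutral.
    if positive == 0 and negative == 0:
        return "neutral"
    if positive >= ratio * negative:
        return "positive"
    if negative >= ratio * positive:
        return "negative"
    return "neutral"  # covers both "neutral" and "mixed" stories
```

For example, a story with one positive comment in the lead, one other positive comment and two negative comments tallies 3 positive to 2 negative. That meets the 1.5 to 1 threshold, so `story_tone` returns "positive"; with `ratio=2.0` the same story falls into the neutral or mixed group, which mirrors the shift toward mixed stories described above.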
Testing of all variables used to determine campaign stories has shown levels of agreement of 80% or higher. For specific information about those tests, see the methodology on intercoder testing.
During coder training for this particular study, intercoder reliability tests were conducted for all the campaign-specific variables. There were two different intercoder tests conducted to assure reliability.
For this study, each of the seven coders was trained on the tone coding methodology and then given the same set of 30 stories to code for tone for each of the four candidates. The rate of intercoder reliability agreement was 82%.
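The report does not state which agreement statistic was used. As one possible reading, the sketch below computes simple pairwise percent agreement, a common measure in content analysis; the function name and the input layout are assumptions made for this example.

```python
from itertools import combinations


def percent_agreement(codings):
    """Pairwise percent agreement across coders.

    codings: list with one entry per coder; each entry is a list of the
    codes that coder assigned, in the same item order for every coder.
    """
    agreements = comparisons = 0
    # Compare every pair of coders on every item they both coded.
    for coder_a, coder_b in combinations(codings, 2):
        for code_a, code_b in zip(coder_a, coder_b):
            comparisons += 1
            agreements += code_a == code_b
    return agreements / comparisons if comparisons else 0.0
```

With seven coders and 30 stories coded for four candidates, each coder's list in this layout would hold 120 codes, and the function would average agreement over all 21 coder pairs.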
Coding of Social Media Using a Computer Algorithm
The sections of this report that dealt with the social media reaction to the campaign employed media research methods that combine PEJ’s content analysis rules developed over more than a decade with computer coding software developed by Crimson Hexagon. This report is based on separate examinations of more than 27 million tweets, 400,000 blog posts and 1.1 million Facebook posts.
Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. PEJ draws its analysis samples from several million blogs, all public Twitter posts and a random sample of publicly available Facebook posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training.
According to Crimson Hexagon: "Our technology analyzes the entire social internet (blog posts, forum messages, Tweets, etc.) by identifying statistical patterns in the words used to express opinions on different topics." Information on the tool itself can be found at http://www.crimsonhexagon.com/ and the in-depth methodologies can be found at http://www.crimsonhexagon.com/products/whitepapers/.
Crimson Hexagon measures text in the aggregate and the unit of measure is the ‘statement’ or assertion, not the post or Tweet. One post or Tweet can contain more than one statement if multiple ideas are expressed. The results are determined as a percentage of the overall conversation.
Monitor Creation and Training
Each individual study or query related to a set of variables is referred to as a "monitor."
The process of creating a new monitor consists of four steps. There were six monitors created for this study – three for Obama (Twitter, blogs and Facebook) and three for Romney (Twitter, blogs and Facebook).
First, PEJ researchers decide what timeframe and universe of content to examine. The timeframe for this study was August 27-October 21, 2012. PEJ only includes English-language content.
Second, the researchers enter key terms using Boolean search logic so the software can identify the universe of posts to analyze. For each of these monitors, the Boolean search terms simply consisted of the candidate’s last name ("Obama" or "Romney").
Third, researchers define categories appropriate to the parameters of the study. For tone monitors, there are four categories: positive, neutral, negative, and irrelevant for posts that are off-topic.
Fourth, researchers "train" the CH platform to analyze content according to specific parameters they want to study. The PEJ researchers in this role have gone through in-depth training at two different levels. They are professional content analysts fully versed in PEJ’s existing content analysis operation and methodology. They then undergo specific training on the CH platform including multiple rounds of reliability testing.
The monitor training itself is done with a random selection of posts collected by the technology. One at a time, the software displays posts and a human coder determines which category each example best fits into. In categorizing the content, PEJ staff follows coding rules created over the many years that PEJ has been content analyzing the news media. If an example does not fit easily into a category, that specific post is skipped. The goal of this training is to feed the software with clear examples for every category.
For each new monitor, human coders categorize at least 250 distinct posts. Typically, each individual category includes 20 or more posts before the training is complete. To validate the training, PEJ has conducted numerous intercoder reliability tests (see below) and the training of every monitor is examined by a second coder in order to discover errors.
The training process consists of researchers showing the algorithm stories in their entirety that are unambiguous in tone. Once the training is complete, the algorithm analyzes content at the assertion level, to ensure that the meaning is similarly unambiguous. This makes it possible to analyze and proportion content that contains assertions of differing tone. This classification is done by applying statistical word patterns derived from posts categorized by human coders during the training process.
The monitors are then reviewed by a second coder to ensure there is agreement. Any questionable posts are removed from the sample.
In the analysis of campaign coverage, PEJ uses CH to study a given period of time, and then expands the monitor for additional time going forward. In order to accomplish this, researchers first create a monitor for the original timeframe according to the method described above.
Because the tenor and content of online conversation can change over time, additional training is necessary when the timeframe gets extended. Since the specific conversation about candidates evolves all the time, the CH monitor must be trained to understand how newer posts fit into the larger categories.
First, each week researchers remove any documents that are more than three weeks old. For example, for the monitor covering the week of October 15-21, 2012, there will be no documents from before September 30. This ensures that older storylines no longer playing in the news cycle will be removed and the algorithm will be working with only the newest material.
Second, each week trainers add more stories to the training sample to ensure that the changes in the storyline are accurately reflected in the algorithm. PEJ researchers add, at a minimum, 10 new training documents to each category. This results in many categories receiving many more than the 10 new documents. On average, researchers will add roughly 60 new training documents each week.
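The two weekly maintenance steps can be sketched as follows. This is an illustration only: the training set lives inside the Crimson Hexagon platform, so the dictionary layout, the `posted` key and both function names are hypothetical.

```python
from datetime import date, timedelta

# The four tone-monitor categories named in the text.
CATEGORIES = ("positive", "neutral", "negative", "irrelevant")


def prune_training_docs(docs, week_end, max_age_days=21):
    """Drop training documents more than three weeks older than week_end."""
    cutoff = week_end - timedelta(days=max_age_days)
    # Documents posted on the cutoff date itself are kept.
    return [doc for doc in docs if doc["posted"] >= cutoff]


def categories_below_minimum(new_counts, minimum=10):
    """List categories that still need new training documents this week."""
    return [c for c in CATEGORIES if new_counts.get(c, 0) < minimum]
```

For the week ending October 21, 2012, the cutoff computed here is September 30, which matches the example in the text: a document posted on September 29 is removed, while one posted on September 30 is retained.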
How the Algorithm Works
To understand how the software recognizes and uses patterns of words to interpret texts, consider a simplified example regarding an examination of the tone of coverage regarding Mitt Romney. As a result of the example stories categorized by a human coder during the training, the CH monitor might recognize that portions of a story with the words "Romney," "poll" and "increase" near each other are likely positive for Romney. However, a section that includes the words "Romney," "losing" and "women" is likely to be negative for Romney.
Unlike most human coding, CH monitors do not measure each story as a unit, but examine the entire discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than dividing each story, paragraph, sentence or word, CH treats the "assertion" as the unit of measurement. Thus, posts are divided up by the computer algorithm. If 40% of a post fits into one category, and 60% fits into another, the software will divide the text accordingly. Consequently, the results are not expressed in percent of newshole or percent of posts. Instead, the results are the percent of assertions out of the entire body of stories identified by the original Boolean search terms. We refer to the entire collection of assertions as the "conversation."
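The arithmetic behind reporting results as a share of the conversation can be illustrated with a short Python sketch. It is not Crimson Hexagon's algorithm, which derives its estimates from statistical word patterns. It only shows how per-post splits, assumed here to be already known, would add up to aggregate percentages. The function name and input layout are hypothetical.

```python
from collections import Counter


def conversation_shares(posts):
    """Aggregate per-post category splits into shares of the conversation.

    posts: list of dicts mapping a category name to the fraction of that
    post's assertions falling in the category (fractions sum to 1.0).
    """
    totals = Counter()
    for split in posts:
        for category, fraction in split.items():
            totals[category] += fraction
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Each share is the category's weight over the whole body of text.
    return {category: weight / grand_total for category, weight in totals.items()}
```

Given one post that is 40% positive and 60% negative and a second post that is entirely negative, the result is roughly 20% positive and 80% negative. Counting whole posts instead would label the first post with a single category and report either 100% negative or a 50/50 split, depending on that label.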
Testing and Validity
Extensive testing by Crimson Hexagon has demonstrated that the tool is 97% reliable, that is, in 97% of cases analyzed, the technology’s coding has been shown to match human coding. PEJ spent more than 12 months testing CH, and our own tests comparing coding by humans and the software came up with similar results.
In addition to validity tests of the platform itself, PEJ conducted separate examinations of human intercoder reliability to show that the training process for complex concepts is replicable. The first test had five researchers each code the same 30 stories, which resulted in an agreement of 85%.
A second test had each of the five researchers build their own separate monitors to see how the results compared. This test involved not only testing coder agreement, but also how the algorithm handles various examinations of the same content when different human trainers are working on the same subject. The five separate monitors came up with results that were within 85% of each other.
Unlike polling data, the results from the CH tool do not have a sampling margin of error since there is no sampling involved. For the algorithmic tool, reliability tested at 97% meets the highest standards of academic rigor.