Methodology

About This Study

A number of people at the Pew Research Center’s Project for Excellence in Journalism worked on Pew Research’s “News Coverage Conveys Strong Momentum for Same-Sex Marriage.” Senior Researcher Paul Hitlin, Acting Director Amy Mitchell and Associate Director Mark Jurkowitz wrote the report. Coding and data analysis were conducted by Researchers Steve Adams, Monica Anderson, Emily Guskin and Nancy Vogt. Katerina Matsa and Monica Anderson created the charts. Jan Boyles copy edited the report. Dana Page handles communications for the project.

Methodology

This special report by the Pew Research Center’s Project for Excellence in Journalism examined news media coverage of the same-sex marriage issue from March 18 through May 12, 2013.

The study employed four different methodologies.

Data regarding coverage in the main news media sample, the Huffington Post and LGBT outlets were derived from the Pew Research Center’s in-house coding operation.

Data regarding the tone of conversation on Twitter were derived from a combination of the Center’s traditional media research methods, based on long-standing rules regarding content analysis, along with computer coding software developed by Crimson Hexagon. That software is able to analyze the textual content from millions of posts on social media platforms. Crimson Hexagon (CH) classifies online content by identifying statistical patterns in words.

Data regarding the terms used in Google searches and the language used by newspapers were derived from Google Trends and from searches of LexisNexis.

Human Coding of News Media

Sample Design

The sample of mainstream news included outlets from newspapers, online news, network TV, cable TV and radio. In total, there were 58 outlets studied, which followed the system of rotation listed below.

Because the Huffington Post produced so much material on same-sex marriage, the data from that site were treated separately throughout the report. The LGBT outlets were also treated separately.

Newspapers (11 in all)

Coded two out of these four every weekday; one on Sunday
The New York Times
Los Angeles Times
USA Today
The Wall Street Journal

Coded two out of these four every weekday; one on Sunday
The Washington Post
The Denver Post
Houston Chronicle
Orlando Sentinel

Coded one out of these three every weekday and Sunday
Traverse City Record-Eagle (MI)
The Daily Herald (WA)
The Eagle-Tribune (MA)

Websites (Coded every other weekday)

Huffington Post
Slate
Buzzfeed
Gawker
Newser
Politico
The Daily Caller

Network TV (Seven in all, Monday-Friday)

Morning shows – coded one or two every weekday
ABC – Good Morning America
CBS – Early Show
NBC – Today

Evening news – coded two of three every weekday
ABC – World News Tonight
CBS – CBS Evening News
NBC – NBC Nightly News

Coded two consecutive days, then skip one
PBS – NewsHour

Cable TV (15 in all, Monday-Friday)

Daytime (2:00 to 3:00 p.m.) – coded two out of three every weekday
CNN
Fox News
MSNBC

Nighttime CNN – coded one or two out of the four every day
Situation Room (5 p.m.)
Situation Room (6 p.m.)
Erin Burnett OutFront
Anderson Cooper 360

Nighttime Fox News – coded two out of the four every day
Special Report w/ Bret Baier
Fox Report w/ Shepard Smith
O’Reilly Factor
Hannity

Nighttime MSNBC – coded one or two out of the four every day
PoliticsNation
Hardball (7 p.m.)
All In with Chris Hayes
The Rachel Maddow Show

Radio (Seven in all, Monday-Friday)

NPR – coded one of the two every weekday
Morning Edition
All Things Considered

Radio News
ABC Headlines
CBS Headlines

Talk Radio
Rotate between:
Rush Limbaugh
Sean Hannity

Coded every other day
Ed Schultz

LGBT Outlets

Weekly Newspapers

The Bay Area Reporter (San Francisco)
Bay Windows New England (Boston)
Metro Weekly (Washington, DC)
Washington Blade (Washington, DC)
Windy City Times (Chicago)
The Dallas Voice (Dallas)
Philadelphia Gay News (Philadelphia)

Bi-monthly newspapers and magazines

OutFront Colorado (Denver)
Georgia Voice (Atlanta)
Gay City News (New York)
Frontiers LA (Los Angeles)

From that sample, the study included all relevant stories:

From the entirety of newspapers, websites, evening network TV, radio news programs (headlines and NPR) and LGBT outlets
In the first hour of network morning news and all cable programs

Sample Selection

Stories were included in the study if 50% or more of their content was on same-sex marriage. To find the relevant stories in all outlets (except for radio programs and LGBT outlets), researchers used a broad set of keyword searches to collect potential stories. Then, researchers read or watched each story to see if 50% or more of the story was on same-sex marriage.

For newspaper stories, researchers searched LexisNexis with the following terms to find potential stories:

(marriage AND same-sex) OR (marriage AND “same sex”) OR (marriage AND traditional) OR (marriage AND gay) OR (marriage AND equality) OR (marriage AND homosexual) OR (marriage AND defense) OR DOMA OR (gay AND rights) OR “proposition 8” OR “prop 8”
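
The query above can be read as a simple Boolean filter. The following is a minimal sketch in Python of that logic, not a reproduction of how LexisNexis evaluates searches. The function name `matches_query` is hypothetical, and the sketch assumes exact, case-insensitive word matching with no stemming, which may differ from the database's own behavior.

```python
import re

# Words that must co-occur with "marriage" (taken from the query above).
MARRIAGE_PARTNERS = {"traditional", "gay", "equality", "homosexual", "defense"}


def matches_query(text: str) -> bool:
    """Return True if the text satisfies the Boolean query listed above.

    Assumption: exact, case-insensitive word matching without stemming.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = set(tokens)
    # Padded, space-joined tokens make whole-phrase checks simple;
    # "same-sex" and "same sex" both become "same sex" here.
    joined = " " + " ".join(tokens) + " "

    def has_phrase(phrase: str) -> bool:
        return f" {phrase} " in joined

    if "marriage" in words and (has_phrase("same sex") or words & MARRIAGE_PARTNERS):
        return True
    if "doma" in words:
        return True
    if "gay" in words and "rights" in words:
        return True
    return has_phrase("proposition 8") or has_phrase("prop 8")
```

A filter like this only collects candidates. In the study, researchers still read each candidate story to confirm that at least half of it concerned same-sex marriage.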

Note: The LexisNexis database often does not include wire and syndicated articles from other sources, even if those stories appeared in the print editions of the newspapers. Pew Research conducted a test to see if the newspaper data for tone would have been impacted had wire and syndicated stories been added to the sample. A test of 22 randomly selected stories from various wire services found 55% with supportive statements outweighing opposing by a margin of 2:1, none that had the opposite and 45% that were evenly mixed or neutral. These numbers suggest that had wire and syndicated stories been included, the percentages would not have been impacted dramatically. In fact, the percent of supportive stories in newspapers would have increased and the percent of opposing stories would have decreased.

For websites, the same terms used for newspapers were also searched using Google News to find all stories produced by each site.

For cable and network television programs, closed-captioning text was searched using the following terms (stories that came up were then viewed by coders):

Marriage OR DOMA OR “proposition 8” OR “prop 8” OR “prop eight” OR “proposition eight” OR (gay AND rights)

For NPR programs, researchers examined the run-downs of each show provided on the NPR.org website to find relevant stories.

For radio headlines, talk radio programs and the LGBT outlets, the entire publication or program was examined by a researcher to determine which stories should be included in the study.

This process resulted in a sample of 1,081 stories.

Human Coding of Stories

The data in this study were created by a team of six experienced coders. We have tested all of the housekeeping variables (such as source, date, opinion vs. reporting and format) on numerous occasions, and those variables each reached a level of agreement of 80% or higher.

The method of measuring support/opposition was the same that had been used to measure tone in previous Pew Research studies.

The unit of analysis for this study was the story. Each story was coded for tone indicating whether the story in total offered more support of same-sex marriage, opposition, or a more even mix between the two (or no views at all).

The support/opposition variable measures whether a story’s tone is constructed in a way, via use of quotes, assertions, or innuendo, that results in supportive, neutral or opposing coverage related to same-sex marriage. While reading or listening to a story, coders tallied up all the assertions that were either clearly in support of same-sex marriage, or clearly in opposition. Direct and indirect quotes were counted along with assertions made by journalists themselves.

In order for a story to be coded as either “supportive” or “opposing,” it must have had at least twice as many supportive comments as opposing comments, or at least twice as many opposing comments as supportive comments. If the headline or lead had a supportive or opposing tone, it was counted twice toward the total. Also counted twice for tone were assertions in the first three paragraphs or first four sentences, whichever came first.

Any story in which the ratio of supportive to opposing was less than 2-to-1 was considered a “neutral” or “evenly mixed” story.

The 2-to-1 ratio sets a high bar for a story to be coded as either supportive or opposing. The goal was to code stories as supportive or opposing only when the story clearly emphasized one point of view more often than the other.
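
The tallying rule described above can be illustrated with a short sketch. This is an illustration under stated assumptions, not the coders' actual tooling, since the coding was done by human researchers. The names `tally` and `classify_tone` are hypothetical. The sketch also assumes that a story with assertions on only one side counts as clearly one-sided, a case the text does not address.

```python
def tally(assertions):
    """Sum weighted assertions for one story.

    `assertions` is a list of (stance, prominent) pairs, where stance is
    "support" or "oppose" and prominent is True for assertions in the
    headline, lead, or opening passage, which count twice.
    """
    counts = {"support": 0, "oppose": 0}
    for stance, prominent in assertions:
        counts[stance] += 2 if prominent else 1
    return counts["support"], counts["oppose"]


def classify_tone(supportive: int, opposing: int) -> str:
    """Apply the 2-to-1 rule to weighted assertion counts."""
    if supportive == 0 and opposing == 0:
        return "neutral"  # no views expressed at all
    if supportive >= 2 * opposing:
        return "supportive"
    if opposing >= 2 * supportive:
        return "opposing"
    return "neutral"  # ratio below 2-to-1: evenly mixed


# Hypothetical story: a supportive headline, one more supportive quote,
# and one opposing quote give a weighted tally of 3 to 1.
example = [("support", True), ("support", False), ("oppose", False)]
print(classify_tone(*tally(example)))  # supportive
```

Because a ratio of exactly 2-to-1 meets the threshold, only stories that fall below it on both sides are treated as neutral or evenly mixed.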

Intercoder Testing

In addition to the main intercoder testing conducted on all housekeeping variables, supplemental testing was conducted on the additional variables used in this portion of the study. For the following codes, 25 randomly selected stories were coded by all members of the coding team.

The percent of agreement for each variable was as follows:

Focus: 82%
LGBT Source: 92%
Tone (support/oppose/neutral): 88%
All other sources combined: 95%
Individual arguments:
Issue of Civil Rights – 92%
Government Should Not Determine Marriage – 96%
Helps Marriage and Families – 92%
Polls & People Are in Favor of It – 82%
Homosexuality Is Immoral – 82%
Hurts Society or Traditional Family – 76%
Public Has Voted Against It – 93%
Civil Unions Are “Good Enough” – 96%
Government Should Not Impose New Definition – 96%
Public Support Has Increased Because of Obama – 100%
GOP Will Support Someday – 93%
It Is Inevitable to Become Legal – 88%
SCOTUS Should Not Have Taken the Case/Left to the States – 100%
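
The report does not state how agreement among multiple coders was calculated. One common approach, shown here purely as an assumption, is average pairwise agreement: every pair of coders is compared story by story, and the share of matching codes is reported. The function name `percent_agreement` and the sample codes below are hypothetical and are not taken from the study's data.

```python
from itertools import combinations


def percent_agreement(codes_by_coder):
    """Average pairwise percent agreement across coders.

    `codes_by_coder` is a list with one entry per coder; each entry is
    that coder's codes for the same stories, in the same order.
    Assumption: the study's exact formula is not given in the text.
    """
    agree = 0
    total = 0
    for coder_a, coder_b in combinations(codes_by_coder, 2):
        for code_a, code_b in zip(coder_a, coder_b):
            agree += code_a == code_b
            total += 1
    return 100.0 * agree / total


# Made-up codes for three coders on four stories.
coders = [
    ["support", "neutral", "oppose", "support"],
    ["support", "neutral", "oppose", "neutral"],
    ["support", "neutral", "support", "support"],
]
print(round(percent_agreement(coders), 1))  # 66.7
```

Simple percent agreement does not correct for matches that occur by chance, which is worth keeping in mind when comparing variables with different numbers of possible codes.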

Coding of the Tone on Social Media Using a Computer Algorithm

The sections of this report that dealt with the social media reaction employed media research methods that combine Pew Research’s content analysis rules developed over more than a decade with computer coding software developed by Crimson Hexagon. This report was based on an examination of more than 2.4 million tweets.

Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. Pew Research draws its analysis samples from all public Twitter posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training.

According to Crimson Hexagon: “Our technology analyzes the entire social internet (blog posts, forum messages, Tweets, etc.) by identifying statistical patterns in the words used to express opinions on different topics.” Information on the tool itself can be found at http://www.crimsonhexagon.com/, and the in-depth methodologies can be found at http://www.crimsonhexagon.com/products/whitepapers/.

Crimson Hexagon measures text in the aggregate. The unit of measure is the ‘statement’ or assertion, not the post or Tweet. One post or Tweet can contain more than one statement if multiple ideas are expressed. The results are determined as a percentage of the overall conversation.

Pew Research Center spent more than 12 months testing CH. To test the validity of the software, two human researchers coded 200 stories that were also coded by the algorithm. The human coders and algorithm agreed on the coding 81% of the time.

In addition to validity tests of the platform itself, Pew Research conducted separate examinations of human intercoder reliability to show that the training process for complex concepts is replicable. The first test had five researchers each code the same 30 stories, which resulted in an agreement of 85%.

A second test had each of the five researchers build their own separate CH projects to see how the results compared. This test involved not only testing coder agreement, but also how the algorithm handles various examinations of the same content when different human trainers are working on the same subject. The five separate monitors came up with results that were within 85% of each other.

For Twitter, Pew Research used the following Boolean search filter to collect potentially relevant Tweets:

(marriage AND same-sex) OR (marriage AND “same sex”) OR (marriage AND traditional) OR (marriage AND gay) OR (marriage AND equality) OR (marriage AND homosexual) OR (marriage AND defense) OR DOMA OR (gay AND rights) OR “proposition 8” OR “prop 8”

Google Searches

The portions of the report that explained how often specific terms were searched for in Google were created using data from Google Trends. The tool indicates the number of web searches for each term relative to the total number of searches on Google. Data have been normalized, and are based on a scale of 0 to 100.
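
A rough sketch of what such a normalization involves: each period's searches for a term are divided by the total searches in that period, and the resulting shares are rescaled so that the highest one equals 100. The function name `normalize_to_100` and the numbers below are hypothetical. Google does not publish raw counts, so this is an assumption about the general idea rather than Google's exact procedure.

```python
def normalize_to_100(term_counts, total_counts):
    """Rescale a term's share of all searches so that the peak is 100.

    Both arguments are lists with one value per time period.
    Assumption: a simplified view of the normalization, using
    made-up raw counts for illustration.
    """
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Made-up counts for three periods.
print(normalize_to_100([10, 40, 20], [1000, 2000, 1000]))  # [50, 100, 100]
```

The example shows why the scale reflects relative interest rather than volume: the second period has twice as many searches for the term as the third, yet both score 100 because the term's share of all searches is the same.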

The following phrases were entered into Google Trends to determine the number of searches in the United States only:

gay marriage
same sex marriage
marriage equality

LexisNexis Searches for Use of Terms by Journalists

The portions of the report that dealt with terms used by journalists were derived using keyword searches in the LexisNexis database. Researchers searched 24 of the 25 most widely circulated newspapers for the following terms:

“gay marriage”
“same sex marriage” or “same-sex marriage”
“marriage equality”

Note: The Wall Street Journal was not included because its contents are not available in LexisNexis.