Content Analysis

Human coding of news media

Human coding of data is much as it sounds: Trained coders read or look through some selected sample of media coverage and systematically note its important traits. The questions (or variables) used differ depending on the goal of the project. Questions range from ‘what is the topic’ to ‘is the treatment of the topic positive or negative.’

As with any type of research, the process of conducting a content analysis project begins with the selection of a main research question or goal. What is it we are trying to answer? All of our subsequent decisions are aimed at best answering that main question.

For all studies, we must select the sample of material to examine, which is often more difficult than it sounds.

Sample design

Choosing a sample involves a focus on thoroughness and representativeness, as well as the purely practical question of availability. Depending on the scope and goals of a project, we might code every story in a given news outlet, or we might use various search techniques to find specific articles relevant to our analysis.

In recent years, the center has conducted analyses of many types of outlets, including print, online news sites, network TV programs, cable TV programs, radio broadcasts, blogs and tweets.

Sometimes, the research question is focused on understanding the scope of coverage of a particular news outlet. In these cases we generally code every story – or at least a large, representative sample of stories – contained within that outlet during a given time period. Other times, the research question is focused on the way a particular news topic, event or issue is covered. In those cases, we generally use various search techniques to collect the relevant materials.

For example, if we want to study how newspapers have covered a certain topic, our researchers would use a broad set of keyword searches to collect potential stories from a given database such as LexisNexis. For websites, the same terms used for newspapers would be searched using Google News or the sites’ own archival functions. And for cable and network television programs, we often search closed-captioning text.

In the case of a study looking for coverage of same-sex marriage, the Boolean search used in the various databases might look like:

(marriage AND same-sex) OR (marriage AND “same sex”) OR (marriage AND traditional) OR (marriage AND gay) OR (marriage AND equality) OR (marriage AND homosexual) OR (marriage AND defense) OR DOMA OR (gay AND rights) OR “proposition 8” OR “prop 8”
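To make the logic of that search concrete, here is a minimal Python sketch that applies the same Boolean structure to a piece of text. It is an illustration only, not the tooling used with LexisNexis or any other database, each of which has its own query syntax. The names QUERY, contains_term and matches_query are hypothetical, and the matching rule (case-insensitive, whole words or phrases) is an assumption.

```python
import re

# Each inner list is a group of terms joined by AND;
# the groups themselves are joined by OR.
QUERY = [
    ["marriage", "same-sex"],
    ["marriage", "same sex"],
    ["marriage", "traditional"],
    ["marriage", "gay"],
    ["marriage", "equality"],
    ["marriage", "homosexual"],
    ["marriage", "defense"],
    ["DOMA"],
    ["gay", "rights"],
    ["proposition 8"],
    ["prop 8"],
]


def contains_term(text, term):
    """Return True if term appears in text as a whole word or phrase.

    Assumption: matching is case-insensitive, and a term does not count
    when it is embedded in a longer word or number (e.g. "prop 80").
    """
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(text, query=QUERY):
    """Return True if any AND-group has all of its terms present in text."""
    return any(all(contains_term(text, term) for term in group) for group in query)
```

A story would be kept as a candidate when matches_query returns True for its text. As with the database searches, a match only marks a potential story; it does not confirm that the story is actually about the topic.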

Training and intercoder testing

Prior to any actual coding, researchers create a detailed codebook for each project, which lists the variables to be used and the specific rules that apply to the content in question.

For an example of the specificity of these rules, consider studies that measure the tone of news coverage toward a particular person or topic. This involves assessing the way in which a story’s content is constructed via use of quotes, assertions or innuendo, which results in supportive, neutral or negative coverage. To measure tone, coders tally up all assertions that are clearly in support or opposition. Direct and indirect quotes are counted along with assertions made by journalists themselves.

Because we are interested primarily in measuring stories that are clearly emphasizing one point of view over the other, we set a high bar: In this case, for a story to be coded as either “supportive” or “opposing,” it must have twice as many of one kind of comment as the other.
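The two-to-one rule can be sketched in a few lines of Python. This is a hedged illustration, not the codebook itself: the function name classify_tone is hypothetical, "twice as many" is read here as "at least twice as many," and a story with no clear assertions on either side is assumed to be neutral.

```python
def classify_tone(supportive, opposing):
    """Classify a story's tone from its counts of clear assertions.

    supportive, opposing: number of assertions (quotes or journalist
    statements) clearly in support of or in opposition to the subject.
    Assumption: "twice as many" means at least a 2-to-1 ratio.
    """
    if supportive < 0 or opposing < 0:
        raise ValueError("assertion counts cannot be negative")
    if supportive > 0 and supportive >= 2 * opposing:
        return "supportive"
    if opposing > 0 and opposing >= 2 * supportive:
        return "opposing"
    # Anything short of a 2-to-1 ratio, including no assertions at all.
    return "neutral"
```

Under these assumptions, a story with four supportive assertions and two opposing ones would be coded "supportive," while a story with three supportive and two opposing would be coded "neutral."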

All coders go through training so that the results are consistent. Training often lasts two to three weeks, depending on the difficulty of the coding.

Once the set of stories is identified and training has been completed, the team of experienced coders begins analyzing each story according to the rules established during training, which have been fine-tuned over years of work. To add a further layer of reliability assurance, the content is divided among coders to ensure coders with a diversity of experiences interact with as wide a range of the content as possible.

In order to ensure reliability and consistency, we perform intercoder tests for all coders involved in a project. To do this, we choose a random selection of stories from the overall sample (usually 5%-7%) and have each person code them independently. We then compare the results to get the rates of agreement for all variables. The center follows rigorous academic standards, and we only publish variables where we have rates of agreement of 80% or higher (in most cases, much higher). If we test variables and find rates of less than 80%, we continue training, clarifying rules and/or revising the variable until we reach that level in subsequent tests.
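The intercoder test can be sketched as follows. This is a simplified illustration for two coders using plain percent agreement, which is an assumption: the text does not specify the exact statistic or software used. All function names are hypothetical.

```python
import random


def draw_test_sample(story_ids, fraction=0.05, seed=0):
    """Randomly pick a share of stories for all coders to code independently."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    ids = list(story_ids)
    k = max(1, round(len(ids) * fraction))
    return rng.sample(ids, k)


def percent_agreement(codes_a, codes_b):
    """Share of stories on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of stories")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def variables_meeting_threshold(coder_a, coder_b, threshold=0.80):
    """Map each variable to True if its agreement rate reaches the threshold.

    coder_a, coder_b: dicts of variable name -> list of codes, with the
    stories listed in the same order for both coders.
    """
    return {
        variable: percent_agreement(coder_a[variable], coder_b[variable]) >= threshold
        for variable in coder_a
    }
```

Variables that come back False would go back for further training, rule clarification or revision before being tested again.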