Sites Studied

PEJ analyzed the components of the campaign Web sites of all 19 candidates running for president in 2008.[1] They are (in alphabetical order, with party affiliation in parentheses):

Joe Biden (D)

Sam Brownback (R)

Hillary Clinton (D)

John Cox (R)

Christopher Dodd (D)

John Edwards (D)

James Gilmore (R)

Rudolph Giuliani (R)

Mike Gravel (D)

Mike Huckabee (R)

Duncan Hunter (R)

Dennis Kucinich (D)

John McCain (R)

Barack Obama (D)

Ron Paul (R)

Bill Richardson (D)

Mitt Romney (R)

Tom Tancredo (R)

Tommy Thompson (R)

In addition to the Web sites as a whole, PEJ conducted a text analysis of each candidate’s biography section.
At each download, coders made an electronic copy of the homepage, as some Web sites were not printer-friendly. Biographies were copied and saved as Word documents.

Capture Timing

Web sites were initially coded on May 8, 2007. To check the accuracy of the results and to note any significant changes, each Web site was tracked for a month and then coded again on June 12, 2007. There were few changes, and where they appeared, results reflect the site as of June 12.

For three specific variables (blog posts, mainstream media articles in the newsroom, and press releases), we measured the frequency of updates as follows. We looked at the total number of items in each section's archive (present on all 19 campaign Web sites) for the week before our coding day: April 30-May 4 for the May 8, 2007 coding and June 4-June 8 for the June 12, 2007 coding. We then calculated the average for the week.
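The update-frequency measure can be sketched in a few lines of Python. This is only an illustration, not the procedure PEJ ran: the source does not say whether the average was taken per day or some other way, so the sketch assumes a per-day average over the five-day window, and the function name and the archive dates are hypothetical.

```python
from datetime import date

def daily_average(item_dates, start, end):
    """Average number of archive items per day between start and end, inclusive.

    Assumption: "the average for the week" means items per day in the window.
    """
    days = (end - start).days + 1
    in_window = [d for d in item_dates if start <= d <= end]
    return len(in_window) / days

# Hypothetical blog archive dates for one campaign site.
posts = [
    date(2007, 4, 30),
    date(2007, 5, 1),
    date(2007, 5, 1),
    date(2007, 5, 4),
    date(2007, 5, 7),  # falls outside the April 30-May 4 window
]

avg = daily_average(posts, date(2007, 4, 30), date(2007, 5, 4))
print(avg)
```

The same function would be called once per section (blog, newsroom, press releases) and once per coding date.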

The biography sections were downloaded and coded on May 8, 2007, and were re-read during our second round of coding on June 12, 2007. The only change to a biography occurred on Bill Richardson’s Web site, where the biography was expanded from two sections to seven.

Coding Scheme and Procedure

Web site Analysis: To create the coding scheme, we first worked to identify the different kinds of features available on a campaign Web site. These ranged from tools to organize fundraisers to candidate positions on the issues. After several weeks of exploratory research, we identified 22 different quantitative measures and developed those into a working codebook.
Coding was performed at PEJ by two professional research analysts. They were trained on a standardized codebook that contained a dictionary of coding variables, operational definitions, detailed instructions and examples. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions and monitor the consistency of coding. Where necessary, additional Web site captures took place to verify findings. Coders followed a series of standardized rules for coding and quantifying Web site features.

Certain variables merit an explanation of their working definition as applied in the coding scheme:

Newsroom: The section on the site that lists articles not authored by the campaign. These are predominantly articles about the candidate that appeared in the mainstream media (including editorials), presented either as links to an external site or as the full article with its source. If these sections included press releases by the campaign, the press releases were counted as a separate variable. We coded for the presence of a newsroom section, and also coded the total number of items in the archived section for the previous week.
Official campaign blog: In counting a blog as a campaign blog, we coded only those authored and/or specifically endorsed by the candidate’s campaign. Grassroots and/or community blogs and blogrolls were not counted here. We coded for the presence of an official campaign blog, and also coded the total number of blog posts in the archive for the previous week.
User comments on campaign blogs: This variable identified whether campaign blogs permitted users to add their comments to the official campaign blog posts. We coded for the presence of comments on the blog.
Citizen-initiated blogs: In addition to the official campaign blog, several candidates provided a tool for users to establish their own blog to show their support for the candidate. These were coded as citizen-initiated blogs. We coded for the presence of citizen-initiated blogs.
Site customization: This variable looked at whether a visitor could tailor the homepage or Web site to their personal preferences. This feature always required users to register, and included the ability to create and access personal profiles, personal messages, personal blogs and more.
Spanish language translation: We coded for whether campaign Web sites offered an option to translate content into a second language. Spanish was the only language offered for such translations at the time of publication. This option did not necessarily translate all the content on the Web site.
Grassroots activity: We coded for the presence of options for grassroots activity. This variable had three dimensions: fundraising, organizing community events and voter registration information. We coded for the presence of all three dimensions.
Social Networking: We coded for the presence of “social networks” and for the number of social networks that a candidate displayed on his or her Web site (on the homepage or elsewhere). These were embedded links that led the user to the candidate’s profiles on external social networking Web sites such as MySpace, Facebook, Flickr or YouTube.

Biography Page Key Word Search: The program CatPac® was used to analyze the candidate biographies. CatPac is a “self-organizing artificial neural network” that has been optimized for reading text (Doerfel and Barnett, 1999). By assigning a neuron to each major word in the text, the program identifies the most important words by measuring their frequency and co-occurrence.
CatPac also contains a default “exclude” file, which lists prepositions, articles, conjunctions and transitive verbs (words such as ‘and’, ‘when’ and ‘he’) that carry little meaning on their own and produce clutter within the text. When the analysis is carried out, the program excludes these words so that they do not complicate the results.
Our sample consisted of the biographies of all 19 candidates. For candidates who divided their sections into more than one page, the text was combined into one text file, so that there was just one file per candidate. These were then fed into the CatPac program.
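The preparation and counting steps described above can be approximated with a short Python sketch. It is not CatPac: it uses plain word-frequency counting rather than a neural network, and the exclude list, function name and sample text are all hypothetical stand-ins (CatPac's actual default exclude file is not reproduced here).

```python
import re
from collections import Counter

# Tiny illustrative exclude list, standing in for CatPac's default file.
EXCLUDE = {
    "a", "an", "and", "the", "of", "in", "on", "to", "for",
    "he", "she", "his", "her", "when", "was", "is",
}

def top_words(pages, n=15):
    """Combine a candidate's biography pages and return the n most frequent words."""
    text = " ".join(pages).lower()          # one text per candidate
    words = re.findall(r"[a-z']+", text)    # simple tokenizer (an assumption)
    counts = Counter(w for w in words if w not in EXCLUDE)
    return counts.most_common(n)

# Hypothetical two-page biography.
pages = [
    "He served in the Senate and led on family issues.",
    "His family and the Senate shaped his leadership.",
]

top = top_words(pages, n=2)
print(top)
```

Running the same function on the concatenated texts of several candidates would give a combined list for a group, such as one party.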


Rankings: In analyzing the data, we tallied results and ranked the Web sites in two areas:

Engagement and Participation: To gauge how the campaign Web sites were engaging visitors, PEJ assessed six variables: campaign blogs and whether they allowed comments, citizen-initiated blogs, social networks and grassroots activity (fundraising, community events and voter registration).
On Demand Access: We again looked at six variables to gauge how candidates disseminated their content: email updates/alerts, RSS (Really Simple Syndication) feeds, podcasts, mobile device delivery, search function and the ability to customize the Web site.

The PEJ research analysts produced an Excel template to tally the scores (summing the variables) for each Web site within the two categories. Thus for each of the two categories, each Web site had a final score on a scale ranging between one and six.
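As an illustration of the tally, the following Python sketch sums presence/absence codes for the six On Demand Access variables. The variable keys, the function name and the sample site are hypothetical; PEJ's own tally was done in an Excel template, which is not reproduced here.

```python
# Hypothetical keys for the six On Demand Access variables named above.
ON_DEMAND_VARIABLES = (
    "email_updates",
    "rss_feeds",
    "podcasts",
    "mobile_delivery",
    "search",
    "customization",
)

def tally(coded_site):
    """Sum the variables coded as present (True) for one Web site."""
    return sum(1 for name in ON_DEMAND_VARIABLES if coded_site.get(name, False))

# Hypothetical coding sheet row for one campaign site.
site = {
    "email_updates": True,
    "rss_feeds": True,
    "podcasts": False,
    "mobile_delivery": False,
    "search": True,
    "customization": True,
}

score = tally(site)
print(score)
```

Sorting the sites by this score would give the ranking within the category.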

Key Word Usage: Our text analysis of the candidate biographies had two interrelated components. The first was identifying and analyzing the most frequently used, or top, words in each biography. These were generated automatically by the program. The number of most frequent words analyzed for the study was set at 15. According to the program, in most studies the first 1% of the total number of words is sufficient for text analysis. In this case, 15 words were more than 1% of the total text analyzed.

The search was run for each candidate, and then for certain broader categories – Republicans, Democrats and all candidates combined. In our final analysis, we looked only at the top five words for individual candidates and the top five for the broader groups.

The second component looked beyond these top words to 27 specific words important to the presidential campaign. These were chosen by the research team and were explicitly included irrespective of whether they actually occurred in the candidate profiles.

These were (in alphabetical order): Bush, change, child, children, community, conservative, defend, ethics, faith, freedom, God, independent, leadership, liberal, moral, patriotism, progressive, protect, Reagan, renewal, safe, safety, strong, terrorism, traditional, values and Washington.
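A search for these 27 words can be sketched in Python as a simple count per word, reported even when the count is zero. This is an illustrative stand-in for the CatPac run, with a hypothetical function name and sample text; it matches exact word forms only, so ‘child’ and ‘children’ are counted separately, as in the list above.

```python
import re
from collections import Counter

# The 27 words from the list above, lowercased for matching.
TARGET_WORDS = [
    "bush", "change", "child", "children", "community", "conservative",
    "defend", "ethics", "faith", "freedom", "god", "independent",
    "leadership", "liberal", "moral", "patriotism", "progressive",
    "protect", "reagan", "renewal", "safe", "safety", "strong",
    "terrorism", "traditional", "values", "washington",
]

def target_counts(text):
    """Count each target word in a biography, including words that never occur."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word: counts[word] for word in TARGET_WORDS}

# Hypothetical biography excerpt.
sample = (
    "Faith and family values guide his leadership. "
    "His values were shaped far from Washington."
)

result = target_counts(sample)
print({word: n for word, n in result.items() if n > 0})
```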

1. Republican Fred Thompson was not included in the study because he had not formally announced his candidacy; he announced only a preliminary intention to run at the end of May.