This study examined anonymized behavioral data on cellphone audiences, collected by the web analytics firm Parse.ly, through the lens of engaged time spent with long- versus short-form news articles. The data stem from activity on a mix of general and niche websites that are clients of Parse.ly.11
Parse.ly is a web analytics firm launched in 2012 that provides real-time and historical analytics to a broad mix of digital publishers, including over 170 top media companies.
All research was conducted on the original, raw data file by Pew Research Center staff. Pew Research Center retained control over editorial decisions, but consulted with Parse.ly data scientists to ensure the data were interpreted correctly. The report was produced in association with the John S. and James L. Knight Foundation.
The final dataset consisted of all articles published across 30 news websites between April and September 2015 that met a minimum threshold of page views in September 2015 (see more on threshold levels below). It includes 74,840 articles accessed at least once by 71 million visitors. Roughly a quarter of the articles (24%) are 1,000 words or longer and are thus considered long-form. No site accounted for more than 20% of articles.
Working with large, organic datasets requires, at the outset, critical and often complex structural and methodological decisions, as well as a major time investment in data organization and data cleaning. This is particularly true for data that have been collected for other purposes. Researchers – who are coming to the data secondhand – must develop an in-depth understanding of how the data were collected, recorded and structured; determine how to identify and treat outliers; and decide how to structure the data for the analysis at hand.
The steps and data decisions made to arrive at the final dataset and analysis are described below.
Sample selection and data structure
While Parse.ly represents more than 170 media companies, this study focuses on the subset of 30 news sites that researchers identified as U.S.-based, non-local sites producing original political or general interest news content.12 In the design phase of this project, Parse.ly supplied us with an initial list of the clients it classifies as news publishers. We further narrowed this list to a mix of 30 general and niche news sites across topic areas. To do so, researchers visited each site to determine whether it was based in the U.S., had a geographically broad audience (i.e., was not a local site) and published original political, general interest or topic-specific news content. While the 30 news websites studied here cannot be said to be fully representative of all news organizations or of all digital news users, they include a mix of general and subject-specific news sites; legacy media and digital-only news entities; and sites with large national audiences as well as those with a smaller niche user base.
Engaged time measurement
The engaged time metric functions particularly well as an indicator of actual time spent reading an article on small screens, where the smaller amount of text that can be displayed at one time and the tactile nature of the device can lead to more frequent interactions. Additionally, in recent years, the use of mobile phones for news consumption has grown dramatically. Earlier research by the Center found that 39 of the top 50 digital news websites now receive more traffic to their sites and associated applications from mobile devices than from desktop computers. And, as of late 2015, about seven-in-ten American adults (68%) owned a smartphone. Moreover, 68% of smartphone users reported getting breaking news there as of April 2015. Accordingly, this study focused on cellphone interactions with news.
Parse.ly’s software can group all webpages within an article together. This is particularly helpful in analyzing long-form articles, which are often spread across several webpages. Thus, the data for this analysis is examined at the article level, regardless of the number of pages an article might contain.
Time spent watching video news content, or listening to audio content, could be included if the article met the minimum word threshold of 101 words and the user activated the screen through a touch or a scroll before the 5.5-second inactivity cutoff. In most cases, though, a user watching or listening would likely reach 5.5 seconds of inactivity, pausing the engaged time clock. Thus, the measures here tell us more about time spent reading than about time spent watching or listening to news.
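As a rough illustration, the pause-and-resume behavior of an engaged-time clock of this kind might be sketched as follows. The 5.5-second inactivity cutoff comes from the description above; the function and event format are purely illustrative, not Parse.ly's actual implementation:

```python
def engaged_time(events, cutoff=5.5):
    """Sum time between interaction events (touches, scrolls),
    pausing whenever the gap between events exceeds the cutoff.
    `events` is a sorted list of timestamps in seconds."""
    total = 0.0
    for prev, curr in zip(events, events[1:]):
        gap = curr - prev
        # Only count gaps up to the inactivity cutoff; longer gaps
        # mean the clock was paused until the next interaction.
        if gap <= cutoff:
            total += gap
    return total

# A user who interacts steadily, goes idle, then returns:
engaged_time([0, 2, 4, 30, 31])  # counts 2 + 2 + 1 = 5.0 seconds
```

Under this accounting, a user passively watching a video without touching the screen accrues little or no engaged time, which is why the metric leans toward reading.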
The data we analyzed were collected by Parse.ly using first-party cookies, which track a user within a website and on a particular device.13
For this analysis, then, unique user behavior is identified within a website, on a specific cellphone.
Upon investigation of the data, we took several steps to obtain the most reliable representation of user behavior. Several initial steps were taken on the entire corpus of data:
Removed short visits
In conversations with Parse.ly, we determined that visits in which the user was only on the page for 5 seconds or less were not valid visits. These were more likely page refreshes or people clicking on an article erroneously and then closing it. Accordingly, we limited this analysis to visits longer than 5 seconds.14
Removed overseas visits
In this analysis, we were interested only in visits by individuals in the U.S. Parse.ly uses IP-based geolocation using technology from MaxMind, which MaxMind estimates to be 99.8% accurate at the country level.
Removed articles of 100 words or less
After inspecting a sample of articles, we determined that articles of 100 words or fewer were often just photo captions, headlines or teasers of longer stories. Furthermore, articles of this length take so little time for the average user to read (really just scan) that they would not yield an accurate engagement metric. We therefore removed all articles with a word count of 100 or below.
Created a minimum traffic threshold
To further reduce outliers and ensure statistical power, we included only articles that had at least a minimum number of sessions. After internal testing and consultation with Parse.ly, we set the following thresholds: Articles of 101 to 999 words required a minimum of 100 views on any device, while longer articles required a minimum of 25 views on any device. In addition, because this analysis focuses on cellphone activity, articles needed to have at least one cellphone view. Subsequent analysis revealed negligible differences in the rate of complete interactions per article.
Limited to recent articles
Due to logging issues in early 2015, we included only news stories published on or after April 1, 2015, yielding six months of articles.
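Taken together, the article-level screening rules above can be sketched as a single filter. The field names and ISO-string date encoding are illustrative; the visit-level rules (removing visits of 5 seconds or less and non-U.S. visits) apply to individual sessions rather than articles and are omitted here:

```python
def keep_article(article):
    """Apply the article-level screening rules described above.
    `article` is a dict with illustrative field names; `published`
    is an ISO date string, so string comparison orders correctly."""
    # Articles of 100 words or fewer were removed.
    if article["word_count"] <= 100:
        return False
    # Only stories published on or after April 1, 2015 were kept.
    if article["published"] < "2015-04-01":
        return False
    # Traffic threshold: 100 views for 101-999 word articles,
    # 25 views for longer articles, plus at least 1 cellphone view.
    min_views = 100 if article["word_count"] < 1000 else 25
    return (article["views_all_devices"] >= min_views
            and article["cellphone_views"] >= 1)
```

For example, a 1,500-word article with 30 views and 2 cellphone views passes, while a 500-word article with only 50 views does not.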
The data are analyzed at the level of the complete interaction (unless otherwise specified), which comprises the sum of all time a user spends with an article across all sessions. A user is defined as a unique cellphone connecting to the site. A user can visit an article over multiple sessions, where a session is defined as all activity between a user and a site until 30 minutes of inactivity occurs.
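A sketch of how a user's visits might be grouped into sessions under the 30-minute inactivity rule (timestamps in minutes; the function is illustrative, not Parse.ly's implementation):

```python
def split_sessions(timestamps, gap_minutes=30):
    """Group a user's sorted, non-empty visit timestamps (in minutes)
    into sessions: a new session begins once 30 or more minutes
    pass without activity."""
    sessions = [[timestamps[0]]]
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev >= gap_minutes:
            sessions.append([curr])  # inactivity gap: new session
        else:
            sessions[-1].append(curr)
    return sessions

# Two bursts of activity separated by a 45-minute gap:
split_sessions([0, 5, 50, 55])  # -> [[0, 5], [50, 55]]
```

The complete interaction for an article would then sum the engaged time accrued across all of a user's sessions with it.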
Most analyses here investigate total engaged time according to an independent variable, such as article length, referral type or time of day. Because the large dataset produces statistical significance even when the differences between tested groups are small, we highlight only findings in which there is at least a 5% difference between these groups.
Researchers conducted the analysis using two calculations: the median and the mean. The mean, or simple average, is calculated by adding together all visitors' engaged time and dividing by the number of visitors. By that accounting, extreme values of engaged time or article length can heavily influence the numbers. The median is the middle value of all the engaged time figures in the corpus, and it is often considered a better indicator of central tendency when there are extremely high or low observations in the distribution. After removing anomalous data, the differences between the mean and median did not alter the pattern of findings. Therefore, throughout the report, researchers use the mean as the principal statistic for engaged time.
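The difference between the two statistics can be seen in a small hypothetical example, where a single outlier reader pulls the mean far above the typical engaged time:

```python
from statistics import mean, median

# Engaged times in seconds for one hypothetical article; the last
# reader is an outlier who left the page open.
times = [40, 55, 60, 70, 900]

mean(times)    # 225.0 - pulled up sharply by the single outlier
median(times)  # 60 - the middle value, unaffected by the extreme
```

Removing anomalous observations first, as described above, is what allowed the mean to track the median closely enough to serve as the principal statistic.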
In a few analyses in which complete interactions are used, some qualifications need to be made.
For referral type, researchers used the pathway through which the user first visited the article, even if that individual reached the article by other means in subsequent sessions. Accordingly, if a user first finds an article through a social media link, we count the time spent over all visits as coming from social – as that is how the user first discovered the article, and the framing in the text surrounding the link shapes their view of and interest in the article.
For article lifespan, we similarly used the characteristics of the first visit in our analysis. In this case, we examine the number of days between the article publication date and when a user first visits that article. This is therefore a metric of how long an article continues to attract new visitors.
For the time of day analysis, researchers included all interactions that occurred during the daypart studied. For example, if a user first visits an article in the morning and again in the afternoon, the engaged time for the first session is included in the morning category and the engaged time for the later session in the afternoon category. If there are multiple sessions within the morning, the engaged time across those sessions is summed to represent the total time the user spent with that article within that time period.
Before estimating any time-based metrics, however, we had to determine the local time at the user's location, as the data are recorded in UTC/GMT. For this, we identified the appropriate time zone from the user's state or, for users in the 14 states with multiple time zones, their zip code. Parse.ly uses technology from MaxMind, which estimates that its IP-based geolocation is 90% accurate at the state level. At the postal level, the software identifies the user within 50 km of their location with 84% accuracy.15
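The UTC-to-local conversion can be sketched with Python's standard zoneinfo database. The state-to-time-zone table here is illustrative and covers only two single-zone states; a real lookup would cover all states and fall back to zip-code-level zones for the 14 split states:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Illustrative lookup table; not the actual mapping used in the study.
STATE_TZ = {"NY": "America/New_York", "CA": "America/Los_Angeles"}

def to_local(utc_dt, state):
    """Convert a UTC timestamp to the user's local time by state."""
    return utc_dt.astimezone(ZoneInfo(STATE_TZ[state]))

visit = datetime(2015, 9, 1, 14, 0, tzinfo=timezone.utc)
to_local(visit, "NY").hour  # 10 - 2 p.m. UTC is 10 a.m. EDT in September
```

Only after this conversion can a visit be assigned to the correct morning, afternoon or evening daypart.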
Return visitors are identified as those who returned to the same article across multiple sessions. Because it can be difficult to identify exactly when a visit begins and ends, we treat a visit as a return visit only if it begins at least 60 minutes after the start of the preceding session. This ensures that we do not overestimate return visits.
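The 60-minute rule can be sketched as follows (session start times in minutes; the function is an illustration of the rule as described, not the study's code):

```python
def count_return_visits(session_starts, min_gap_minutes=60):
    """Count return visits to an article: a session counts as a
    return only if it begins at least 60 minutes after the start
    of the preceding session. `session_starts` is sorted."""
    returns = 0
    for prev, curr in zip(session_starts, session_starts[1:]):
        if curr - prev >= min_gap_minutes:
            returns += 1
    return returns

# Sessions starting at 0, 30 and 120 minutes: only the third,
# 90 minutes after the second, counts as a return visit.
count_return_visits([0, 30, 120])  # -> 1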
Finally, we coded a sample of long-form articles for topic. To do so, we first randomly sampled 17% (3,118) of all long-form articles after data cleaning. Researchers then developed a codebook with 11 categories, listed below. Two of those, weather/traffic and miscellaneous, were removed from the final analysis because few articles fell into these categories. After training on this codebook, three coders tested it by coding a sample of 360 stories (12% of the corpus), visiting each associated URL and reading the article content. Through this process, the three pairs of coders achieved the following Cohen's Kappa scores:
- Coder 1 and coder 2: 0.76
- Coder 1 and coder 3: 0.78
- Coder 2 and coder 3: 0.78
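Cohen's Kappa measures agreement between two coders beyond what chance alone would produce: observed agreement is corrected by the agreement expected from each coder's label frequencies. A minimal sketch of the calculation, with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two coders' labels on the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both coders pick the same
    # category if each labeled items at their own base rates.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical coders agreeing on 3 of 4 articles:
cohens_kappa(["pol", "pol", "sports", "econ"],
             ["pol", "sports", "sports", "econ"])
```

Values of 0.76 to 0.78, as reported above, are conventionally read as substantial agreement.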
The final number of stories in each category was as follows:
- Entertainment or lifestyle: 1,160 articles
- Science & technology: 434
- U.S. politics/government: 369
- Domestic affairs: 337
- Economics/business: 304
- Foreign politics: 266
- Sports: 115
- Defense/military: 65
- Crime incident: 45
- Miscellaneous/other: 15
- Weather/traffic: 8