1. Digital readership data and what it can tell us

By Amy Mitchell, Galen Stocking and Katerina Eva Matsa


One of the lower-profile benefits of the news media’s move online is that it has given media researchers an important tool: Readers now leave a digital footprint, so researchers no longer need to rely solely on readers’ self-reports of what they saw, heard or read that day. A typical news website, for example, can tell you how many people read any specific article, at what times of day, how long they spent there and from where in the digital world they arrived. Amid this avalanche of news data, audience metric companies arrived on the scene to help individual organizations make sense of their own data and to provide a consistent way of measuring and interpreting behavior across numerous websites. One of these companies is Parse.ly, which agreed to share anonymized September 2015 data from 30 of its diverse news media clients with Pew Research Center. With this data, the Center sought to help answer a specific question: Are mobile readers – those accessing stories on their smartphones – actually going to and spending time with the long-form journalism available on these news websites, or is short and sweet the hallmark of the new news?

Before delving into the findings, it is worth laying out the specific metrics involved, what each represents and how we use them in this analysis.

The 30 news websites studied here, while not fully representative of all news organizations or of all digital news users, include a mix of general and subject-specific news sites, news organizations that have non-digital arms (such as a print newspaper), digital-only news entities and sites with large national audiences, as well as those with smaller, niche user bases. The study analyzed all articles longer than 100 words that were published on these sites and met a minimum threshold of page views in September 2015.² This resulted in 74,840 articles accessed by 71 million visitors on cellphones. Roughly a quarter of the articles, 24%, contain 1,000 words or more and are thus considered long-form for the purposes of this project.
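The inclusion rules described here can be sketched in a few lines of Python. This is an illustration only, not the Center's or Parse.ly's actual code: the `Article` record and its field names are hypothetical, and the sketch assumes that the "at least one cellphone view" condition applies to long-form articles as well as short-form ones, which the threshold footnote does not state outright.

```python
from dataclasses import dataclass
from typing import Optional

MIN_WORDS_EXCLUSIVE = 100    # articles must be longer than 100 words
LONG_FORM_MIN_WORDS = 1000   # 1,000 words or more counts as long-form
SHORT_FORM_MIN_VIEWS = 100   # minimum views on any device (short-form)
LONG_FORM_MIN_VIEWS = 25     # minimum views on any device (long-form)


@dataclass
class Article:
    # Hypothetical record layout, used only for this sketch.
    word_count: int
    total_views: int       # page views on any device in the month
    cellphone_views: int   # page views on cellphones in the month


def classify(article: Article) -> Optional[str]:
    """Return 'long-form', 'short-form', or None if the article is excluded."""
    if article.word_count <= MIN_WORDS_EXCLUSIVE:
        return None
    is_long = article.word_count >= LONG_FORM_MIN_WORDS
    min_views = LONG_FORM_MIN_VIEWS if is_long else SHORT_FORM_MIN_VIEWS
    # Assumption: the cellphone-view requirement applies to both categories.
    if article.total_views < min_views or article.cellphone_views < 1:
        return None
    return "long-form" if is_long else "short-form"
```

Under these assumptions, a 1,200-word article with 30 views would be kept as long-form, while a 500-word article with the same 30 views would fall below the short-form threshold and be excluded.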

These data include measures for unique visitors, sessions with an article and complete interactions with a given article, all key terms that are defined below.

The unique visitor metric (also referred to simply as visitor) reflects the total number of individuals who visited a particular website, as identified by first-party cookies, on a particular device during the month studied.³ This means that even if an individual visited multiple articles on a website, that individual is counted once.

The article count represents the full article, regardless of how many different pages that article might include.

A session rolls up all of the visits to a news article by an individual until the user hits 30 minutes of inactive time. Thus, if a user visits one article, pauses for 10 minutes and then comes back, that is still considered one session.
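As a rough illustration of the 30-minute inactivity rule, the sketch below groups one visitor's activity timestamps on a single article into sessions. The function name and the input format are hypothetical, and treating a gap of exactly 30 minutes as the start of a new session is an assumption; the text does not specify how that boundary case is handled.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(event_times):
    """Group one visitor's activity timestamps on one article into sessions.

    A new session starts whenever the gap since the previous activity
    reaches the timeout (assumption: a gap of exactly 30 minutes expires).
    """
    sessions = []
    for t in sorted(event_times):
        if sessions and t - sessions[-1][-1] < SESSION_TIMEOUT:
            sessions[-1].append(t)   # still within the same session
        else:
            sessions.append([t])     # inactivity limit reached: new session
    return sessions


# A visit, a 10-minute pause, then a return 40 minutes after that.
events = [
    datetime(2015, 9, 1, 9, 0),
    datetime(2015, 9, 1, 9, 10),
    datetime(2015, 9, 1, 9, 50),
]
sessions = split_into_sessions(events)
```

Here the first two timestamps fall into one session, matching the 10-minute pause example, and the third starts a second session.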

The complete interaction consists of all these different sessions with an article at the user level. Most of this analysis is based on the complete interaction with an article, i.e., all of a unique visitor’s sessions with one article on a cellphone.⁴ In some cases, such as time of day, researchers looked at the combined sessions for a particular daypart. For example, if a user read an article over multiple sessions during the morning, these sessions were combined to analyze the activity that took place within that specific daypart.
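One way to picture the roll-up from sessions to complete interactions is the short sketch below. It is not the report's actual pipeline: the tuple layout and identifiers are hypothetical, and summing engaged time across a visitor's sessions is an assumption about how the sessions are combined.

```python
from collections import defaultdict


def complete_interactions(sessions):
    """Combine sessions into one record per (visitor, article) pair.

    `sessions` is an iterable of (visitor_id, article_id, engaged_seconds)
    tuples; this layout is hypothetical. Engaged time is summed across a
    visitor's sessions with the same article (an assumption).
    """
    totals = defaultdict(float)
    for visitor_id, article_id, engaged_seconds in sessions:
        totals[(visitor_id, article_id)] += engaged_seconds
    return dict(totals)


# Hypothetical session records: visitor v1 returns to article a1 twice.
example_sessions = [
    ("v1", "a1", 40),
    ("v1", "a1", 20),
    ("v2", "a1", 15),
    ("v1", "a2", 5),
]
interactions = complete_interactions(example_sessions)
```

For the daypart analysis, the same grouping could be applied with the daypart added to the key, so that only sessions within the same part of the day are combined.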

Another central metric in this analysis is engaged time. This captures how much time a user spends with content by tracking cursor movement, clicking and scrolling. The metric enables us to better understand how users behave with different types of news stories under different circumstances, such as time of day. Researchers conducted the analysis using two calculations: mean and median. The mean, or average, is calculated by adding together all visitors’ engaged time and dividing by the number of visitors. Calculated this way, extreme values of engaged time or article length can heavily influence the result. For that reason, we also looked at the median, the middle value when all the engaged time figures in our data are ordered from lowest to highest. The median can be a better indicator of central tendency when the distribution contains extremely high or low observations. For this report, researchers made a number of methodological decisions to improve the structure of the data. Researchers found that the differences between the mean and the median did not alter the patterns and findings in the data. Therefore, throughout the report, researchers use the mean as the principal statistic for the analysis.
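The difference between the two statistics is easy to see with a small example. The engaged-time values below are invented for illustration and are not drawn from the study's data; the sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical engaged times in seconds for five visitors to one article.
# The last value is an extreme observation (a very long read).
engaged_seconds = [30, 45, 50, 60, 900]

mean_engaged = mean(engaged_seconds)      # sum of all values / number of visitors
median_engaged = median(engaged_seconds)  # middle value of the sorted list
```

With these made-up numbers the mean is 217 seconds while the median is 50 seconds: a single extreme visit pulls the mean far above what most visitors experienced, which is the sensitivity to outliers described above.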


² For short-form articles, the minimum threshold was 100 views on any device and with at least one cellphone view; the respective number for long-form was 25.
³ Parse.ly also tracks third-party cookies, which track users across websites. But many users block third-party cookies (and indeed many modern browsers, including mobile Safari, block them by default). Thus, they are not used for analysis. For more information, see this report’s methodology.
⁴ Parse.ly defines the user session as the interaction with an article over an indefinite time period that expires when there has been no activity for 30 minutes. This includes visits to multiple pages within the article, as well as any instances in which the user visits the page, leaves it and returns within 30 minutes of his or her last activity.