Q/A: How Pew Research mapped the conversations on Twitter

Lee Rainie, Director of the Pew Research Center’s Internet Project

Twitter, which didn’t exist eight years ago, now has 241 million monthly active users around the globe who collectively generate more than half a billion tweets each day. Getting a handle on that massive volume of messaging was the challenge facing researchers at the Pew Research Center’s Internet Project. The project’s new report, produced in collaboration with the Social Media Research Foundation, identifies six basic patterns that Twitter conversations fall into. But as Lee Rainie, the project’s director, notes, the report should be seen as a first effort at understanding the Twitterverse rather than as a definitive typology. He compares the years-long endeavor to 18th- and 19th-century explorers charting unknown lands.

In this Q&A, Rainie explains how the Twitter report came together and what it means.

Q: Why even attempt to map Twitter conversations? What’s the value here?

A: For 14 years the Internet Project has been documenting the rise of networked society, and the use of social media within that networked society. Information has a highly social context and a highly social structure. This is a way to gain insight into flows of information within the context of social relationships. Over time we’ll develop more understanding of what these structures mean.

Social media is its own terrain, and one possible expectation is that we can create an atlas of this new terrain. People who use social media can use these maps to figure out where they stand. If you’re a political actor, you can ask whether a polarized structure is the best one for your cause or candidate. If you’re a business actor, you might say “Look at all these isolated people talking about me but not with anybody else – maybe it would serve me well to try to build a community around my brand.”

Q: How did you decide which conversations and topics to map? Did you look at everything above a certain critical mass of tweets, pick what seemed to reflect a range of different topics, or just do everything that you could identify within a given timeframe?

A: There was no grand pattern or system to the topics we studied. We often built maps around news stories or political controversies. We also built maps around conferences at which we were speaking — it’s fun to open your presentation sometimes with a map of the people you’re talking to. Sometimes maps were built for meetings that we were going to attend and we would look at the subjects that might be interesting to those attending the meeting. Several of the scholars got their college students to build maps around topics that interested them.

We definitely did not try to systematically capture everything that had a critical mass. NodeXL, the analytical tool we used to collect the data and generate the visualizations, doesn’t work that way. You have to tell it what keywords and hashtags to look for. It was often the case that the maps with the smallest number of tweets were the most interesting. Those are real communities and there are tens of thousands of them in social media and they address the full spectrum of human interests from the goofiest, most trivial matters to things related to life and death and the meaning of the cosmos.

[application programming interface]

A: The Twitter API allows people limited access to tweets over a given time frame. At the beginning of the effort by the Social Media Research Foundation, a lot of tweets were made available through the API. Over time, Twitter has changed the terms of engagement with the API, sometimes narrowing the range, sometimes expanding it. The decisions by the company come from an ongoing process of negotiation with developers and researchers who are interested in the platform.

Research heaven would be to have access to the firehose and the archive — all the tweets ever issued on any subject — but that would require a lot of effort and expense by the company. Interestingly enough, the company announced a special “data grants” program last week for select institutions, and Pew Research is exploring how we might take advantage of that.

Q: What surprised you the most about the conversational archetypes you found?

A: Well, other social-media analysts had already identified the Polarized Crowd structure, so we were certainly expecting to find that. But we didn’t know how many other structures we would find. I think that if we’d taken bets beforehand, I’d have bet there would have been more than the five additional ones we identified.

Also, there’s been a ton of commentary about how social media disrupts everything and everything is new. But one thing you see in the Brand Clusters and Broadcast Network structures is that big, important corporations and people, like Apple or Paul Krugman, still have enormous influence in this environment.

Q: Has anyone done this sort of research before?

A: Researchers have mapped wikis and message boards before, but on this scale it’s pretty much new ground. The historical analogy is Darwin going to the Galapagos Islands, looking at different plants and animals and trying to figure out how they fit together, or a 20th-century astronomer using a new telescope to look up at the heavens and see farther than ever before. This work is strictly observational – we’re making no claims about representativeness. All we’re saying is, “This is what we’ve found – there are at least these six distinct conversational types.” We are putting it out and saying, “Take a look, use the tools and see what you can come up with.”

Q: From this work do you have any thoughts about whether Twitter reflects social and conversational patterns that people already have in the offline world, or does the Twitter platform itself shape those patterns?

A: Probably a little of both. The structures probably have existed before, but now we have the tools with which to see them. There have always been polarized conversations and tight communities, but social media platforms like Twitter give us new ways to explore those conversations. Twitter affords certain kinds of conversations and communities. If we did this sort of analysis on Facebook, there probably would be somewhat different structures.

Also, the patterns may vary among different societies. For instance, there are a couple of researchers in Korea who’ve done some similar work, and they’ve found a “barbell” structure to their polarized conversations, where there are more people talking to each of the partisan camps, but there are links and bridges between the camps, so there is a thick batch of links between the clusters. We haven’t seen that in the work that has been done so far in the United States.

Q: So this isn’t intended to be the last word on the social geography of Twitter.

A: Not at all. We hope that researchers will do what scientists do – fact check us, explore the data themselves and come up with new findings. We’ve laid down a marker and said there are six types, which means that there will be academics looking to make their scholarly marks by identifying a 7th, 8th or 9th type.

One of the most exciting things about NodeXL is that it’s simple enough to use that anyone who can make a pie chart from spreadsheet data can use it. A decade ago, you had to be a hardcore computer scientist with great programming skills to do this kind of analysis. There’s a gigantic site of maps at NodeXL Graph Gallery where researchers are always putting up more findings and data. Anyone can just scroll around the site to see what’s there. I think there are more than 18,000 maps there now, and hundreds of thousands more that are in existence elsewhere.

But even though it is much simpler to do this kind of work now, there are challenges based on the limits of the Twitter API and what kind of data it allows you to access. There are also times when you look at a map and scratch your head about what’s going on. What does that hashtag mean? How did it get started? Why is this person or this institution at the center of a conversation? Why is that blog post link showing up in so many places? It often takes a topical expert and a lot of human probing to discern the meaning of the map.