What we learned from creating a custom graphics package in R using ggplot2


Creating informative and digestible data visualizations is a foundational aspect of Pew Research Center’s work. Traditionally, our report graphics have been generated by individual researchers using a custom Excel template developed by our design team, or on an ad hoc basis by graphic designers using professional tools and mock-up designs.

In recent years, many of our researchers have expressed an interest in using R as an integrated environment for data analysis and graphics development. R allows users to generate iterative graphics directly from analysis scripts, which is helpful for quickly evaluating different ways to visualize results and findings.

However, the standard graphics libraries in R look quite different from Pew Research Center’s in-house style. As a result, our R users typically had two options: produce graphics directly in R that would need extensive reworking prior to publication or leave the R environment entirely in order to create stylistically appropriate visualizations.

To address this, some of our researchers started developing custom versions of Pew-style graphics using ggplot2, a complex yet highly customizable data visualization tool in the tidyverse suite for R. But with multiple researchers developing their own solutions (often in personal folders, and by copying and pasting from other projects), errors and inconsistencies accumulated.

In an effort to solve some of these issues, we set out to develop a Pew Research Center-specific custom R package containing all of the standard graphics functions we use in our work in one easy-to-install package — similar to what a number of other research and journalism organizations have done. Our main goal was to create a package that allows researchers to quickly iterate between different data visualizations that are consistent with Pew style.

Our package, which we refer to internally as “pewplots,” is a ggplot2 wrapper made up of customized versions of existing ggplot2 functions. In practice, this means that ggplot2 plotting functions are wrapped within pewplots functions, and code is added or modified to extend the functionality of the original plotting function.
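To make the wrapper idea concrete, here is a minimal sketch of the general pattern. It is not the actual pewplots code: the names theme_house() and house_bar(), the default fill color and the sample data are all hypothetical, chosen only to show how a ggplot2 function can be wrapped so that style defaults are applied automatically.

```r
library(ggplot2)

# Hypothetical house theme (an assumption, not the pewplots theme):
# start from a built-in theme and override a few elements.
theme_house <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(
      panel.grid.minor   = element_blank(),
      panel.grid.major.x = element_blank(),
      plot.title         = element_text(face = "bold"),
      legend.position    = "top"
    )
}

# Hypothetical wrapper around geom_col(). The {{ }} operator forwards
# unquoted column names into aes(); extra arguments in ... are passed
# through to geom_col() so users can still customize the layer.
house_bar <- function(data, x, y, fill_color = "#436983", ...) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_col(fill = fill_color, ...) +
    theme_house()
}

# Made-up example data
df <- data.frame(group = c("A", "B", "C"), pct = c(42, 35, 23))
p <- house_bar(df, group, pct)
```

Because the wrapper returns an ordinary ggplot object, users can keep adding layers or labels with + in the usual way, for example p + labs(title = "Example chart").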

In addition to ensuring that every R developer is using the same official style, we also hoped this package would make data visualization in R more user-friendly for those who are newer to programming. Making wrappers for ggplot2 functions does require a fair amount of technical know-how. But once complete, wrapper functions like this take much of the complexity off individual users’ plates and make plot generation standardized, faster and more accessible.

We developed and continue to manage the code with the help of GitHub and Nexus repositories, which allow us to work collaboratively and make the package directly accessible to any R user on our internal server.

But beyond the technical details, what are the pros and cons of an effort like this? From our experience, we have a few takeaways and tips if you want to try something like this yourself.

Things to consider before you begin, and tips for development

These are some decisions we made early in the process that helped keep us on track, as well as some more general tips for development. We’d recommend spending some time on these issues before you start — as it’ll help you save time and stress down the line — and keeping them in mind throughout the development process.

Determine if a custom graphics package is the right solution for you. The biggest question in this process is probably whether the time needed to put together a custom graphics package is worth it. Ask yourself why having custom plotting functions in a package would be useful to you or your organization.

If you’re primarily working with your data in R, it can be a hassle to switch between R and some other graphics generation program (Excel, Tableau, etc.). But if you’re able to make your graphics directly in R, there’s no need to worry about exporting data, making sure that data is always up to date, or switching between applications. Also, if there’s a specific style guide for your graphics that you’re trying to follow, existing ggplot2 graphics are unlikely to tick all the boxes you need. If this sounds like your circumstances, a custom package might be a good option for you.

A custom package might also be a good solution for you if you’re interested in automation. Building plots and plot features into custom functions automates many of the details involved in specifying plot aesthetics and alignment. Instead of copying and pasting the same 20 lines of code 10 separate times in your analysis, with small adjustments each time, you can use functions to store most of the redundant information. It’s also useful if you have multiple people making graphics and want to standardize those graphics across users.
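As a rough illustration of storing the redundant styling in one place, the sketch below bundles a set of shared plot components into a single function and reuses it across several charts. The function name house_defaults() and the caption text are hypothetical, and mtcars is used only because it ships with R. The sketch relies on the fact that ggplot2 accepts a list of components on the right-hand side of +.

```r
library(ggplot2)

# Hypothetical helper: returns the styling pieces that would otherwise
# be copied and pasted under every plot.
house_defaults <- function(source_note = "Source: hypothetical example data") {
  list(
    scale_y_continuous(expand = expansion(mult = c(0, 0.05))),
    labs(x = NULL, y = NULL, caption = source_note),
    theme_minimal(),
    theme(panel.grid.minor = element_blank())
  )
}

# Build the same kind of chart for several variables; only the parts
# that differ between charts are written out each time.
plots <- lapply(c("mpg", "hp", "wt"), function(v) {
  ggplot(mtcars, aes(x = factor(cyl), y = .data[[v]])) +
    stat_summary(fun = mean, geom = "col") +
    house_defaults() +
    ggtitle(paste("Mean", v, "by number of cylinders"))
})
```

A change to the shared style then needs to be made only once, inside house_defaults(), rather than in every analysis script.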

Establish a baseline for what you want it to accomplish. Do you want only a theme, or do you want fully customized plot types? How much detail do you want to be default behavior, as opposed to the responsibility of the user?

One of the most important steps in the process is to figure out what features you need to include so your custom graphics package does what its users need and want it to do. At some point, it’s necessary to settle on a set of features you want to include to give yourself a goal, as well as a stopping point. There will probably always be more features you can add, but if you don’t determine the baseline of what you want to include, you might find yourself adding features but working toward no specific goal.

Decide if you want an internal or external release. The main question you want to ask here is whether you want to commit to code maintenance and development for external users. At Pew Research Center, we decided to keep our package for internal use only. It’s extremely beneficial to our workflows but isn’t necessarily something we’re interested in maintaining for external use, as we do with some of our other packages. That decision influenced how we designed our vignettes and other documentation, as well as the naming conventions used for many of the functions and colors/color palettes.

Get feedback from the people who will be using the package. While certain things may make sense to you as the developer, what matters most is that the people you’re making the package for are able to use it. User feedback is important for any product or service, and this is no different. Feedback from people who aren’t part of the development process can also bring in new perspectives and help you catch things that you, as the developer, may not have noticed or considered. In our development process, it was especially important to get feedback from our design director to make sure that our graphics were up to par. Asking end users to test out the package allowed us to catch missing features and unclear documentation.

Test a lot. If you want your package to be flexible and versatile, you should test it on as many different data sources as possible. You might not even notice a feature is missing or broken until you’ve tested it on a specific data source or tried to make a specific kind of plot.
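One lightweight way to do this kind of testing is to run a plotting function over a set of deliberately varied inputs and record which ones fail to build. The sketch below is an assumption about how that could look, not a description of our own test suite: house_bar() is a stand-in for whatever function is being tested, and the test cases are made up.

```r
library(ggplot2)

# Stand-in for a package plotting function (hypothetical).
house_bar <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_col() +
    theme_minimal()
}

# Made-up inputs chosen to exercise different edge cases.
cases <- list(
  few_groups      = data.frame(g = c("A", "B"), v = c(1, 2)),
  missing_values  = data.frame(g = c("A", "B", "C"), v = c(10, NA, 30)),
  negative_values = data.frame(g = c("A", "B"), v = c(-5, 5)),
  long_labels     = data.frame(
    g = c("Strongly agree with the statement", "No answer"),
    v = c(60, 40)
  )
)

# ggplot_build() runs the plot computations without drawing, so errors
# in the data or layer setup surface here. TRUE means the plot built.
results <- vapply(cases, function(d) {
  tryCatch({
    ggplot_build(house_bar(d, "g", "v"))
    TRUE
  }, error = function(e) FALSE)
}, logical(1))
```

A check like this only confirms that each plot builds without an error. Whether the result actually looks right for each case still requires a person to review the rendered graphics.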

Challenges and pitfalls to watch for

Despite the benefits we’ve seen from creating our custom graphics package, our effort has not been without challenges.

Most significantly, we realized that an effort of this scale takes a lot of time and work. We developed v1.0.0 of pewplots over a period of around six months of consistent work. There’s also a relatively high barrier to entry. Just to begin, you need a fairly high level of general R knowledge, as well as knowledge about the tidyverse and ggplot2 more specifically.

In our case, it was also necessary for developers and researchers to be familiar with the ins and outs of Pew Research Center’s official style guide for graphics. In addition, the process required a decent understanding of the graphics that different teams within the Center make, including the different features and customizations those teams require.

All of that is to say that there is a gap between generating R graphics that complement our internal analysis and workflows and generating R graphics that are publication-ready. There are also still countless features we could add in terms of tweaks and customizations. One key consideration in the future evolution of our internal graphics package is distinguishing between features that we need to have — as opposed to ones that are nice to have — and generally preventing “scope creep” as we move forward.
