News and Notes from PolicyViz - Issue #21
Hi everyone! Thanks for subscribing to the newsletter! I hope you had a relaxing, safe, and healthy summer.
All is well here in Northern Virginia. We've had a few days of cooler weather, which is always nice, especially while I'm sitting outside watching my son play baseball or umping little league games myself.
You'll see just a few changes in this newsletter. I'll still only send one or two each month so as to not clutter up your inbox. (If you want more regular dataviz content and references, check out the Winno community mentioned below).
Between finishing up my next book, travel, baseball, and an onslaught of regular work, I've been pretty bad at keeping up on blog posts and videos. I have a ton of things partly drafted or in the works, but just no time to finish them. So, what you'll see in this and future newsletters is more draft blog posts and links to unlisted/unfinished YouTube videos.
If you have thoughts or comments on any of the drafts I share, please feel free to reach out via email, Twitter DM, or Winno. Eventually, they'll make it onto PolicyViz.com.
With that, onto the first draft blog post.
Thanks for reading,
Jon
DRAFT BLOG POST: Avoiding the Dual Axis Chart
You’ve seen plenty of them, I’m sure. A line chart with two or more lines and two vertical axes. You read the title, the legend, the axis labels, and start contemplating the patterns. Suddenly, you realize that the metric on the axis doesn’t match what the line is supposed to be showing. What’s going on? you wonder. Then, you notice another vertical axis on the other side of the graph. Your brow furrows and you start trying to piece the whole thing together.
Like others, I’m going to suggest you avoid dual axis charts. They are confusing, hard to read, and can be easily manipulated to suggest correlations when none exist.
I’m going to make this argument by backing my way into a famous alternative to a dual-axis line chart. To start, consider this dual-axis line chart that shows the number of auto fatalities (per 100,000 people) on the left axis and the number of miles driven per capita on the right axis in the United States from 1950 to 2011.
It’s not immediately obvious that the fatalities data are shown in the blue line and associated with the left axis, and miles driven per capita is the green line plotted along the right axis. The purpose of a graph like this is to show the decline in one series (i.e., auto fatalities) alongside a concurrent increase in the other (i.e., miles driven).
But there are three problems with plotting the data like this.
First, these graphs are often hard to read. Did you intuitively know which line corresponded to which axis? I didn’t. Even if the labels and axes were colored to match the lines (which many dual-axis charts don’t do), it’s hard to discern patterns in the data. Overall, they’re extra work for the reader, especially when the labeling is not obvious.
Second, the gridlines may not match up. Notice how the horizontal gridlines in this graph are associated with the left axis, which leaves the numbers on the right axis floating in space. At the crossing point in 1989, it’s hard to see if the number of miles driven is closer to 8,000 or 9,000 because the gridlines are not lined up.
Third, and most importantly, the point where the lines cross becomes a focal point, even though it may have no real meaning. In the first version, our eye is drawn to the middle of the chart where the two lines intersect, because that’s where the most interesting thing is happening. But there’s nothing special about 1985 where the lines cross—it’s just a simple coincidence of the vertical axis scales. The intended takeaway of the chart is how the two series move in opposite directions, but that’s not what draws the eye.
Vertical Axis Ranges
The vertical axis in a line chart does not need to start at zero nor is there a distinct rule for the range of the vertical axis in our line charts. And by that logic, we could arbitrarily change the dimensions of each axis to make the lines cross wherever we like.
Each of these next two graphs is a reasonable way to set the vertical axes, and by manipulating those ranges, I can make the series look like they are closely matched for a few years at the beginning of the period and then diverge, or look like they converge over the period. By arbitrarily choosing the axis ranges, we can make different data series look as correlated as we like.
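To make this concrete, here is a minimal Python sketch using made-up numbers (not the actual fatalities and miles data). The helper names axis_position and first_year_on_top are hypothetical, invented for this illustration. It converts each value into its height on the plot for a given axis range and reports the first year the rising series is drawn above the falling one.

```python
def axis_position(value, axis_min, axis_max):
    """Height of a value on an axis, as a fraction (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)

# Made-up illustrative values, NOT the real fatalities or miles data.
years = [1950, 1970, 1990, 2010]
falling = [24.0, 20.0, 16.0, 12.0]          # drawn against the left axis
rising = [3000.0, 5000.0, 7000.0, 9000.0]   # drawn against the right axis

def first_year_on_top(left_range, right_range):
    """First year the rising series is drawn above the falling one, or None."""
    for year, f, r in zip(years, falling, rising):
        if axis_position(r, *right_range) > axis_position(f, *left_range):
            return year
    return None

# Same data, three different (and individually defensible) axis choices.
print(first_year_on_top((10, 25), (0, 10000)))  # 1990
print(first_year_on_top((10, 25), (0, 6000)))   # 1970
print(first_year_on_top((0, 25), (0, 20000)))   # None: the lines never cross
```

The data never change in this sketch. Only the axis ranges do, and that alone moves the crossing point by two decades or removes it entirely.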
And this is the core problem with dual-axis line charts: the chart creator can deliberately mislead readers about the relationship between the series.
Solutions to the Dual Axis Chart Problem
There are a few solutions to the dual-axis chart challenge.
First, try setting the separate line charts side by side. Not everything needs to be packed in a single graph. We can break things up and use a small multiples approach. Although ideally side-by-side graphs should have the same vertical axis range to facilitate easier comparisons, we’ve already determined that these data series are not on comparable ranges, so splitting them up and using different axis ranges can work.
If it’s important to annotate a specific point on the horizontal axis, you could also stack the two graphs vertically and draw a line across both. This will change the orientation of the final graphic, but may offer an easier way to label a specific value or year.
Second, we might calculate an index or the percent change from some value or year and plot the data to a single vertical axis. With this approach, the reader can see the change over time for both series and compare them along the same metric. The obvious trade-off is that we lose the level presentation of the data and instead present the change.
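As a rough sketch of that calculation in Python, the two functions below rescale a series relative to a base observation. The function names and the sample values are assumptions made for illustration, not the real data.

```python
def index_to_base(values, base_position=0):
    """Rescale a series so the base observation equals 100."""
    base = values[base_position]
    return [100.0 * v / base for v in values]

def percent_change_from_base(values, base_position=0):
    """Percent change of each observation relative to the base observation."""
    base = values[base_position]
    return [100.0 * (v - base) / base for v in values]

# Made-up illustrative values, NOT the real data.
fatalities = [20.0, 15.0, 10.0]
miles = [4000.0, 6000.0, 8000.0]

print(index_to_base(fatalities))             # [100.0, 75.0, 50.0]
print(index_to_base(miles))                  # [100.0, 150.0, 200.0]
print(percent_change_from_base(fatalities))  # [0.0, -25.0, -50.0]
```

Both indexed series start at 100, so they can share a single vertical axis even though the original units are very different.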
Third, try a different chart type. If showing the changes in the associations between the two series is important, try a connected scatterplot—a graph that is like a scatterplot with a horizontal and vertical axis, but each point represents a different unit of time, such as a quarter or a year.
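The data behind a connected scatterplot are simple to set up. Here is a minimal Python sketch, with hypothetical function names and made-up values, that pairs two series by year and returns the points in time order along with the segments that connect them.

```python
def connected_scatter_points(years, x_values, y_values):
    """Pair two series by year and return the points sorted by time."""
    rows = sorted(zip(years, x_values, y_values))
    return [{"year": yr, "x": x, "y": y} for yr, x, y in rows]

def segments(points):
    """Consecutive pairs of points: the lines drawn between years."""
    return list(zip(points, points[1:]))

# Made-up illustrative values, deliberately out of order.
years = [1970, 1950, 1990]
miles = [5000.0, 3000.0, 7000.0]    # horizontal axis
fatalities = [20.0, 24.0, 16.0]     # vertical axis

points = connected_scatter_points(years, miles, fatalities)
for start, end in segments(points):
    print(start["year"], "->", end["year"])
```

The order of the points is what separates this from an ordinary scatterplot: the line has to travel through the years in sequence, which is why the sketch sorts by year before connecting anything.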
The data I’ve used thus far were originally shown in perhaps one of the most famous connected scatterplots (aside from the Beveridge Curve, which economists know but that’s about it). Created by Hannah Fairfield at the New York Times in 2012, this Driving Safety, in Fits and Starts connected scatterplot shows how the two series we’ve been looking at so far moved over the 62-year period.
In this excellent presentation of the data, Hannah wrapped the graph around explanatory text positioned at the bottom-left of the page and labeled specific areas to denote important periods like the energy crisis in the early 1970s and air bags in the 1990s. (It’s also worth checking out Hannah’s 2010 connected scatterplot, Driving Shifts Into Reverse, that showed the relationship between the price of a gallon of gasoline and miles driven per capita.)
There is a caveat here. I find that about 8 out of 10 times I end up in one of two places with my connected scatterplots: either they are straight lines (e.g., program participation and program spending) or they are some kind of cluttered mess. Look what happens to the fatalities-miles connected scatterplot when we extend it through 2021: the 2011-2019 period is all over the place!
Zooming in on the most recent 20 years gives us a bit more insight: a decline in both metrics over the first 10 years and then a slight rise in both metrics by 2016. Thereafter, we see fatalities fall again just a bit before driving falls considerably in 2020 owing to the pandemic. Between 2020 and 2021, we see a recovery of driving and a considerable increase in auto fatalities. Last year, Americans’ driving behavior was about what it was in 2008.
The Exceptions
There are always exceptions, right? For dual axis charts, I think there are two exceptions to consider.
First, when we are showing a translation of a single measure, for example Fahrenheit and Celsius temperatures. In these cases, we are not trying to track two different variables but showing how one maps directly onto another.
Second, a Pareto chart, in which (typically) vertical bars are showing individual values tagged to one vertical axis and a line is showing the sum of those values (either levels or percentages) on the other axis.
In both of these cases, I don’t think the usual pitfalls apply.
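The Pareto case works because the second axis is computed directly from the first. Here is a minimal Python sketch of that calculation. The function name pareto_rows and the category counts are made up for illustration.

```python
def pareto_rows(counts):
    """Sort categories from largest to smallest and add the cumulative percent."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    running = 0
    rows = []
    for category, value in ordered:
        running += value
        rows.append((category, value, 100.0 * running / total))
    return rows

# Made-up illustrative counts.
complaints = {"late": 20, "damaged": 50, "wrong item": 30}

for category, value, cumulative in pareto_rows(complaints):
    print(category, value, cumulative)
# damaged 50 50.0
# wrong item 30 80.0
# late 20 100.0
```

The bars would use the counts on one axis and the line would use the cumulative percent on the other. Because the line is derived from the bars, the two axes describe the same measure rather than two unrelated ones.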
What does the research say?
There is not much research on how readers perceive and process dual axis charts.
In their 2011 study, A Study on Dual-Scale Data Charts, Petra Isenberg, Anastasia Bezerianos, Pierre Dragicevic and Jean-Daniel Fekete report that their study participants found the dual axis chart (or what the authors called the “superimposed chart”) “very confusing and demanding too much concentration or reflection.” Relative to other chart types, participants in the study “performed poorly both in terms of accuracy and time” when viewing dual axis charts.
There is some slightly more recent research on the effectiveness of connected scatterplots. In their 2014 paper, The Connected Scatterplot for Presenting Paired Time Series, Steve Haroz, Robert Kosara, and Steven Franconeri test how 14 study participants read and process different connected scatterplots. They conclude that the “low-complexity” versions of the connected scatterplot (like the one presented here) “can be understood with little explanation” and are useful at engaging readers over more traditional graph types—like the line chart.
Conclusion
I hope I’ve demonstrated how dual axis charts can be troublesome. They can confuse your reader and they can be used to distort the presentation of the data. Try some of the alternatives listed above to make your data clearer and easier to read. In some cases, some of these alternatives—like the connected scatterplot—can be used to help your reader see how the patterns in your data are related to one another.
This post draws on Chapter 5 from my book, Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks. Thanks to Amy Cesal for suggesting I write this post and for reviewing it.
Episode #222: Richard Brath
Richard Brath is a longtime visualization designer, researcher and strategist. At Uncharted Software, Richard focuses on the creation of high-value visual analytic applications that solve real-world problems in capital markets, supply chain and health-care analytics. These solutions are in use by hundreds of thousands of users around the world every day.
What I'm Reading & Watching
Books
Articles
Understanding Data Accessibility for People with Intellectual and Developmental Disabilities by Wu et al.
How we classify countries and people—and why it matters by Khan et al.
TV/Movies
The Bear--amazing! (Hulu)
Welcome to Wrexham (Hulu)
NHL Hockey season starts soon!!
Note: As an Amazon Associate I earn from qualifying purchases.
Join my Winno Community!
If you want to get some short, actionable dataviz advice, check out my new Winno community. I send about 2-3 text messages each week with some little pointers about dataviz. There is now a free tier! You get fewer texts and giveaways, but it's a good way to test it out. If you like what you see, sign up for only $5/month. Your subscription helps support this newsletter and the podcast. I hope to see you there!