Comparing COVID-19 Mortality Rates Over Time By Country

As COVID-19 spreads its deadly effects around the world, many data analysts are struggling to track these effects in useful ways. Some attempts work better than others, however. Comparing these effects among various countries is particularly challenging. Some attempts that I’ve seen are confusing and difficult to read, even for statisticians. Here’s an example that was brought to my attention recently by a statistician who found it less than ideal:

I believe that the objectives of displays like this can be achieved in simpler, more accessible ways.

Before proposing an approach that works better, let’s acknowledge that country comparisons of deaths from COVID-19 are fraught with data problems that will never be remedied by any form of display. Even here in the United States, many deaths due to COVID-19 are never recorded. If someone with COVID-19 suffers from pneumonia as a result and then dies, what gets recorded as the cause on the death certificate: COVID-19 or pneumonia? Clear procedures aren’t currently in place. Medical personnel are focused on saving lives more than recording data in a particular way, which is understandable. This problem is no doubt occurring in every country. The integrity of the data from country to country differs to a significant degree and does so for many reasons. It’s important to recognize whenever we display this data that country comparisons will never be entirely reliable. Nevertheless, working with the best data that’s available, we must do what we can to make sense of it.

If we want to compare the number of deaths due to COVID-19 per country, both in terms of magnitudes and patterns of change over time, the following design choices seem appropriate:

  1. Assuming that we want to understand the proportional impact on countries, use a ratio such as the number of deaths per 1 million people rather than the raw number of deaths, to adjust for population differences.
  2. Aggregate the data to weekly values to eliminate the noise of day-to-day variation.
  3. Use rolling time (i.e., week 1 consists of days 1 through 7, week 2 consists of days 8 through 14, etc.) rather than calendar time, beginning with the date on which the first death occurred in each country.
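The three choices above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact procedure behind the graphs in this article. The function name `weekly_deaths_per_million`, its inputs, and the sample numbers are all hypothetical, and dropping an incomplete final week is an assumption made here so that a partial week doesn't look like a decline.

```python
def weekly_deaths_per_million(daily_deaths, population):
    """Return deaths per 1 million people for each complete rolling week.

    daily_deaths: new deaths per calendar day, in date order (hypothetical input).
    population: total population of the country.
    """
    # Rolling time: week 1 starts on the day of the first recorded death.
    first = next((i for i, d in enumerate(daily_deaths) if d > 0), None)
    if first is None:
        return []  # no deaths recorded, so there are no weeks to report
    since_first = daily_deaths[first:]

    # Assumption: keep complete 7-day weeks only, because a partial
    # final week would show up as a misleading drop.
    complete_weeks = len(since_first) // 7
    millions = population / 1_000_000

    # Weekly aggregation, then adjustment for population size.
    return [
        sum(since_first[w * 7:(w + 1) * 7]) / millions
        for w in range(complete_weeks)
    ]


# Made-up numbers for illustration only; these are not real data.
daily = [0, 0, 1, 0, 2, 3, 1, 4, 6, 5, 9, 12, 10, 15, 20, 18, 25]
rates = weekly_deaths_per_million(daily, population=2_000_000)
print(rates)  # [8.5, 44.5]
```

In this made-up series, the first death occurs on the third day, so week 1 covers days 3 through 9 of the calendar data, and the single leftover day at the end is ignored until its week is complete.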

The following line graph exhibits these design choices. To keep things simple for the purpose of illustrating this approach, I’ve included four countries only: the U.S., China, Italy, and Canada. Also, for the sake of convenience, I’ve relied on the most readily available data that I could find, which comes from www.ourworldindata.org.

Most people in the general public could make sense of this graph with only a little explanation. It’s important to recognize, however, that no single graph can represent the data in all the ways that are needed to make sense of the situation. Perhaps the biggest problem with this graph is that the number of weekly deaths per 1 million people varies so much in magnitude from country to country, ranging from over 90 at the high end in Italy to less than 1 at its peak in China. As a result, the blue line representing China appears almost flat as it hugs the bottom of the graph, which makes its pattern of change unreadable. Assuming that the number of deaths in China is accurate (not a valid assumption for any country), this tells us that COVID-19 has had relatively little effect on China overall. The immensity of China in both population and geographical space is reflected in this low mortality rate. The picture would look much different if we considered Wuhan, in Hubei Province, alone.

Obviously, if we want to compare the patterns of change among these countries more easily, regardless of magnitude, we must solve this scaling problem. Some data analysts attempt to do this by using a logarithmic scale, but this isn’t appropriate for the general public because few people understand logarithmic scales and their effects on data. Another approach is to complement the graph above with a series of separate graphs, one per country, that have been independently scaled to more clearly feature the patterns of change. Here’s the same graph above, complemented in this manner:

With this combination of graphs, there is now more that we can see. For instance, the pattern of change in China is now clearly represented. Notice how similar the patterns in China and Italy are. From weeks 1 through 7, which is all that’s reflected in Italy so far, the patterns are almost identical. Will their trajectories continue to match as time goes on? Time will tell. Notice also the subtle differences in the patterns of change in the U.S. versus Canada. In the beginning, mortality increased in Canada at a faster rate but started to decrease from the fourth to fifth week while the pattern in the U.S. does not yet exhibit a decrease as of the sixth week. Will mortality in the U.S. exhibit a decline by week 7 similar to China and Italy? When another complete week’s worth of data is added to the U.S. graph, we’ll be able to tell.
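The effect of independent scaling can be sketched numerically as well. Giving each country its own vertical scale is roughly equivalent to expressing each weekly value as a fraction of that country's own peak. The function name `scale_to_own_peak` and the two series below are hypothetical, made up only to show how patterns that differ greatly in magnitude can have the same shape.

```python
def scale_to_own_peak(weekly_rates):
    """Express each weekly value as a fraction of the series' own peak."""
    peak = max(weekly_rates, default=0)
    if peak == 0:
        # An empty or all-zero series has no pattern to rescale.
        return [0.0 for _ in weekly_rates]
    return [value / peak for value in weekly_rates]


# Made-up weekly deaths per 1 million people; these are not real data.
series = {
    "Country A": [20.0, 40.0, 80.0],
    "Country B": [0.125, 0.25, 0.5],
}
scaled = {name: scale_to_own_peak(values) for name, values in series.items()}

for name, values in scaled.items():
    print(name, values)
```

On a shared scale that runs to 80, Country B would hug the bottom of the graph. Scaled to its own peak, its pattern of change turns out to be identical to Country A's.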

Clearly, there are many valid and useful ways to display this data. I propose this simple set of graphs as one of them.

9 Comments on “Comparing COVID-19 Mortality Rates Over Time By Country”


By Jason. April 13th, 2020 at 10:46 pm

I don’t think that the ratio by population is sensible in this case, because the growth rate doesn’t depend on the population until the late stages.

By Stephen Few. April 14th, 2020 at 12:18 am

Jason,

Please explain what you mean by “the growth rate doesn’t depend on the population until the late stages.” Whether we display the raw number of deaths or the number of deaths per 1 million people, the growth rate is the same throughout the entire period regardless of the population. In other words, the growth rate doesn’t depend on the population at any point. Have I misunderstood you?

By Bobby. April 17th, 2020 at 12:19 am

Thanks for this, Stephen. Your “solution” was simpler and more understandable. It will probably get clunkier with more countries in one graph, but then that is where selectivity would come in.

By Brent. April 24th, 2020 at 10:06 am

My first thought as I scanned the original graph was, “Wow, something strange is happening in the U.S.!” Then I looked at the axis labels for a minute and thought, like John Lennon, “The more I see the less I know for sure”. Graphs like this strongly influence me (at least) to assume it’s showing amount (x-axis) over time (y-axis). (If the original *does* do that the author/designer would need to sit down with me and explain how.)

Regarding your final revised set, I was initially concerned about the varying y-axis scales on the 4 breakouts, but the context provided by the main chart offsets that (and allowing varying y-axis scales was the reason for the breakouts!).

I have a half-baked philosophical musing that analysts suffer from understanding their data too much, having traveled down a long and winding road to gather it, and when it comes to communicating their findings, forget that the reader (many times) has not. The designer can serve the analyst (and ultimately the reader) by bringing to the design consult “the value of ignorance” and standing between the analyst down in the hollow and the reader up on the ridge. Maybe that would have helped the folks at JHU.

Thanks, Stephen, for providing this excellent resource.

By Stephen Few. April 24th, 2020 at 10:47 am

Hi Brent,

In the original graph, each data point represents aggregate values for a given week. Because the X-axis scale is measuring cumulative mortality, which by definition must either grow or remain constant from week to week, the sequence of data points from left to right matches the chronological sequence of weeks. However, unlike a time scale in weekly intervals along the X-axis, which would space successive weeks equally from one another, the cumulative mortality scale varies in distance from one data point to the next. A general audience would struggle to understand this, but most statisticians would interpret it correctly, given time to figure it out.

By Brent. April 24th, 2020 at 10:59 am

Thank you, Stephen. As a designer not a statistician I’ll consider it a success if I grasp that by the end of the day.

By Stephen Few. April 24th, 2020 at 11:02 am

Brent,

You and I are in the same boat. I’m not a statistician by training either.

By Blake. May 1st, 2020 at 6:31 pm

Great breakdown; well done!

I’ve seen some attempts to estimate overall Covid-19-related mortality rates by looking at the relative uptick in overall death rates.

By Stephen Few. May 1st, 2020 at 8:05 pm

Blake,

What did you think of those attempts?
