One of Bill Gates’ favorite graphs redesigned
This blog entry was written by Bryan Pierce of Perceptual Edge.
The following 3-D treemap was brought to our attention by a participant in our discussion forum.
This graph was selected by Bill Gates to be included in a recent edition of Wired Magazine that he guest edited. He explained why he included the graph as follows:
I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. In fact, fewer kids are dying, more kids are going to school and more diseases are on their way to being eliminated. But there remains much to do to cut down the deaths in that yellow block even more dramatically. We have the solutions. But we need to keep up the support where they’re being deployed, and pressure to get them into places where they’re desperately needed.
This is an important message and a noble goal. But how well does the graph above tell this story? Not very well, actually.
A treemap is a space-filling graph that uses the size of rectangles to encode one quantitative variable and color intensity to encode a second. This treemap was created by Thomas Porostocky to display worldwide years of life lost by cause using data from the University of Washington’s Institute for Health Metrics and Evaluation database.
Let’s see what we can learn from this graph. First, we notice that the green section representing injuries is significantly smaller than the other two, but the relative sizes of the other two sections are difficult to judge. Next, we see that the rectangles in the yellow section are mostly light yellow. The color scale at the bottom shows that most of the diseases in that section are decreasing at an annual rate of between 2% and 3%. We can also see the names on the larger rectangles that represent the causes responsible for more years of life lost (e.g., Malaria), and get a sense of their relative sizes based on their areas, but again, we can’t compare them with any accuracy.
Treemaps were invented by Ben Shneiderman as a means to display part-to-whole relationships among huge numbers of values: data sets that are too large to display using graphs that can be read more easily and accurately, such as bar graphs. Only with a huge set of values would it make sense to rely on the areas of rectangles and the intensities of their colors to represent values, given that our brains cannot interpret these visual attributes easily and accurately.
The 3-D effect that’s been added to the treemap doesn’t provide us any information and makes the treemap harder to decode. One problem introduced by this effect involves the darkened colors that appear on the sides of the treemap to represent shadows, which are meaningless and misleading. 3-D graphs are rarely a good idea, but this 3-D is completely gratuitous.
If a treemap had been the best way to show this information, it would have been better to separate the three major sections using borders rather than different colors. Then a single diverging color scale could have been used for the whole treemap. For instance, negative values could have been varying shades of red, values near zero could have been gray, and positive values could have been varying shades of blue. This would have made it significantly easier to decode the values—especially the values near zero, representing little change—than the current design that uses three different sequential color scales.
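As a rough illustration of the single diverging scale described above, here is a minimal Python sketch. The function name, the specific RGB values, and the decision to clamp values to the ends of the scale are all assumptions made for this example; they are not taken from the original treemap or from our redesign.

```python
# Hypothetical anchor colors for the sketch (any red/gray/blue triple would do).
GRAY = (200, 200, 200)   # values near zero: little change
RED = (178, 24, 43)      # most negative value on the scale
BLUE = (33, 102, 172)    # most positive value on the scale


def diverging_color(annual_pct_change, limit=3.0):
    """Map an annual percentage change to an RGB color on a red-gray-blue scale.

    Values beyond +/-limit are clamped to the ends of the scale.
    """
    clamped = max(-limit, min(limit, annual_pct_change))
    # Distance from zero, expressed as a fraction of the scale's half-width.
    t = abs(clamped) / limit
    end = RED if clamped < 0 else BLUE
    # Linear interpolation from gray (t = 0) to the end color (t = 1).
    return tuple(round(g + (e - g) * t) for g, e in zip(GRAY, end))


print(diverging_color(0.0))    # gray: no change
print(diverging_color(-3.0))   # full red: largest decrease on the scale
print(diverging_color(1.5))    # halfway between gray and blue
```

Because every rectangle shares one scale, a value near zero looks the same in every section, which is the property the three separate sequential scales lack.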
There is another problem with the treemap, though it’s not apparent unless you look at the underlying data. The color scale in the treemap shows annual percentage changes ranging from -3% to +3%. However, some of the items in the treemap changed by larger amounts than this. For instance, between 2005 and 2010 the years of life lost per 100,000 people to malaria decreased by 23.80%, which is an average annual reduction of 4.76%. This is a great improvement, but this outlier is completely lost when viewing the treemap, which shows malaria as one of the many infectious diseases that decreased by between 2% and 3% annually.
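The arithmetic behind the malaria figure can be sketched in a few lines of Python. The 4.76% figure is consistent with dividing the five-year change evenly across the years, so that is what the first function does. The second function, a compound annual rate, is an alternative added here only for comparison and is not a calculation used in this post. The function names are hypothetical.

```python
def simple_annual_rate(total_pct_change, years):
    """Average annual change: the total change split evenly across the years."""
    return total_pct_change / years


def compound_annual_rate(total_pct_change, years):
    """Constant yearly change that compounds to the same total (for comparison)."""
    growth_factor = 1 + total_pct_change / 100
    return (growth_factor ** (1 / years) - 1) * 100


malaria_change = -23.80   # percent change, 2005 to 2010, from the text above
scale_limit = 3.0         # the treemap's color scale runs from -3% to +3%

simple = simple_annual_rate(malaria_change, 5)
compound = compound_annual_rate(malaria_change, 5)

print(f"simple annual rate:   {simple:.2f}%")
print(f"compound annual rate: {compound:.2f}%")
print("falls outside the color scale:", abs(simple) > scale_limit)
```

Either way the annual figure is well beyond the -3% end of the color scale, which is why the improvement in malaria cannot be distinguished from its neighbors in the treemap.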
The information that appears in the treemap can be easily shown in two side-by-side bar graphs in a way that tells the story clearly and accurately and is just as visually engaging without resorting to gimmickry. In fact, by using a third variable to display information about the death rate for each cause, instead of solely showing the information in terms of years of life lost, the story can be enriched to give a clearer picture of the world. Here is our redesign:
The bar graph on the left shows the years of life lost per 100,000 people in 2010 for each cause, which is the information encoded by the areas of the rectangles in the original treemap. The bars have been ranked and color coded to make it easy to compare causes of death. The years of life lost to each cause as percentages of the whole are also shown in the column of text, just to the left of the bars.
The bar graph in the center shows the percentage change between the years of life lost per 100,000 people in 2005 and 2010 for each of the causes. Unlike the original graph, we’re showing the total percentage change between those years, rather than an annualized version.
The bar graph on the right displays information that’s not shown in the original treemap: the death rate per 100,000 people for each cause. The fact that this information can be viewed together with the years of life lost information is useful and we’ll examine it in more detail a little later.
You might notice that our bar graphs include fewer items than the original treemap. The original treemap contains a little over 100 rectangles, many of which are unlabeled. We had access to the original dataset, so we could have made bar graphs that included items for each individual disease, but dozens of tiny bars would have undermined the core story, so we chose to aggregate the data into useful categories. For instance, we aggregated all different types of cancer into a single “Cancer” bar and all different types of heart disease into a single “Heart disease” bar. Also, for items that contributed less than 1% of total deaths, if they couldn’t already be aggregated into an obvious category like cancer, we moved them into an “Other” category. For instance, deaths from diphtheria are included in the “Other communicable diseases (including meningitis and hepatitis)” bar. In cases when access to these lower-level details is important, a table containing all individual causes of death could be included to provide this information.
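For readers who want to try a similar grouping on their own data, here is a minimal Python sketch of the rule described above. It is a simplification: it uses a single “Other” bucket, whereas our graphs keep a separate “Other” bar within each major section. The function name and the input rows are hypothetical, and the numbers are made up purely to exercise the rule; they are not values from the dataset.

```python
from collections import defaultdict


def aggregate_causes(rows, threshold=0.01):
    """Group causes of death into bars.

    rows: (cause, obvious_category_or_None, deaths) tuples.
    A cause with an obvious category is folded into that category; otherwise
    it is moved to "Other" when it contributes less than `threshold` of total
    deaths, and kept as its own bar when it contributes more.
    """
    total = sum(deaths for _, _, deaths in rows)
    bars = defaultdict(float)
    for cause, category, deaths in rows:
        if category is not None:
            label = category
        elif deaths / total < threshold:
            label = "Other"
        else:
            label = cause
        bars[label] += deaths
    return dict(bars)


# Made-up illustrative numbers, NOT values from the dataset.
example_rows = [
    ("Lung cancer", "Cancer", 30.0),
    ("Breast cancer", "Cancer", 10.0),
    ("Malaria", None, 59.5),
    ("Diphtheria", None, 0.5),
]
print(aggregate_causes(example_rows))
```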
Notice how much easier it is to interpret the values represented by the bars than it was to decode the rectangle sizes and color intensities in the treemap. The fact that fewer years of life are being lost to communicable, maternal, neonatal, and nutritional disorders, represented by the gray bars, is immediately obvious, because all of the gray bars are showing decreases (negative values) in the center graph. By placing the years of life lost rate and the death rate for each cause in close proximity to one another, it’s easy to find discrepancies between their patterns, which can be informative. For instance, most of the gray bars have relatively short death rate bars, in comparison to the bars that represent years of life lost. This is because many of the gray bars represent diseases or issues that tend to kill children, so each death results in many years of life lost. For instance, on average, each death from malaria robs someone of 67.2 years of estimated life. Conversely, the three largest brown bars, “Cancer,” “Heart disease,” and “Stroke,” all represent things that tend to kill older people, so each death has a relatively lower impact on years of life lost. For instance, each death from heart disease, on average, is responsible for an estimated 17.5 years of life lost.
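The per-death figures above can be derived by dividing the years of life lost rate by the death rate for the same cause, since both are expressed per 100,000 people. Here is a small Python sketch of that division. The input rates below are placeholders chosen only so that the ratios match the 67.2 and 17.5 quoted in the text; they are not the actual rates from the dataset, and the function name is hypothetical.

```python
def years_lost_per_death(yll_per_100k, deaths_per_100k):
    """Average years of life lost for each death from a given cause.

    Both inputs are rates per 100,000 people, so the population cancels out.
    """
    return yll_per_100k / deaths_per_100k


# Placeholder rates, NOT real data: picked so the ratios match the text.
malaria = years_lost_per_death(yll_per_100k=1344.0, deaths_per_100k=20.0)
heart_disease = years_lost_per_death(yll_per_100k=1750.0, deaths_per_100k=100.0)

print(f"malaria:       {malaria:.1f} years per death")
print(f"heart disease: {heart_disease:.1f} years per death")
```

A cause with a long years-of-life-lost bar and a short death-rate bar has a high ratio, which is the visual discrepancy the paired graphs make easy to spot.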
By using bar graphs, we’ve made it easier to interpret and compare the data, so that it’s easy to focus on the stories contained in the data, rather than struggling to decode an inappropriate and ineffectively designed display.
-Bryan
17 Comments on “One of Bill Gates’ favorite graphs redesigned”
You’re right.
But how many people will “watch” the 3D treemap and how many people will “understand” your bar graph?
You and Bill Gates have different goals.
Bryan — for me, this display is clearly a huge improvement on the original. Though interesting that some in my office still prefer the original (in short, visually engaging and with sufficient useful/interesting information they can scrape out quickly). They’re clearly wrong ;-)
I regularly produce paired graphs like this (absolute value and change over time). I’ve found that non-analytical people can appreciate the fact that information is there, but struggle to use it: it still requires ‘effort’ to scan down the list and do the comparisons between the two metrics. Not something many people do intuitively, I’ve found. Recently I’ve been plotting these kinds of data on a scatterplot with simple labels for the quadrants (lots and getting worse, etc.). Where there are sufficiently few highlights in the ‘bad’ quadrant, it does help focus attention. The nature of that attention is different, too. Instead of exploratory rambling through the data, it’s more geared towards moving that point to a good quadrant; it seems intuitive to many non-analytical thinkers to move between quadrants.
The addition of the deaths beyond the years lost was intriguing. Having the two made me start thinking of the multiple reasons why there was divergence between the two columns (more people die old of heart attack and young of cancer, etc.). Might not be true, though, and would need additional information to ensure my reasoning was sound. This might be thinking a bit off piste to the purpose of the graphic.
Massimo,
Is the goal to get people to “watch” or to get them to “understand”? If understanding is the goal, knowledge of how to read a bar graph is much more common than that of reading a tree map, and human perception is better designed to compare the lengths of bars than either the sizes or color intensities of rectangles. Does Bill Gates have a different goal when reading about causes of death? From what I know about the Bill and Melinda Gates Foundation, I suspect that our goals are quite similar. We want to understand this information so we can do whatever possible to reduce preventable deaths, especially among the young. For this goal, presenting the data as clearly and simply as possible is essential. We have improved the clarity and readability of the data without losing anything of value that existed in the original tree map.
While I dislike much of the original view and appreciate the added richness of your improvement, I fear you’ve lost the main point, or at least de-emphasized it. What Bill Gates liked about the original was the prominent display of the high proportion of life lost to communicable diseases and thus largely preventable. While it’s not easy to tell if the yellow block is bigger than the pink block, you can quickly see they’re comparable. In your version, you represent that information more accurately but much less prominently as a small text table in the upper left. I suppose you can visually sum the bar lengths by color, but that seems even less precise and unnatural. (BTW, it wasn’t clear to me if those numbers in the upper left correspond to the left bar chart or the right bar chart.)
Of course, it’s hard to show everything well, but I think that proportion is an important part of the message. I’d rather lose the bottom 5-10 rows of the break out bars so I can see the aggregates in graphical form, such as a separate set of bar graphs with 3 rows above the detail bar graphs.
Secondarily, it’s less intuitive how to interpret the “Other” bars. If each is just an aggregation of unrelated items, it seems they should be separated or at the bottom so they’re not “competing” with the individual items. For those that are related, it still seems odd to see the “other” without the context of the main item. For instance, the original has “Other neonatal conditions” surrounded by “Neonatal infections” and related items, which provide context for the word “other”.
@Xan:
Read Bill Gates’ quote again: “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down.”
That the “numbers continue to come down” is not readily apparent in the original, both because the information is encoded in color intensities, and because the 3D shading (being yet another color intensity) is visually distracting. It actually does a very poor job of showing what Bill Gates says it shows.
However, it is very clear in the revised version that communicable diseases are on the decline. Do you see the middle barchart? The fact that the gray bars stretch so far to the left is both undeniably clear and profoundly significant. And also exactly the reason Bill Gates says he loves the original.
Thanks for the correction, Andrew. Because the “numbers continue to come down” part is not so readily apparent, as you generously say, I assumed Gates was getting that part from his own experience funding such efforts instead of from the treemap. Especially since the yellow doesn’t look pale like the low colors for the other scales.
Yes, the redesign does a great job of showing communicable diseases as improving much faster than the others, at least as a percentage. And the percentage improvements are big enough that they can be mentally linked to the first column to see large absolute improvement as well.
Apologies to Bryan for my misreading. I still miss the aggregated visual. To me it makes an immediate statement about how big the problem of communicable diseases is and implies the potential improvement possible through the Gates Foundation.
I do agree that the redesigned version is, on the whole, superior. I also agree with Xan’s assessment above that the aggregate visual has essentially gone missing in the redesign.
One other thing that I miss from the original – the ability to pick out a block that looks interesting (in my case, I focused on “Drowning”) and compare it to other blocks. Because the redesigned graph is aggregated, I was not able to do that. I thought it was interesting (and surprising) that life years lost from drowning are roughly on par with those from any single type of cancer (except lung cancer) – possibly explained at least in part because drowning deaths tend to occur more frequently with children, but certainly a point worth exploring.
An important lesson to learn about data presentation is that any visualization, without exception, can be improved to better suit a different set of assumptions about the audience. Whenever we design a particular visualization, we do so based on particular assumptions about the audience’s needs and abilities. When Bryan and I redesigned the Causes of Untimely Death visualization, we started with a particular set of assumptions about the interests and abilities of the general public. We were attempting to serve the broadest audience possible, but in so doing we made design decisions that would make the visualization less than ideal for some people.
We thought about including a separate bar graph to display the proportions of deaths related to the three major categories. This would have been simple, but we decided against it because we felt that these three percentages could be communicated as well for a general audience by providing them as numbers in the upper-left corner where they would be hard to miss. Three percentages that add up to 100% are easy to understand and compare without graphical representation. Would the addition of a separate bar graph to show these values have been useful? For some audiences, yes.
We thought a great deal about the level at which causes of death should be represented. If I remember correctly, the original data source from the University of Washington was arranged as a five-level hierarchy, going down to a level of detail that included many individual diseases that aren’t familiar to the general public. After a great deal of experimentation, we chose a means of grouping the causes of death that seemed most useful to the general public, which was somewhat different from the original tree map. To serve the interests of people who wanted more details about relatively minor causes of death, we decided that a table organized into major causes and arranged alphabetically by individual cause within those major categories would make it easy for readers to look up individual items that were of interest to them. As Bryan mentioned in the original blog post, we didn’t bother to actually provide this table (yeah, we were lazy, because we were primarily showing how the chart could be improved), but if we had presented this information to the general public, we would have made a table of greater detail accessible through a simple click of the mouse.
I am so glad you remade this chart. I cringed when I saw that Bill Gates had chosen this.
It is clear that the majority of visualizations chosen for the list in question were chosen for the data they are meant to represent, and not for the visualization itself.
They are almost universally bad.
This article and redesign raise another issue in my mind – which data should be displayed? You have added an extra measure to your redesigned chart that was not in the original as it provides greater insight into the ‘years of life lost’ data.
The discussion above raises several times the slightly hidden information about the ages at death for the various categories. I think this is an interesting point as this is the ‘raw’ underlying data that was used to calculate the years of life lost. I imagine the calculation might be something like this …
(‘Average life expectancy’ – ‘Average age at death’) x ‘Number of deaths’
If we display these more fundamental data values, would it not provide a more informative chart? The meaning and significance of ‘Age at death’ is very clear and unambiguous, but the meaning and significance of ‘Years of life lost’ is open to discussion – I think it is used here more to tug at the heart strings than to provide useful information.
What do you think?
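The calculation guessed at in this comment can be written out as a short Python sketch. It implements only the commenter’s conjectured formula, which may not match how the published years of life lost figures were actually produced. The function name and the example inputs are hypothetical and are not drawn from the dataset.

```python
def estimated_years_of_life_lost(life_expectancy, average_age_at_death, number_of_deaths):
    """Conjectured formula from the comment above:
    (average life expectancy - average age at death) x number of deaths.
    """
    return (life_expectancy - average_age_at_death) * number_of_deaths


# Made-up inputs for illustration only: the same number of deaths at
# different ages gives very different totals.
young = estimated_years_of_life_lost(80, 5, 1000)
old = estimated_years_of_life_lost(80, 70, 1000)

print(young)  # many years lost per death
print(old)    # far fewer years lost per death
```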
@Barnaby:
Without the “Years of Life Lost” value, how would you sort the list? Are diseases that affect a large number of people of greater interest? Or diseases that affect younger people?
Specifically, how would we keep the various communicable/maternal/neonatal disorders near the top of the list while also keeping cancer and heart disease up there?
I’ve had a hunt for these data but can’t readily find them. Could they be shared?
Hi Neil,
The original data comes from the University of Washington’s Institute for Health Metrics and Evaluation. You can get the full data file here. It contains information for each individual disease, but it doesn’t contain categorical groupings, so it’s probably not useful to you by itself unless you want to look up values for individual diseases. They also have an interactive visualization that allows you to look up individual values via tooltips or download the data shown by the current display. It contains the basic categorical groupings that we used, though in some cases we simplified the names. The interactive treemap visualization can only be used to download part-to-whole information, not actual years of life lost or years of life lost per 100,000, but if you use the “time plot” visualization, you can get the 2005 and 2010 values for individual causes or larger categories and calculate the change from there.
A new thread about this topic has been started in our discussion forum, so participants can submit their own redesigns of this graph for comment.
Just one comment (that falls in the know-your-audience category):
In the blog format above, the original graphic is legible. The redesign is not at all legible (without zooming in).
I won’t argue with any of the points regarding data-display, but I deal with graphics for print in my daily work, and the redesign is totally useless for print (assuming the original graphic represents the page-space available). So I find the comparison unfair to begin with.
I’d be much more interested (again, for me – a specific audience..) in a redesign that could be used in a similar format.
Samuel,
It is true that if our redesigned chart were meant to be viewed within the narrow width of this blog, it wouldn’t work. The fact is, however, that both the original chart and ours were designed to be larger, which is why we provided the enlarged versions. If our redesigned chart were intended for print, a slight increase in the size of the fonts would be all that’s needed. In what sense is the comparison unfair?
Hi,
I like the bar chart because it displays the data better and treemaps are only rarely appropriate or effective.
The idea behind the original tree map was to give the figures impact, at the expense of displaying figures correctly. Because it is a subject that generates an emotive reaction, maybe a way to keep the impact is to do, for example, a comparison between developed and developing countries or two time periods (if it’s interesting), although I’m aware we start talking about analysis here and not presentation.
Thanks,
Matt