On the home page of my website, I quote the mathematician and philosopher Alfred North Whitehead who said, “Seek simplicity and distrust it.” This is wise advice. We want to keep things as simple as possible, but we should never oversimplify to the point of losing essential complexity. As data visualization has become increasingly popular during the last decade, efforts to explain it have often become simplistic (i.e., oversimplified) to a harmful degree. We humans long for simple answers. The world, however, is in many ways complex. Data sensemaking and presentation skills are easy to learn, and once we’ve learned them, they seem simple and even obvious, but there is no denying that the concepts, principles, and practices are complex. We should “seek simplicity and distrust it.”
During the last year or so I’ve come across several people and organizations that were promoting the use of a chart selection diagram that was developed by Dr. Andrew Abela for his book Advanced Presentations by Design. The diagram, titled “Chart Suggestion—A Thought Starter,” serves as a guide for selecting an appropriate graph. This guide is simplistic and misleading. To be blunt, it is a confusing mess of internal contradictions and errors. While Abela might understand many aspects of effective presentation, his knowledge of data visualization is cursory at best.
Most recently, I encountered this diagram in an otherwise sane blog article by the software company iCharts. In the article, ironically titled “How to Avoid Misleading Your Audience,” they recommend the diagram as a useful guide. I responded immediately by warning them against it. Several months ago, I encountered the diagram when a fellow who was writing a book asked for my advice regarding a chapter about data visualization. He was planning to include Abela’s diagram in that chapter. Why? It conveniently fit on a page.
One-page diagrams are a tempting way to teach people new skills, but they often result in confusion. Those of you who are familiar with my work might be thinking, “But Steve, don’t you provide a one-page Graph Selection Matrix as a guide for novices?” I do provide a Graph Selection Matrix, but not as a guide for novices. I provide it as a single-page summary of the information about graph selection that’s covered in my book Show Me the Numbers. It’s only useful if you’ve already learned the concepts that it summarizes by reading the book or taking the corresponding course.
I first learned of Abela’s work several years ago when a large corporation asked me to provide ongoing data visualization training for its employees in conjunction with Abela’s presentation skills courses. Before responding to their request, I purchased and read Abela’s book. I not only found that his understanding of graphs was confused and fundamentally flawed, but also that his presentation principles were at times naïve. I told the company that I could not teach in conjunction with Abela because confusion would result.
Here’s Abela’s chart selection diagram. Take a few minutes to examine it on your own before reading my critique below. Does it make sense? Are its suggestions valid?
Where to begin? Let’s follow the sequence of choices moving outwards from the center: “What would you like to show?” We’re given four choices: 1) Comparison, 2) Distribution, 3) Composition, and 4) Relationship. This suggests that at the highest level we always want to show one of these four things in a graph. I call them “things” because I can’t think of a common term that describes these mismatched concepts. Comparison is an activity that all graphs are designed to support, Distribution is a specific feature of a set of quantitative values, Composition refers to what something is composed of, and Relationship is a feature that exists between values in some form or another in all data sets. These concepts don’t go together. Only one of the four, Distribution, clearly describes a specific attribute of data. Distribution refers to the manner in which a set of quantitative values belonging to a single variable is distributed from lowest to highest. For example, we might want to show how employees in a company are distributed by age from youngest to oldest. Unless we need to display the distribution of a set of quantitative values (and we understand that this is what Abela means by the term distribution), the diagram leaves us hanging with no clear direction.
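To make the meaning of distribution concrete, here is a minimal Python sketch of the employee-age example: it counts how many values fall into each equal-width interval, ordered from lowest to highest. The function name, the ten-year interval width, and the list of ages are all assumptions invented for illustration; none of them comes from Abela’s diagram or book.

```python
from collections import Counter

def age_distribution(ages, bin_width=10):
    """Count how many ages fall into each equal-width interval."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    lowest, highest = min(counts), max(counts)
    # Include empty intervals so gaps in the distribution remain visible.
    return [(start, start + bin_width - 1, counts.get(start, 0))
            for start in range(lowest, highest + bin_width, bin_width)]

# Hypothetical employee ages, invented for illustration only.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]

# Print a crude text histogram, one row per interval.
for low, high, count in age_distribution(ages):
    print(f"{low}-{high}: {'#' * count}")
```

The counts per interval are the distribution; a histogram or frequency polygon is just one way of drawing those same counts.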
You might assume that Abela’s book clarifies these choices, but it doesn’t. Here’s an example of the brief explanations that his book provides: “The last option…is composition: this is when you want to highlight the components of your data.” In this context, what does “components of your data” mean? You and I might understand that he’s referring to the items that make up a categorical variable, but would someone with no training in data visualization understand this? How about the terms distribution and comparison? Abela doesn’t explain what he means by these terms at all. The closest that he comes to an adequate explanation of these high-level choices is when he provides an example of a relationship: “If you want to show that your data provides evidence of a relationship, for example, between advertising and sales revenues, then you should move to [that part of the diagram].” This reveals that by the term “relationship” he’s referring to a correlation between two or more quantitative variables, but this is certainly not the only relationship that exists in quantitative data.
If we choose Relationship, we can follow the flow diagram to the only section that provides valid graph suggestions: a Scatter Chart if we need to show a correlation between two quantitative variables and a Bubble Chart if we want to show a correlation between three variables. Other than here in this one section, Abela’s suggestions are seriously flawed.
Let’s move on to the Comparison section. Our first choice is to show comparisons Among Items or Over Time. Those of us who are experienced data analysts know that by “Items” Abela is referring to the items that make up a categorical variable, but would this be clear to a novice? “Over Time” is clear, but why do values that change over time belong to the comparison section any more than the many other comparisons that we routinely need to enable in graphs? Also, Changing Over Time, as distinct from merely Over Time, appears as a first-level choice in the Composition section, which we’ll encounter later. Suffice it to say, this is not a clear and useful taxonomy of graph types.
Let’s proceed. If we select Among Items rather than Over Time, we are then faced with the choices Two Variables per Item, which leads us to a graph that few products support, and for good reason (a vertical bar graph that varies not only the heights of the bars, but also their widths to simultaneously display two quantitative variables), or One Variable per Item, which leads us to the following choice: Many Categories or Few Categories. If we select Many Categories, we are told that we should use a Table or a Table with Embedded Charts. So, according to Abela, when are tables useful? Apparently, only when we must display many categories. Oh my. What about the usefulness of tables when people need precise values? What about their usefulness when people need an easy way to look up specific values? And how do we choose between a Table and a Table with Embedded Charts? By a Table with Embedded Charts, Abela is referring to what Edward Tufte calls a small multiples display, specifically one that is arranged in both columns and rows. According to Abela, small multiples are only useful when we want to make comparisons among items, but this is hardly the case. What about a series of small multiples, composed of line graphs, that are used to compare values “Over Time” rather than “Among Items”? Or what about a small multiples display of scatter plots for comparing correlations? Apparently, these aren’t appropriate options.
Onward once again. If we choose Few Categories rather than Many Categories, we must then choose between Many Items and Few Items. So, if we only need to display a few categories, what if some of them contain many items and others contain few items? What do we do then? Let’s keep it simple so we can proceed. Let’s say that we need to display a single categorical variable that consists of many items, perhaps a category called product that consists of fifty individual products, rather than a category called product family that consists of only four items. In this case, according to the diagram, we should use a horizontal bar graph rather than a vertical bar graph (a.k.a. column chart). It is certainly true that it would be harder to fit fifty vertical bars side by side with their labels positioned underneath than it would be to fit fifty horizontal bars one above the next with their labels positioned to the left, but is this the only circumstance that suggests an advantage of horizontal over vertical bars? How about when the labels are long and cannot be easily placed under vertical bars, but can be placed to the left of horizontal bars quite easily? If we have few items, and therefore choose vertical bars, here’s the illustration of this chart that the diagram provides:
Arranging bars to overlap is not an effective practice. Positioning them side-by-side without any overlap treats different series of bars equally and supports easy comparisons.
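As a rough illustration of side-by-side placement, the following Python sketch computes the left edge and width of each bar so that bars within a group touch but never overlap, and every series gets bars of equal width. The function name and the 0.8 group width are assumptions chosen for illustration, not settings taken from any particular charting product.

```python
def side_by_side_positions(n_groups, n_series, group_width=0.8):
    """Return (left_edge, width) for each bar, indexed [series][group].

    Groups are centered on x = 0, 1, 2, ...; bars within a group touch
    but never overlap, and all series share the same bar width.
    """
    bar_width = group_width / n_series
    positions = []
    for s in range(n_series):
        # Shift each series right by one bar width from the group's left edge.
        offset = -group_width / 2 + s * bar_width
        positions.append([(g + offset, bar_width) for g in range(n_groups)])
    return positions

# Example: two series across three categories.
layout = side_by_side_positions(n_groups=3, n_series=2)
```

Because the group width is less than 1, a gap remains between neighboring groups, which keeps each category visually distinct.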
Let’s get through the remaining suggestions in the Comparison section and then put an end to this detailed review in favor of hitting the highlights only. If we need to show comparisons Over Time rather than Among Items, the diagram leads us to next choose between Many Periods and Few Periods, but this is a meaningless choice. Whether there are many or few periods actually has no bearing on the type of chart that we should choose in this case. For this reason, let’s skip this choice in the decision tree and go directly to the next four options: Cyclical Data, Non-Cyclical Data, Single or Few Categories, and Many Categories. Abela suggests that Cyclical Data should be displayed as a Circular Area Chart (actually illustrated by a radar chart, not an area chart), but this is rarely an effective choice. Cyclical data does not equate to a circular chart. For Non-Cyclical Data, he recommends a Line Chart, but line graphs usually work best for both cyclical and linear data. For Single or Few Categories, he recommends a Column Chart, but this is absurd. Several lines in a graph are much easier to interpret and compare than multiple sets of bars. Finally, for Many Categories, he suggests a Line Chart again, which is fine, but the number of categories does not determine the usefulness of a line graph. Line graphs excel in their ability to display patterns of change through time, which is something that bar graphs do poorly. Abela says nothing about featuring patterns of change rather than comparing values at particular points in time. The latter scenario is the only time when a bar graph should be used for a time series.
Now for the remaining highlights. In the Distribution section, a Column Histogram and a Line Histogram (i.e., a frequency polygon) are both appropriate, but the choice between them is not determined by Few Data Points vs. Many Data Points. The remaining suggestions (a Scatter Chart for Two Variables and a 3D Area Chart for Three Variables) don’t belong here because they feature correlations, not distributions. In fact, a 3D area graph is rarely useful, and never for a general audience.
In the remaining Composition section, we are first asked to choose either Changing Over Time or Static. By continuing through the choices we learn that, by Composition, Abela is referring to part-to-whole displays, but none of the graphs that he goes on to suggest display parts of a whole effectively, despite the fact that people use them regularly for this purpose.
That’s it; we’re done. This diagram is not a “Thought Starter,” as Abela suggests, but a thought confuser. This diagram was no doubt Abela’s well-intentioned attempt to help people select appropriate graphs for use in presentations. Despite his intentions, however, it fails because Abela lacks expertise in data visualization. He should have stuck to his area of expertise. He also should never have tried to squeeze chart selection guidance into a one-page diagram. Even if the diagram contained logical, well-organized, and valid suggestions, this guidance cannot be effectively conveyed in a single-page diagram without harmful reduction. Abela attributes great benefit to the single-page display as an aid to presentations. In his book, when describing the ideal length of a “conference room style presentation” (one that is designed to “engage, persuade, come to some conclusion, and drive action”), he writes:
The theoretical, ideal length of a conference room style presentation is one page—with lots of detail, well laid out. Why? Because if you can achieve the goals of your presentation in one page, why would you use two, or ten, or forty? If you are able to distill your message down to one page, your audience will get the sense that you have really captured the essence of the subject. They will also appreciate (and probably be stunned by) the brevity of your presentation.
There are, in fact, many reasons why you might choose not to squeeze everything onto a single page for your audience to view all at once. Sometimes information needs to be presented in a particular sequence, revealed one point at a time. Sometimes a single-page display “with lots of detail” will appear overwhelming, causing your audience’s attention to dissolve the moment it is shown.
I could create a single-page diagram that provides valid graph selection guidance, but I won’t, even though I’ve been asked to do this several times. I’ve refrained because, by relying on the diagram for guidance without understanding why the recommended graph works best, novices would never learn the principles. They would forever follow a set of rules without understanding them, which isn’t enough. We don’t need an army of mindless robots. We need people who can think, who know how to apply the rules to new situations and when to break them.
My concern about Abela’s work runs deeper than his chart selection guidance. Despite much good advice, what Abela teaches in his book about presentations contains fundamental flaws. For example, the book’s first chapter instructs readers to identify the personality types of their audience, based primarily on the Myers-Briggs Type Indicator (MBTI) assessment, so they can create a presentation that’s tailored to their audience’s preferences. (By the way, according to Myers-Briggs, I’m an INTJ, in case that matters to you.) Even if a clear set of presentation standards could be tied to each of the 16 Myers-Briggs personality types, which might not be possible, unless you are presenting to one person only and can get that person to take the MBTI assessment in advance, this is the most impractical suggestion I’ve ever encountered in a book about presentation skills. It’s hard to imagine that readers and students don’t raise this objection immediately upon encountering this dumbfounding advice.
People and organizations, including software vendors in the analytics space, crave shortcuts to achievement. There are efficient paths to skill development, but no shortcuts. Data sensemaking and presentation skills require learning and a great deal of practice. When we were children, our underdeveloped brains believed in magic. Just say abracadabra, rub the lamp, or wave the wand and reality will bend to your wishes. I believed that I could run faster and jump higher if my parents would only buy me a pair of tennis shoes called PF Flyers. (If you recognize the reference, you’ve been around for a while.) We who engage in data sensemaking and then work to present what we’ve found to others are no longer children. In the Bible, the following words are attributed to the Apostle Paul:
When I was a child, I spoke as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things. (1 Corinthians 13:11; King James 2000 version)
Simplistic solutions are not the product of an adult mind. Isn’t it time to grow up?