Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
January 11th, 2016
I’ll begin this blog article by answering the question that appears in the title. I’ve found that 100% bar graphs, designed in the conventional way, are only useful for a limited set of circumstances. Unlike normal stacked bars, the lengths of 100% stacked bars never vary, for they always add up to 100%. Consequently, when multiple 100% stacked bars appear in a graph, they only provide information about the parts of some whole, never about the wholes and how they differ. Therefore, they would never be appropriate when information about totals and the parts of which they are made are both of interest, though normal stacked bars often work well in this scenario. I’ve found that 100% stacked bar graphs are only useful in three specific situations, which I’ll describe in a moment.
I was prompted to write about this when I recently read the book titled “Storytelling with Data” by Cole Nussbaumer Knaflic. Cole likes 100% stacked bars. Several appear in her book. When Cole and I met for lunch last week, shortly before departing I asked if she would be interested in discussing matters on which we apparently disagree and suggested 100% stacked bar graphs as our opening topic. She graciously welcomed the opportunity, so I began the discussion via email later in the week. Our discussion focused primarily on the following graph that appears in her book as an exemplar of graphical communication.
This graph displays a part-to-whole relationship between projects for which the goals were missed, met, or exceeded by quarter. A 100% stacked bar graph never serves as the best solution for a time series. Stacked segments of bars do not display patterns of change through time as clearly as lines. In this particular example, only the bottom bar segments, representing missed goals, do a decent job of showing the quarterly pattern of change. The top segments, representing exceeded goals, invert the pattern of change (i.e., the lower the segment extends, the higher the value is that it represents), which is confusing. The middle segments, representing met goals, encode the quarterly values as the heights of the segments, not their tops, which makes the pattern of change impossible to see.
The following line graph displays the data more effectively in every respect.
Despite the perceptual problems that I identified in Cole’s 100% stacked bar graph, she feels that it is superior to the line graph above. Her preference is rooted in the fact that the stacked bar graph intuitively indicates the part-to-whole nature of the relationship between missed, met, and exceeded goals. While it is true that a line graph does not by itself state, “these are parts of a whole,” this can easily be made clear in the title, as I did above. For Cole, the stacked bar graph’s ability to declare the part-to-whole nature of the relationship without having to clarify this in the title overcomes its perceptual problems.
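To make the underlying transformation concrete, here is a minimal Python sketch that converts raw quarterly project counts into the percent-of-total series a line graph like this would plot. The counts are invented for illustration; the book’s actual data isn’t reproduced here.

```python
# Hypothetical quarterly project counts (illustrative only).
quarters = ["Q1", "Q2", "Q3", "Q4"]
counts = {
    "Missed": [10, 14, 9, 12],
    "Met": [25, 22, 28, 24],
    "Exceeded": [5, 4, 3, 4],
}

def to_percentages(counts):
    """Convert per-quarter counts into percent-of-total series,
    suitable for plotting as one line per category."""
    n = len(next(iter(counts.values())))
    totals = [sum(series[i] for series in counts.values()) for i in range(n)]
    return {name: [100 * v / t for v, t in zip(series, totals)]
            for name, series in counts.items()}

percentages = to_percentages(counts)
```

Because each quarter’s values sum to 100%, the three lines describe parts of a whole, and the graph’s title can state that relationship explicitly.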
Let’s move on to the three occasions when I believe 100% stacked bars are useful:
- When the bars consist of only two segments (e.g., male and female)
- When we need to compare the sum of multiple parts among multiple bars
- When we need to compare the percentages of responses to Likert scales
Here’s an example of the first situation:
Because the bars are divided into two segments only (i.e., women and men), it is easy to read the values of each segment and to compare a specific segment through the entire set of bars. This comparison can be easily made because each segment is aligned through the entire set of bars (women to the left and men to the right). If a third segment were added, however, the segment in the middle would not be aligned to the left or right, which would make comparisons difficult.
I can illustrate the second occasion when 100% stacked bars are useful with the following example from Cole’s book:
The primary purpose of this graph is to compare the sum of customer segments 3, 4, and 5 in the “US Population” with the sum of the same three customer segments among “Our Customers.” Assuming that no other comparisons are important, the two 100% stacked bars do the job effectively. If I were creating this graph myself, however, I would be tempted to make a few minor adjustments. Assuming that the customer segments have actual names rather than numbers, which is usually the case, and that the specific order in which the segments appear above is not necessary, I would place the highlighted segments at the bottom of the stacked bars, as I’ve done below.
This gives the featured segments a common baseline, which makes the comparison of their heights easier. Although it isn’t necessary, I also placed the segment names next to both bars because the vertical positions of the segments are not aligned, which makes it easier to identify the segments on the right.
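The reordering itself is simple. As a sketch, with hypothetical segment names and values standing in for the book’s numbered customer segments, moving the highlighted segments to the front of the list places them at the bottom of the stack:

```python
# Hypothetical (name, percent) segments for one 100% stacked bar.
segments = [("Segment A", 20), ("Segment B", 15), ("Segment C", 30),
            ("Segment D", 25), ("Segment E", 10)]

def reorder_for_baseline(segments, highlighted):
    """Move the highlighted segments to the front of the list, which puts
    them at the bottom of the stacked bar so they share a common baseline;
    the relative order within each group is preserved."""
    return ([s for s in segments if s[0] in highlighted] +
            [s for s in segments if s[0] not in highlighted])

reordered = reorder_for_baseline(segments, {"Segment C", "Segment D", "Segment E"})
```

With the featured segments stacked first, their combined height can be compared between bars from a shared zero baseline.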
The final occasion involves the comparison of Likert scale responses (e.g., Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Cole feels that a conventional 100% stacked bar handles this well, illustrated by the following example from her book:
This particular design does work well for the following purposes:
- Comparing Strongly Disagree percentages
- Comparing the combination of Strongly Disagree and Disagree percentages
- Comparing Strongly Agree percentages
- Comparing the combination of Agree and Strongly Agree percentages
- Reading the percentage values for Strongly Disagree
- Reading the percentage values for the sum of Strongly Disagree and Disagree
However, it does not work well for the following purposes:
- Comparing Disagree percentages
- Comparing Neutral percentages
- Comparing Agree percentages
- Reading percentage values of the individual segments Disagree, Neutral, Agree, or Strongly Agree, because mental math is required
- Reading the percentage values for the sum of Agree and Strongly Agree, because mental math is required
Given these particular strengths and weaknesses, a 100% stacked bar graph of this design would work well to the degree that the audience only needs to access its strengths.
Variations on the design of 100% stacked bar graphs usually work better. Most of these variations display negative results (e.g., Strongly Disagree and Disagree) as negative values running left from zero and positive results (e.g., Agree and Strongly Agree) as positive values running right from zero. Here’s an example:
Designed in this way, differences between positive and negative results now stand out a bit more, the sum of Agree and Strongly Agree is easier to read, and the Neutral values are both easier to read and compare.
For some purposes, the Neutral results may be eliminated altogether, and for some it may be appropriate to split the Neutral results down the middle, displaying half of them as negative and half as positive, as follows:
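Here’s a small sketch of how the diverging values can be derived, using invented survey percentages rather than data from the book: negative results are negated so they run left from zero, and the Neutral value can optionally be split across the zero line.

```python
def diverging_segments(pcts, split_neutral=True):
    """Given Likert percentages in the order
    [Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree],
    return (negative, positive) segment lists for a diverging bar.
    Negative segments run left from zero; positive segments run right."""
    sd, d, n, a, sa = pcts
    if split_neutral:
        # Half of Neutral straddles each side of the zero line.
        return [-sd, -d, -n / 2], [n / 2, a, sa]
    # Otherwise Neutral is omitted from the diverging display.
    return [-sd, -d], [a, sa]

# Hypothetical survey item: 10% SD, 20% D, 15% N, 35% A, 20% SA.
neg, pos = diverging_segments([10, 20, 15, 35, 20])
```

Plotting libraries that stack bars from an offset (e.g., matplotlib’s `left` parameter on horizontal bars) can render these values directly, with the negative segments extending left of zero.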
In cases when it’s important to compare each individual segment from bar to bar rather than the sum of negative results (Disagree and Strongly Disagree) or positive results (Agree and Strongly Agree), a separate column of bars for each item on the Likert scale would work best, illustrated below.
Other than these few occasions when 100% stacked bar graphs are effective, I’m not aware of any other appropriate uses of them. If you’re aware of other good uses, please post and describe your examples in my discussion forum.
January 8th, 2016
People often speak of the “art and science” of data visualization without explanation, as if their meaning is obvious. In fact, it isn’t. What is the function of art in data visualization? Art might serve a role, but if it does, an explanation is needed.
Several years ago when I was talking with Nancy Duarte, author of the books Slide:ology, Resonate, and Illuminate, I said that my work didn’t involve art. She quickly rose to my defense and said, “I disagree!” She assumed that I was admitting a deficiency in my work, but that wasn’t my intention. I was simply saying that my work is rooted entirely in science. I’m not an artist. I’m not trying to be an artist. I love art, but it isn’t what I do.
What do people mean when they talk about the art of data visualization? When they juxtapose the words art and science, they are usually using art as a synonym for creativity. I take issue with this, however, because it suggests that science lacks creativity, which is hardly the case. Good science requires a great deal of creativity. When I say that my work doesn’t involve art, I’m certainly not saying that it isn’t creative.
In the context of data visualization, we ought to use the term “art” with caution. Speaking of data visualization as art can excuse a great deal of nonsense—ineffective design—as the realm of artistic license.
Let’s be clear about something else. When I say that my work in data visualization doesn’t involve art, I am not denying the role of aesthetics. Art is not the exclusive realm of aesthetics. I care about aesthetics in data visualization because they play a role in making graphics effective. An ugly visualization is not inviting, nor does it promote the comfortable emotional state that helps to open one’s mind to information. My understanding of aesthetics and the ways that graphics can be made to please the eye is based on science. Apart from science, like everyone, I have a built-in sense of aesthetics that automatically influences my responses to things. However, the knowledge of aesthetics that primarily influences my work in data visualization—what works and what doesn’t—has emerged from scientific research (for example, from the Gestalt School).
If we’re going to talk about the art of data visualization, let’s do so clearly and meaningfully. Until someone describes the role of art in a way that makes sense to me, I’ll continue to describe my work as exclusively informed by science—both formal research and my own empirical observations.
January 5th, 2016
Last month I spent a great deal of time thinking and writing about information visualization research, mostly bemoaning ways in which it usually misses the mark. A few days ago, Steven Franconeri of Northwestern University welcomed my invitation to talk about infovis research projects that would address real problems. When he asked if I already had a list of potential projects, I admitted that I hadn’t written one, but offered to do so. I really should write down ideas for potential research projects when they occur to me, but I haven’t had a convenient place to record them. I’ll fix this soon. In the meantime, I took a few minutes to scour my memory for ideas from the past and quickly made a list of 18 potential projects. Before I publish a list, however, I’d like to collect ideas from you as well.
Infovis researchers would benefit from hearing about the problems that practitioners currently struggle to solve. It can be difficult to know what’s actually needed when you spend most of your time at a university, whether you’re a student or a professor. Infovis is still a young field with much to learn and much to develop. We who use data visualization in our work can help ourselves by helping the research community to understand what’s needed.
Infovis research projects fall into a few different categories. In general, they 1) study phenomena related to data visualization that we don’t fully understand through observation and experiments, 2) develop potential solutions to problems and test them to see if and how well they work, or 3) develop new conceptual structures (a.k.a., taxonomies) for understanding data visualization. Here’s a potential example of each:
- Determine the effects of a scatterplot’s aspect ratio on interpreting the existence and nature of correlations.
- Develop and test a means to trigger blindsight (seeing things without conscious awareness) as a way of drawing someone’s attention to a particular area of a visual display, such as a particular piece of information in a dashboard that needs attention.
- Develop a clear way for people to think about the different ways in which data visualizations should be designed to support data sensemaking (i.e., data exploration and analysis) versus data communication (i.e., presentation).
Please give this some thought. If you think of potentially useful infovis research projects, respond with a description. If it has already been done, I’ll let you know, assuming that I’ve come across it. After a few days of collecting your ideas, I’ll compile a full list of potential research projects and publish it on this website. I will also keep it updated with new ideas and with information about research projects that are undertaken to address them.
January 4th, 2016
I began this first workday of 2016 as usual, by walking my dogs and then reading the news while sipping coffee and eating a bowl of fruit. The world that I found in the news this morning is filled with the same tragedies that have assaulted us for years, but it was the hopeful bits that bothered me most. Hope is marketed to us today in the form of “Technology,” with a capital “T.” While it is certainly true that “technologies” with a small “t” will be needed to avert disaster and build a brighter future, our current emphasis on technologies independent of human ability, conscience, and hard work is a dangerous drug. If Karl Marx were alive today, he might write: “Technology is the opiate of the masses.”
One of the news articles that caught my attention this morning was “Storytelling By the Numbers: How to Make Visualizing Data Easy.” In this insipid marketing piece the following words were quoted:
We’re seeing more efforts to take the complexity out of data calculations, structuring, analytics and visualizations…Soon, we’ll all be…empowered with data-processing capabilities to elevate us out [of] the world of educated guessery and [into] the heavenly realm of informed decision and probability based predictions. [Hamish McKenzie of the product company Silk]
What the writer failed to mention is that McKenzie wrote these words in 2013 and Silk has not yet delivered this “heavenly realm of informed decision and probability based predictions.” I have no idea if Silk is a decent product, but I immediately distrust any company that markets its wares with the false promises of a snake-oil salesman.
At the beginning of this new year, I’m more convinced than ever that we must stop believing in magical solutions and get down to the hard work of real problem solving. This will require skill, which leads to my next concern: most data visualization advocates who have emerged in recent years promote superficial understanding and skills. Many recent books and articles about data visualization barely scratch the surface, promoting “data visualization lite.” They usually teach a concise set of simple guidelines that don’t provide a solid foundation for skilled work. The abbreviated form of communication (and thinking) that PowerPoint began to promote 25 years ago and Twitter further endorsed beginning 10 years ago has had an effect on this new generation of thought leaders. What they teach is riddled with inconsistencies, revealing a level of understanding that is shallow. This drives me nuts, because I want data visualization to flourish in the future, but this will require new generations that extend the work rather than reducing it to pithy bullet points. A brief TED Talk may be enough to inspire, but it is not a venue for learning. Real learning—real skills development—takes thoughtful study and years of practice.
Later this month I’ll be teaching my new Advanced Dashboard Design workshop for the second time in the United States. Even though I’ve made the requirements for attending this advanced workshop clear—you must have read Information Dashboard Design and developed at least one dashboard based on its principles to be eligible—I have turned down several applicants who couldn’t demonstrate this level of skill with a reasonably well-designed dashboard. Some of them hadn’t actually read the book. This is a symptom of the same problem that I described above. People think they can develop skills using shortcuts. They can’t. I’d rather teach an advanced workshop with only a handful of people than earn greater revenues trying to corral people at widely ranging levels of experience to work together, holding back the students who are truly skilled. I had a similar experience when I taught in the MBA program at the University of California, Berkeley. Some of my students managed to get into this program without the required level of understanding. When I didn’t give them the high grades that they expected, they were shocked and angry. I felt similarly. I was shocked and angry that U.C. Berkeley admitted these students into the program.
I know that, to those of you who follow my work, I must sound like a broken record, making the same basic points over and over. “Slow down, take the time that’s necessary, think thoroughly, learn deeply, develop useful skills.” Trust me when I say that I find the repetition annoying as well. I’d like to move past this fundamental guidance to spend more of my time pursuing more advanced topics, but I keep getting dragged back into the basics because that’s where most people dwell, whether they know it or not.
If you want to help the world by doing the work of data sensemaking, set your sights high. Make this your New Year’s resolution. Begin by learning the basics and learning them thoroughly. Don’t learn a tiny bit and immediately start writing a blog to distribute mal-developed insights to a gullible world. Know, however, that the path of thorough preparation will not necessarily produce the recognition that your eventual good work will deserve. If recognition is more important to you than the personal satisfaction that comes with good work, go ahead and begin that blog immediately. There’s a world out there longing for more hype and drivel—more of the fast food that nourishes technological solutionism. If you want the fulfillment that comes with good work, which is essential to a good life, choose the road less travelled. Whether the world knows it or not, that’s the path you must take to make the world better.
December 23rd, 2015
When vague or ambiguous terms are used without defining them, confusion results. The term Big Data is a prolific example. An entire industry has been built up around a term that no two people define the same. In this particular case, the confusion is useful to vendors and consultants who want to sell you so-called Big Data products and services. In the field of data visualization, the term engagement is being tossed about more and more, without a clear definition. Research papers are being written that make claims about engagement without declaring what they mean by the term. This creates a minefield of confusion.
This issue came up in a recent conversation in my discussion forum between Enrico Bertini and me. Enrico was using the term engagement to describe attributes of data visualizations that are eye-catching. I responded to Enrico that engagement, in my opinion, involves something more than merely catching the eye. When people use the term engagement in discussions of data visualization, they almost always use it in a positive manner, assuming that engagement is useful. This, however, is not a good assumption. We can certainly become engaged in activities that are less than useful—harmful, even.
Measuring the degree to which someone becomes engaged in viewing or in some way using a visualization can be useful, but only if we clearly define what we mean by the term and choose a metric that actually measures it, which is rarely done. A common way to measure engagement in research studies is to use eye-tracking technology to record how long someone looks at a particular element, and these studies routinely assume, without explaining why, that engagement defined and measured in this manner is beneficial. Someone might look at a particular visualization, or a particular component of one, because it’s visually unusual or eye-catching, but not in a manner that is informative. It is also possible that someone might look at something for a long time because they are struggling to make sense of it, which is likely a problem. The struggle would only be useful if it resulted in understanding something that couldn’t have been understood more easily had the data been displayed in another way.
Let me illustrate with a specific example. Recently, Steve Haroz, Robert Kosara, and Steven L. Franconeri wrote a research paper titled “The Connected Scatterplot for Presenting Paired Time Series.” The purpose of the study was to test the potential usefulness of an unusual version of a scatterplot that a few graphical journalists have produced in recent years. When we want to examine how two quantitative variables changed through time—variables that don’t share a common quantitative scale (e.g., monthly sales revenues in dollars versus the number of items sold)—we would ideally use two line graphs, one positioned above the other, or, if our audience is not confused by dual-axis graphs, a single line graph with a scale on the left for one variable and a scale on the right for the other. The same values can be displayed in a scatterplot, however, with a single data point for each time period, encoding one variable as position along the X axis and the other as position along the Y axis. To show the chronological sequence, the dots are connected with a line and labeled sequentially. Here’s a simple example from the paper:
And here’s the same data shown as a dual-axis line chart:
In this particular case, a dual-axis graph wouldn’t actually be needed because both variables share the same quantitative scale, but you can imagine that they don’t.
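For concreteness, here is a minimal sketch, with invented paired series rather than the paper’s data, of how the points of a connected scatterplot are formed from two time series:

```python
# Two hypothetical time series measured over the same periods.
periods = ["Jan", "Feb", "Mar", "Apr"]
var_x = [100, 120, 110, 130]   # e.g., number of items sold
var_y = [5.0, 6.5, 6.0, 7.5]   # e.g., revenue in $ thousands

# A connected scatterplot places one point per period: one variable on
# the X axis and the other on the Y axis. The points are joined by a
# line in time order and labeled so readers can follow the sequence.
points = list(zip(var_x, var_y))
labeled = list(zip(periods, points))
```

Each point collapses a pair of values into a single position, which is precisely why the chronological labels and connecting line are essential: without them, the time dimension disappears entirely.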
Although it seems obvious that, even with an extremely simple data set such as this, the connected scatterplot is more difficult to read than the dual-axis line chart, it certainly didn’t hurt to do an experiment to confirm this. In fact, these researchers confirmed that connected scatterplots are difficult to read and produce more errors in understanding. What’s interesting, however, is that the researchers made the following statement in the last sentence of the paper’s conclusions section:
All these findings suggest that the technique, despite its lack of familiarity, has merit for presenting and communicating data.
If you carefully read the paper, however, the results did not indicate that these graphs provide any actual benefits. So, what is it to which these researchers are referring as merit? Here’s the answer:
The prioritized viewing of CSs [connected scatterplots] – at least as compared to DALCs [dual-axis line charts] – makes them good candidates when the goal is to draw a viewer’s attention.
Later in the same paragraph, however, they make the following admission:
But it is not yet clear whether the preferential viewing arises from the technique per se, or its lack of familiarity.
What do they actually mean by “prioritized viewing” and “preferential viewing,” terms that suggest usefulness? Test subjects were shown screens consisting of six blurred thumbnail versions of charts—three connected scatterplots and three dual-axis line charts. The researchers told subjects that they were “studying the types of information that most interested them.” They were further told that they had five minutes and that, during that time, they could click on any thumbnail chart to view a larger, nonblurred version of it. Using eye-tracking technology to monitor the subjects, the authors found that, during the first half of the five-minute period, subjects spent more time on average looking at the connected scatterplots. This is what led the authors to recommend connected scatterplots’ “use for engagement and communication.” I don’t consider this a meaningful assessment of engagement, and certainly not of engagement that is useful, given the fact that subjects found these graphs difficult to understand.
We should be clearer and more precise in our use of terms when they constitute the object of research or are promoted as beneficial. The degree to which and manner in which someone becomes engaged in viewing and interacting with a visualization is worth considering, but only if we define engagement clearly. To me, engagement suggests more than attracting attention. It suggests sustained attention. For engagement to qualify as useful, it must involve productive thinking. For example, here’s a definition that might work in the context of data visualization:
Useful engagement with visualized data involves a sustained period of attention on the data or in interaction with the data that increases understanding.
If we measure this and seek to achieve this in our work, we’ll be doing something worthwhile.