Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
February 23rd, 2016
Tomorrow, February 24, PBS will air a new documentary titled “The Human Face of Big Data.” I’m a longtime supporter of PBS, but on occasion they get it wrong (e.g., when they provide a showcase for charlatans such as Deepak Chopra). In this new documentary, they apparently get much of it wrong by putting a happy face on Big Data that ignores the confusion, false claims, and all but one of the risks that it promotes. The tech journalist Gil Press has skillfully revealed the documentary’s flaws in a review for Forbes titled “A New Documentary Reveals a One-Dimensional Face of Big Data.” Gil and I had a chance to get acquainted a few years ago and I’ve come to appreciate the voice of sanity that he often raises in response to technological hype and misinformation. I strongly recommend that you read Gil’s review to restore balance to the force.
January 19th, 2016
In a previous blog post titled “Potential Information Visualization Research Projects,” I announced that I would prepare a list of potential research projects that would address actual problems and needs that are faced by data visualization practitioners. So far I’ve prepared an initial 33-project list to seed an ongoing effort, which I’ll do my best to maintain as new ideas emerge and old ideas are actually addressed by researchers. These projects do not appear in any particular order. My intention is to help practitioners by making researchers aware of ways that they can address real needs. I will keep a regularly updated list of project ideas as a PDF document, but I’ve briefly described the initial list below. The list is currently divided into three sections: 1) Effectiveness and Efficiency Tests, 2) New Solution Designs and Tests, and 3) Taxonomies and Guidelines.
Some of the projects that appear in the Effectiveness and Efficiency Tests section have been the subject matter of past projects. For example, several projects in the past have tested the effectiveness of pie charts versus bar graphs for displaying parts of a whole. In these cases I feel that the research isn’t complete. Apparently, some people feel that the jury is still out on the matter of pie charts versus bar graphs, so it would be useful for new research to more thoroughly establish, more comprehensively address, or perhaps challenge existing knowledge.
Please feel free to respond to this blog post or to me directly at any time with suggestions for additional research projects or with information about any projects on this list that are actually in process or already completed.
Effectiveness and Efficiency Tests
- Determine the effects of non-square aspect ratios on the perception of correlation in scatterplots.
- Determine the effectiveness of bar graphs compared to dot plots when the quantitative scale starts at zero.
- Determine the relative speed and effectiveness of interpreting data when presented in typical dashboard gauges versus bullet graphs (one of my inventions).
- Determine the effectiveness of wrapped graphs (one of my inventions) compared to treemaps when the number of values does not exceed what a wrapped graph display can handle.
- Determine the effectiveness of bricks (one of my inventions) as an alternative to bubbles in a geo-spatial display.
- Determine the effectiveness of bandlines (one of my inventions) as a way of rapidly seeing magnitude differences among a series of sparklines that do not share a common quantitative scale.
- Determine if donut charts are ever the most effective way to display any data for any purpose.
- Determine if pie charts are ever the most effective way to display any data for any purpose.
- Determine if radar charts are ever the most effective way to display any data for any purpose.
- Determine if mosaic charts are ever the most effective way to display any data for any purpose.
- Determine if packed bubble charts are ever the most effective way to display any data for any purpose.
- Determine if dual-scaled graphs are ever the most effective way to display any data for any purpose.
- Determine if graphs with 3-D effects (e.g., 3-D bars) are ever the most effective way to display any data for any purpose.
- Determine which is more effective: displaying deviations in relation to zero or 100%. For example, if you wish to display the degree to which actual expenses varied in relation to the expense budget, would it work best to represent variances as positive or negative percentages above or below zero or as percentages less than or greater than 100%?
- Determine the effectiveness of various designs for Sankey diagrams in an effort to recommend design guidelines.
- Determine the best uses of various network diagram layouts (centralized burst, arc diagrams, radial convergence, etc.).
- Determine the effectiveness of word clouds versus horizontal bar graphs (or wrapped graphs).
- Determine which shapes are most perceptible and distinguishable for data points in scatterplots.
- Determine the effectiveness of large data visualization walls versus smaller, individual workstations.
- Determine if the effectiveness of displaying time horizontally from left to right depends on one’s written language or is more fundamentally built into the human brain.
- Determine if the typical screen scanning pattern beginning at the upper left depends on one’s written language or is more fundamentally built into the human brain.
- Determine the relative speed and effectiveness of interpreting particular patterns in data when displayed as numbers in tables or visually in graphs. For example, compare a table that displays 12 monthly values per row versus a line graph that displays the same values (i.e., twelve monthly values per line) to see how quickly and effectively people can interpret various patterns such as trending upwards, trending downwards, particular cyclical patterns, etc. We know that it is extremely difficult to perceive patterns in tables of numbers, but it would be useful to actually quantify this performance.
- Determine the relative speed of finding outliers in tables of numbers versus graphs.
- Determine the relative benefits of using a familiar form of display versus one that requires a few seconds of instruction. The argument is sometimes made that a graph must be instantly intuitive because making people learn how to read an unfamiliar form of display is too costly in time and cognitive effort. For example, population pyramids provide a familiar form of display for people who routinely compare the age distributions of males versus females in a group, yet a frequency polygon, although unfamiliar, might provide a way to see how the distributions differ much more quickly and easily. In cases when people can be taught to read an unfamiliar form of display with little effort, does it make sense to do so rather than continuing to use a form of display that works less effectively?
New Solution Designs and Tests
- Develop an effective way to show proportional highlighting, as it pertains to brushing and linking, for portions of the following graphical objects: bars, lines, and box plots. Various ways to show proportional highlighting have been applied to bar graphs, but not to line graphs and box plots.
- Develop a way to automatically attach data labels to the ends of lines in a line graph without overlapping.
- Develop a way to temporarily overlay or replace box plots with frequency polygons.
- Develop a way to automatically detect the amount of lag between two time series and then align the leading events with the lagging events in a line graph.
- Develop potential uses of blindsight to direct a person’s attention to particular sections of a display as needed (e.g., to something on a dashboard that needs attention).
- Develop an effective design for waterfall graphs when multiple transactions occur in the same interval of time and some are positive and some are negative.
- Develop an algorithm for automatically distributing several sets of time series values uniformly across a 100% scale when they have different starting points, ending points, and durations. For example, this would make it easy to compare the person hours associated with various projects across their lifespans, even when they differ in starting dates, ending dates, and durations.
- Develop a full set of interface mechanisms for making formatting changes to charts (turning grid lines on and off, changing the colors of objects, repositioning and orienting objects such as legends, changing the quantitative scale along an axis, etc.) that involves direct access to those objects rather than one that requires the user to wade through lists of formatting commands located elsewhere (e.g., in dialog boxes).
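As a rough starting point for the 100% scale idea in the list above, here is a minimal sketch, not a finished algorithm. It assumes each series is a list of (date, value) pairs with distinct dates, maps each series onto 0% to 100% of its own lifespan, and then linearly interpolates onto a shared grid so that series of different durations line up. The function names and the two sample projects are hypothetical and chosen only for illustration.

```python
from datetime import date


def to_lifespan_percent(points):
    """Map (date, value) pairs to (percent_of_lifespan, value) pairs.

    The earliest date becomes 0% and the latest becomes 100%.
    Assumes all dates within a series are distinct.
    """
    points = sorted(points)
    start, end = points[0][0], points[-1][0]
    span_days = (end - start).days
    if span_days == 0:
        raise ValueError("a series needs at least two distinct dates")
    return [(100.0 * (d - start).days / span_days, v) for d, v in points]


def resample(percent_points, steps=10):
    """Linearly interpolate values onto a uniform grid from 0% to 100%."""
    grid = [100.0 * i / steps for i in range(steps + 1)]
    values = []
    j = 0  # index of the segment [point j, point j + 1] being interpolated
    for g in grid:
        # Advance to the segment that contains grid position g.
        while j < len(percent_points) - 2 and percent_points[j + 1][0] < g:
            j += 1
        (x0, y0), (x1, y1) = percent_points[j], percent_points[j + 1]
        t = (g - x0) / (x1 - x0)
        values.append(y0 + t * (y1 - y0))
    return values


# Hypothetical cumulative person hours for two projects that differ in
# start date, end date, and duration.
project_a = [(date(2016, 1, 1), 0), (date(2016, 1, 11), 100)]
project_b = [(date(2016, 3, 1), 0), (date(2016, 3, 11), 40), (date(2016, 3, 21), 80)]

a_on_grid = resample(to_lifespan_percent(project_a), steps=4)
b_on_grid = resample(to_lifespan_percent(project_b), steps=4)
print(a_on_grid)
print(b_on_grid)
```

Once both series sit on the same 0% to 100% grid, they can be drawn as lines on a common horizontal axis. Linear interpolation is only one possible choice here; whether it suits a given kind of data is part of what such a project would need to test.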
Taxonomies and Guidelines
- Develop a useful taxonomy or set of guidelines to help people think about the differences in how data visualizations should be designed to support data sensemaking (i.e., data exploration and analysis) versus data communication (i.e., presentation).
January 11th, 2016
I’ll begin this blog article by answering the question that appears in the title. I’ve found that 100% bar graphs, designed in the conventional way, are only useful for a limited set of circumstances. Unlike normal stacked bars, the lengths of 100% stacked bars never vary, for they always add up to 100%. Consequently, when multiple 100% stacked bars appear in a graph, they only provide information about the parts of some whole, never about the wholes and how they differ. Therefore, they would never be appropriate when information about totals and the parts of which they are made are both of interest, though normal stacked bars often work well in this scenario. I’ve found that 100% stacked bar graphs are only useful in three specific situations, which I’ll describe in a moment.
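A small sketch may make the loss of information about wholes concrete. The numbers below are hypothetical: two quarters whose project counts differ by a factor of ten produce identical 100% stacked bars once the segments are converted to shares.

```python
def to_percent_shares(parts):
    """Convert raw segment values into shares (0 to 100) of their bar's total."""
    total = sum(parts.values())
    if total == 0:
        raise ValueError("cannot compute shares of an empty whole")
    return {name: 100.0 * value / total for name, value in parts.items()}


# Hypothetical counts of projects by outcome.
q1 = {"Missed": 2, "Met": 5, "Exceeded": 3}      # 10 projects in total
q2 = {"Missed": 20, "Met": 50, "Exceeded": 30}   # 100 projects in total

q1_shares = to_percent_shares(q1)
q2_shares = to_percent_shares(q2)

# The two bars are indistinguishable even though the wholes differ tenfold.
print(q1_shares == q2_shares)
```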
I was prompted to write about this when I recently read the book titled “Storytelling with Data” by Cole Nussbaumer Knaflic. Cole likes 100% stacked bars. Several appear in her book. When Cole and I met for lunch last week, shortly before departing I asked if she would be interested in discussing matters on which we apparently disagree and suggested 100% stacked bar graphs as our opening topic. She graciously welcomed the opportunity, so I began the discussion via email later in the week. Our discussion focused primarily on the following graph that appears in her book as an exemplar of graphical communication.
This graph displays a part-to-whole relationship between projects for which the goals were missed, met, or exceeded by quarter. A 100% stacked bar graph never serves as the best solution for a time series. Stacked segments of bars do not display patterns of change through time as clearly as lines. In this particular example, only the bottom bar segments, representing missed goals, do a decent job of showing the quarterly pattern of change. The top segments, representing exceeded goals, invert the pattern of change (i.e., the lower the segment extends, the higher the value is that it represents), which is confusing. The middle segments, representing met goals, encode the quarterly values as the heights of the segments, not their tops, which makes the pattern of change impossible to see.
The following line graph displays the data more effectively in every respect.
Despite the perceptual problems that I identified in Cole’s 100% stacked bar graph, she feels that it is superior to the line graph above. Her preference is rooted in the fact that the stacked bar graph intuitively indicates the part-to-whole nature of the relationship between missed, met, and exceeded goals. While it is true that a line graph does not by itself state, “these are parts of a whole,” this can be easily made clear in the title, as I did above. For Cole, the stacked bar graph’s ability to declare the part-to-whole nature of the relationship without having to clarify this in the title overcomes its perceptual problems.
Let’s move on to the three occasions when I believe 100% stacked bars are useful:
- When the bars consist of only two segments (e.g., male and female)
- When we need to compare the sum of multiple parts among multiple bars
- When we need to compare the percentages of responses to Likert scales
Here’s an example of the first situation:
Because the bars are divided into two segments only (i.e., women and men), it is easy to read the values of each segment and to compare a specific segment through the entire set of bars. This comparison can be easily made because each segment is aligned through the entire set of bars (women to the left and men to the right). If a third segment were added, however, the segment in the middle would not be aligned to the left or right, which would make comparisons difficult.
I can illustrate the second occasion when 100% stacked bars are useful with the following example from Cole’s book:
The primary purpose of this graph is to compare the sum of customer segments 3, 4 and 5 in the “US Population” versus the sum of the same three customer segments among “Our Customers.” Assuming that no other comparisons are important, the two 100% stacked bars do the job effectively. If I were creating this graph myself, however, I would be tempted to make a few minor adjustments. Assuming that the customer segments have actual names rather than numbers, which is usually the case, and that the specific order in which the segments appear above is not necessary, I would place the highlighted segments at the bottom of the stacked bars, as I’ve done below.
This gives the featured segments a common baseline, which makes the comparison of their heights easier. Although it isn’t necessary, I also placed the segment names next to both bars because the vertical positions of the segments are not aligned, which makes it easier to identify the segments on the right.
The final occasion involves the comparison of Likert scale responses (e.g., Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Cole feels that a conventional 100% stacked bar handles this well, illustrated by the following example from her book:
This particular design does work well for the following purposes:
- Comparing Strongly Disagree percentages
- Comparing the combination of Strongly Disagree and Disagree percentages
- Comparing Strongly Agree percentages
- Comparing the combination of Agree and Strongly Agree percentages
- Reading the percentage values for Strongly Disagree
- Reading the percentage values for the sum of Strongly Disagree and Disagree
However, it does not work well for the following purposes:
- Comparing Disagree percentages
- Comparing Neutral percentages
- Comparing Agree percentages
- Reading percentage values of the individual segments Disagree, Neutral, Agree, or Strongly Agree, because mental math is required
- Reading the percentage values for the sum of Agree and Strongly Agree, because mental math is required
Given these particular strengths and weaknesses, a 100% stacked bar graph of this design would work well to the degree that the audience only needs to access its strengths.
Variations on the design of 100% stacked bar graphs usually work better. Most of these variations display negative results (e.g., Strongly Disagree and Disagree) as negative values running left from zero and positive results (e.g., Agree and Strongly Agree) as positive values running right from zero. Here’s an example:
Designed in this way, differences between positive and negative results now stand out a bit more, the sum of Agree and Strongly Agree is easier to read, and the Neutral values are both easier to read and compare.
For some purposes, the Neutral results may be eliminated altogether, and for some it may be appropriate to split the Neutral results down the middle, displaying half of them as negative and half as positive, as follows:
In cases when it’s important to compare each individual segment from bar to bar rather than the sum of negative results (Disagree and Strongly Disagree) or positive results (Agree and Strongly Agree), a separate column of bars for each item on the Likert scale would work best, illustrated below.
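For the variation that splits Neutral down the middle, the lengths of the bar on either side of zero come down to simple arithmetic. The following is a minimal sketch of that calculation. The function name, the `split_neutral` flag, and the sample percentages are hypothetical, and the sketch assumes the five category percentages for an item sum to 100.

```python
def diverging_extents(responses, split_neutral=True):
    """Return (left, right) bar lengths in percent for one Likert item.

    `responses` maps the five category names to percentages.
    With split_neutral=True, half of Neutral is added to each side;
    otherwise Neutral is left out of both sides.
    """
    negative = responses["Strongly Disagree"] + responses["Disagree"]
    positive = responses["Agree"] + responses["Strongly Agree"]
    if split_neutral:
        half = responses["Neutral"] / 2
        return negative + half, positive + half
    return negative, positive


# Hypothetical responses to a single survey item, in percent.
item = {
    "Strongly Disagree": 10,
    "Disagree": 20,
    "Neutral": 30,
    "Agree": 25,
    "Strongly Agree": 15,
}

left, right = diverging_extents(item)
print(left, right)  # 45.0 55.0
print(diverging_extents(item, split_neutral=False))  # (30, 40)
```

With Neutral split, the two sides still sum to 100%, so the bar keeps its full length and only shifts left or right of zero. With Neutral dropped, bar lengths vary from item to item.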
Other than these few occasions when 100% stacked bar graphs are effective, I’m not aware of any other appropriate uses of them. If you’re aware of other good uses, please post and describe your examples in my discussion forum.
January 8th, 2016
People often speak of the “art and science” of data visualization without explanation, as if their meaning is obvious. In fact, it isn’t. What is the function of art in data visualization? Art might serve a role, but if it does, an explanation is needed.
Several years ago when I was talking with Nancy Duarte, author of the books Slide:ology, Resonate, and Illuminate, I said that my work didn’t involve art. She quickly rose to my defense and said, “I disagree!” She assumed that I was admitting a deficiency in my work, but that wasn’t my intention. I was simply saying that my work is rooted entirely in science. I’m not an artist. I’m not trying to be an artist. I love art, but it isn’t what I do.
What do people mean when they talk about the art of data visualization? When they juxtapose the words art and science, they are usually using art as a synonym for creativity. I take issue with this, however, because it suggests that science lacks creativity, which is hardly the case. Good science requires a great deal of creativity. When I say that my work doesn’t involve art, I’m certainly not saying that it isn’t creative.
In the context of data visualization, we ought to use the term “art” with caution. Speaking of data visualization as art can excuse a great deal of nonsense—ineffective design—as the realm of artistic license.
Let’s be clear about something else. When I say that my work in data visualization doesn’t involve art, I am not denying the role of aesthetics. Art is not the exclusive realm of aesthetics. I care about aesthetics in data visualization because they play a role in making graphics effective. An ugly visualization is not inviting, nor does it promote the comfortable emotional state that helps to open one’s mind to information. My understanding of aesthetics and the ways that graphics can be made to please the eye is based on science. Apart from science, like everyone, I have a built-in sense of aesthetics that automatically influences my responses to things. However, the knowledge of aesthetics that primarily influences my work in data visualization—what works and what doesn’t—has emerged from scientific research (for example, from the Gestalt School).
If we’re going to talk about the art of data visualization, let’s do so clearly and meaningfully. Until someone describes the role of art in a way that makes sense to me, I’ll continue to describe my work as exclusively informed by science—both formal research and my own empirical observations.
January 5th, 2016
Last month I spent a great deal of time thinking and writing about information visualization research, mostly bemoaning ways in which it usually misses the mark. A few days ago, Steven Franconeri of Northwestern University welcomed my invitation to talk about infovis research projects that would address real problems. When he asked if I already had a list of potential projects, I admitted that I hadn’t written one, but offered to do so. I really should write down ideas for potential research projects when they occur to me, but I haven’t had a convenient place to record them. I’ll fix this soon. A few days ago I took a few minutes to scour my memory for ideas from the past, and quickly made a list of 18 potential projects. Before I publish a list, however, I’d like to collect ideas from you as well.
Infovis researchers would benefit from hearing about the problems that practitioners currently struggle to solve. It can be difficult to know what’s actually needed when you spend most of your time at a university, whether you’re a student or a professor. Infovis is still a young field with much to learn and much to develop. We who use data visualization in our work can help ourselves by helping the research community to understand what’s needed.
Infovis research projects fall into a few different categories. In general, they 1) study, through observation and experiments, phenomena related to data visualization that we don’t fully understand, 2) develop potential solutions to problems and test them to see if and how well they work, or 3) develop new conceptual structures (a.k.a., taxonomies) for understanding data visualization. Here’s a potential example of each:
- Determine the effects of a scatterplot’s aspect ratio on interpreting the existence and nature of correlations.
- Develop and test a means to trigger blindsight (seeing things without conscious awareness) as a way of drawing someone’s attention to a particular area of a visual display, such as a particular piece of information in a dashboard that needs attention.
- Develop a clear way for people to think about the differences in how data visualizations should be designed to support data sensemaking (i.e., data exploration and analysis) versus data communication (i.e., presentation).
Please give this some thought. If you think of potentially useful infovis research projects, respond with a description. If it has already been done, I’ll let you know, assuming that I’ve come across it. After a few days of collecting your ideas, I’ll compile a full list of potential research projects and publish it on this website. I will also keep it updated with new ideas and with information about research projects that are undertaken to address them.