Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Lollipop Charts: “Who Loves You, Baby?”

May 17th, 2017

If you were around in the ’70s, you probably remember the hard-edged, bald-headed TV police detective named Kojak. He had a signature phrase (“Who loves you, baby?”) and a signature behavior: sucking a lollipop. The juxtaposition of a tough police detective engaged in the childish act of sucking a lollipop was entertaining. The same could be said of lollipop charts, but data visualization isn’t a joke (or shouldn’t be). “Lollipop Chart” is just a cute name for a malformed bar graph.

Bar graphs encode quantitative values in two ways: the length of the bar and the position of its end. So-called lollipop charts encode values in the same two ways: the length of the line, which functions as a thin bar, and the position of its bulbous end.

Lollipop Chart Example

A lollipop chart is malformed in two ways. Its length has been rendered harder to see by making it thin, and its end has been rendered imprecise and inaccurate by making it large and round. The center of the circle at the end of the lollipop marks the value, but the location of that center is difficult to judge, which makes it imprecise compared to the straight edge of a bar. And half of the circle extends beyond the value that it represents, which makes it inaccurate.
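To make the inaccuracy concrete, here is a minimal Python sketch, assuming a linear quantitative axis and a circular marker whose radius is fixed in pixels. The function name and the example numbers (a 0 to 100 axis drawn 400 pixels long, an 8-pixel radius) are hypothetical. The only point it illustrates is that one radius of the circle lies past the value it marks, whereas a bar's straight edge ends exactly at the value.

```python
def lollipop_overshoot(value, radius_px, axis_max, axis_length_px, axis_min=0.0):
    """Return (apparent_end, overshoot) in data units for a circular marker
    centered on `value` along a linear quantitative axis (hypothetical helper)."""
    if axis_length_px <= 0 or axis_max <= axis_min:
        raise ValueError("axis must have positive length and range")
    # Data units covered by one pixel on a linear scale.
    units_per_px = (axis_max - axis_min) / axis_length_px
    # The half of the circle beyond its center spans one radius past the value.
    overshoot = radius_px * units_per_px
    return value + overshoot, overshoot


if __name__ == "__main__":
    # Hypothetical example: 0-100 axis drawn 400 px long, marker radius 8 px.
    apparent_end, overshoot = lollipop_overshoot(value=40, radius_px=8,
                                                 axis_max=100, axis_length_px=400)
    print(f"Marker reaches {apparent_end} for a value of 40 "
          f"({overshoot / 40:.0%} beyond it)")
```

In this hypothetical case the far edge of the marker sits 2 data units (5% of the value) past 40, while a bar encoding the same value would end exactly at 40.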

What inspired this less effective version of a bar graph? I suspect that it’s the same thing that has inspired so many silly graphs: a desire for cuteness and novelty. Both of these qualities wear off quickly, however, and you’re just left with a poorly designed graph.

You might feel that this is “much ado about nothing.” After all, you might argue, lollipop charts are not nearly as bad as other dessert or candy charts, such as pies and donuts. This is true, but when did it become our objective to create new charts that aren’t all that bad, rather than those that do the best job possible? Have we run out of potentially new ways to visualize data effectively? Not at all. Data visualization is still a fledgling collection of visual representations, methods, practices, and technologies. Let’s focus our creativity and passion on developing new approaches that work as effectively as possible and stop wasting our time striving for good enough.

Take care,


What Is Data Visualization?

May 4th, 2017

Since I founded Perceptual Edge in 2003, data visualization has transitioned from an obscure area of interest to a popular field of endeavor. As with many fields that experience rapid growth, the meaning and practice of data visualization have become muddled. Everyone has their own idea of its purpose and how it should be done. For me, data visualization has remained fairly clear and consistent in meaning and purpose. Here’s a simple definition:

Data visualization is a collection of methods that use visual representations to explore, make sense of, and communicate quantitative data.

You might bristle at the fact that this definition narrows the scope of data visualization to quantitative data. It is certainly true that non-quantitative data may be visualized, but charts, diagrams, and illustrations of this type are not typically categorized as data visualizations. For example, neither a flow chart, nor an organization chart, nor an ER (entity relationship) diagram qualifies as a data visualization unless it includes quantitative information.

The immediate purpose of data visualization is to improve understanding. When data visualization is done in ways that do not improve understanding, it is done poorly. The ultimate purpose of data visualization, beyond understanding, is to enable better decisions and actions.

Understanding the meaning and purpose of data visualization isn’t difficult, but doing the work well requires skill. Data visualization is primarily enabled by skills (the human part of the equation), and these skills are augmented by good technologies. The human component is primary, but sadly it receives much less attention than the technological component. For this reason, data visualization is usually done poorly. The path to effective data visualization begins with the development of relevant skills through learning and a great deal of practice. Tools are used during this process; they do not drive it.

Data visualization technologies only work when they are designed by people who understand how humans interact with data to make sense of it. This requires an understanding of human perception and cognition. It also requires an understanding of what we humans need from data. Interacting with data is not useful unless it leads to an understanding of things that matter. Few data visualization technology vendors have provided tools that work effectively because their knowledge of the domain is superficial and often erroneous. You can only design good data visualization tools if you’ve engaged in the practice of data visualization yourself at an expert level. Poor tools exist, in part, because vendors care primarily about sales, and most consumers of data visualization products lack the skills that are needed to differentiate useful from useless tools, so they clamor for silly, dysfunctional features. Vendors justify the development of dumb tools by arguing that it is their job to give consumers what they want. I understand their responsibility differently. As parents, we don’t give our children what they want when it conflicts with what they need. Vendors should be good providers.

Data visualization can contribute a great deal to the world, but only if it is done well. We’ll get there eventually. We’ll get there faster if we have a clear understanding of what data visualization is and what it’s for.

Take care,


We Never Think Alone: The Distribution of Human Knowledge

May 3rd, 2017

Only a small portion of the knowledge that humans have acquired resides in your head. Even the brightest of us is mostly ignorant. Despite this fact, we all suffer from the illusion that we know more than we actually do. We suffer from the “knowledge illusion,” in part, because we fail to draw accurate boundaries between the knowledge that we carry in our own heads and the knowledge that resides in the world around us and the minds of others. A wonderful new book by two cognitive scientists, Steven Sloman and Philip Fernbach, titled The Knowledge Illusion: Why We Never Think Alone, describes the distributed nature of human knowledge and suggests how we can make better use of it.

The Knowledge Illusion

The following four excerpts from the book provide a sense of the authors’ argument:

The human mind is both genius and pathetic, brilliant and idiotic. People are capable of the most remarkable feats, achievements that defy the gods…And yet we are equally capable of the most remarkable demonstrations of hubris and foolhardiness. Each of us is error-prone, sometimes irrational, and often ignorant…And yet human society works amazingly well…

The secret of our success is that we live in a world in which knowledge is all around us. It is in the things we make, in our bodies and workspaces, and in other people. We live in a community of knowledge.

The human mind is not like a desktop computer, designed to hold reams of information. The mind is a flexible problem solver that evolved to extract only the most useful information to guide decisions in new situations. As a consequence, individuals store very little detailed information about the world in their heads. In that sense, people are like bees and society a beehive: Our intelligence resides not in individual brains but in the collective mind.

Being smart is about having the ability to extract deeper, more abstract information from the flood of data that comes into our senses…The mind is busy trying to choose actions by picking out the most useful stuff and leaving the rest behind. Remembering everything gets in the way of focusing on the deeper principles that allow us to recognize how a new situation resembles past situations and what kinds of actions will be effective.

In a world with rapidly increasing stores of information, it is critical that we learn how to find the best information (the signals) among the mounds of meaningless, erroneous, or irrelevant information (the noise) that surrounds us. Individually, we can only be experts in a few domains, so we must identify and rely on the best expertise in other domains. We don’t benefit from more knowledge; we benefit from valid and useful knowledge. One of the great challenges of our time is to find ways to identify, bring together, and encourage the best of what we know.

The power of crowdsourcing and the promise of collaborative platforms suggest that the place to look for real superintelligence is not in a futuristic machine that can outsmart human beings. The superintelligence that is changing the world is the community of knowledge. The great advances in technology aren’t to be found in creating machines with superhuman horsepower; instead, they’ll come from helping information flow smoothly through ever-bigger communities of knowledge and by making collaboration easier. Intelligent technology is not replacing people so much as connecting them.

This book is well written and accessible. It provided me with many valuable insights. I’m confident that it will do the same for you.

Take care,


Can VR Enhance Data Visualization?

May 1st, 2017

In addition to the growing hype about AI (artificial intelligence) and NLP (natural language processing) as enhancers of data visualization, proponents of VR (virtual reality) are now making the same erroneous claim. How does VR enhance data visualization? Here’s an answer that was recently given by Sony Green, the head of business development for a startup named Kineviz:

For a lot of things, 2D is still the best solution. But VR offers a lot of advantages over existing data visualization solutions, especially for certain kinds of data. When you get into really high dimensional data, something like 100 different dimensions per node. It’s difficult to keep track of all that info with lots of 2D graphics and it becomes a very large cognitive load for people to track them on multiple screens at once.

VR allows us to tap into our natural ability to process special information. Without looking around, we have an innate understanding of the spaces we are in because that’s how our brains are wired. In a simulated environment created by VR, we use these natural ways of processing information that a 2D screen can’t offer.

Furthermore, VR opens up use cases that were previously impossible by lowering the barrier for common users. You don’t have to be a data scientist: anyone who can play a game can use VR to explore data science in a way that is intuitive.

TechNode, Emma Lee, April 28, 2017

So, VR supposedly “offers a lot of advantages.” What are these advantages? According to Green, VR makes it possible for our brains to process “100 different dimensions.” This isn’t true. VR adds a simulation of a single spatial dimension: depth. I can think of no way that VR can enable our brains to process more than one additional dimension of data compared to what we can process using 2-D displays. Plus, the simulation of depth is of little benefit, for we don’t perceive depth well, unlike our perception of 2-D space (up and down, left and right). And let us not forget that we can only hold from three to four chunks of visual information in working memory at once, so even if VR could somehow add many more dimensions of data, they would be of no use to our limited brains, because we could not process all of those dimensions simultaneously.

What else can VR do? “VR allows us to tap into our natural ability to process special information.” Apparently, this special information has something to do with spatial awareness, but how does this help us visualize data? According to Sony Green, we’d better figure it out and get on board, because, with VR, data exploration and analysis can be done by anyone who can play a game. Who knew that data analysis was so easy? The claim that “without looking around, we have an innate understanding of the spaces we are in” is humorous. We have no understanding of the spaces that we’re in without looking around or exploring them in some other way, such as by touch.

VR attempts to simulate the 3-D world that we live in. In the actual world, I can place data visualizations throughout a room on various screens or printed pages, and I can then walk up to and examine them one at a time. Similarly, VR can place data visualizations throughout a virtual room, and when it does I must still virtually walk around to view them one at a time. Are the data visualizations themselves enhanced? Not in the least. Making the graphs appear more three-dimensional than they appear on a flat screen adds no real value.

Years ago I was approached by someone who was creating data visualizations for the VR environment Second Life. She was enthusiastic about her work. When I took a look, I found a collection of 3-D bar graphs, line graphs, scatterplots, etc. I could walk around them and even on top of them, looking down from the lofty heights of tall bars and lines, and with virtual superpowers I could even fly around them, but this actually made the graphs harder to read. It is much easier and more efficient to sit still and view 2-D data visualizations on my desktop monitor.

Just to make sure that I haven’t missed any new uses of VR for data visualization, I did a quick search and found nothing but more of the same. In the example below, the Wall Street Journal allows us to ride along a line graph of the NASDAQ, much like riding a roller coaster:

WSJ VR

Imagine that you’re viewing this using a VR headset. What useless fun! And in the example below, Nirvana Labs allows us to view a map (currently off the bottom of the screen), a bar graph (the transparent vertical cylinders), and a line graph (the bottom edge appears at the top of the screen), but they are much harder to read in VR than they would be as a 2-D screen display. A VR headset makes it possible for us to walk around the graphs, but that isn’t useful.

Nirvana VR

I have seen 3-D displays of physical objects that are actually useful, but 3-D displays of graphs are almost never useful, and placing them in VR doesn’t change this fact.

Don’t let yourself be suckered in by false marketing claims. Software vendors are always looking for some new way to separate us from our money. When you encounter people who claim that VR adds value to data visualization, ask them to prove it. Request an example of VR that works better than a 2-D display of the same data. Look past the cool factor and attempt to make sense of the data. If you come across a beneficial use case for data visualization in VR, I’d love to see it.

Take care,


Do tooltips reduce the need for precision in graphs?

April 18th, 2017

This blog entry was written by Nick Desbarats of Perceptual Edge.

Should you include gridlines in your graph? If so, how many? Is it O.K. to add another variable to your graph by encoding it using size (for example, by varying the size of points in a scatterplot) or color intensity (for example, by representing lower values as a lighter shade of blue and higher ones as darker)? How small can you make your graph before it becomes problematic from a perceptual perspective? These and many other important graph design decisions depend, in part, on the degree of precision that we think our audience will require. In other words, they depend on how precisely we think our audience will need to be able to “eyeball” (i.e., visually estimate) the numerical values of bars, lines, points, etc. in our graph. If we think that our audience will require no more than approximate precision for our graph to be useful to them, then we can probably safely do away with the gridlines and add that other variable as color intensity or size. If we have limited space in which to include our graph on a page or screen, we could safely make it quite small, since we know that, in this particular case, the reduction in precision that occurs when reducing a graph’s size wouldn’t be a problem.

Small Graph

Our audience won’t be able to eyeball values all that precisely (what exactly are Gross Sales or Profit for the South region?), though such a design would provide enough precision for our audience to see that, for example, the West had the highest Gross Sales, and that it was about four times greater than the East, which had relatively low Profit, etc. Despite its lack of high precision, this graph contains many useful insights and may be sufficient for our audience’s needs.

If, on the other hand, we’ve spoken to members of our audience and have realized that they’ll need to visually estimate values in the graph much more precisely in order for the graph to be useful to them, then we’re going to have to make some design changes to increase the precision of this graph. We might need to add gridlines, break the quantitative scale into smaller intervals, find a more precise way to represent Profit (for example, by encoding it using the varying lengths of bars or the 2D positions of points), and perhaps make our graph larger on the screen or page.

Side-by-side graphs

As you probably noticed, adding gridlines, making the graph larger, and breaking the quantitative scale into smaller intervals all came at a cost. While these changes did increase the precision of our graph, they also made it busier and bigger. Being forced to make these types of design trade-offs is, of course, very common. In nearly every graph that we design, we must struggle to balance the goal of providing the required level of precision with other, competing goals such as producing a clean, compact, uncluttered, and quick-to-read design.

What if we know, however, that our graph will only ever be viewed in a software application that supports tooltips, i.e., that allows our audience to hover their mouse cursor or finger over any bar, line, point, etc. to see its exact textual value(s) in a small popup box?

Small graph with tooltip

In this case, perfect precision is always available if the audience ever needs to know the exact value of any bar, point, etc. via what researcher Ben Shneiderman termed a “details-on-demand” feature. In the 1990s, Shneiderman noted that suppressing details from a visualization and showing specific details only when the user requests them enables the user to see a potentially large amount of information without being overwhelmed by an overly detailed display. A well-designed visualization enables users to see where interesting details may lie, and the details-on-demand feature then enables them to see those details when they’re needed, but then quickly hide them again so that they can return to an uncluttered view and look for other potentially interesting details.

So, does the availability of details-on-demand tooltips mean that we can stop worrying about precision when making design decisions and optimize solely for other considerations such as cleanness? Can we set the “precision vs. other design considerations” trade-off aside in this case? I think that the answer to this question is a conditional “yes.” We can worry less about precision if we know all of the following:

  1. Our graph will only be viewed in a software application that supports tooltips (which most data visualization products now support and enable by default). If we think that there’s anything more than a small risk that our audience will, for example, share the graph with others by taking a screenshot of it or printing it (thereby disabling the tooltips), then precision must become one of our primary design considerations again.
  2. Our audience is aware of the tooltips feature.
  3. Our audience will only need to know precise values of the bars, points, lines, etc. in our graph occasionally. If we think that our audience will frequently need to know the precise values, giving them a lower-precision graph with tooltips will force them to hover over elements too often, which would obviously be far from ideal. In my experience, however, it’s rare that audiences really do need to know the precise values of elements in a graph very often—although they may claim that they do.
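Because all three conditions must hold at once, the decision reduces to a simple conjunction. The following Python sketch is hypothetical (the ViewingContext class and can_relax_precision function are illustrative names, not part of any product); it only shows that a single false condition, including one we cannot confirm and should therefore treat as false, puts precision back among the primary design considerations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewingContext:
    """Hypothetical summary of how a graph will be consumed."""
    tooltips_only: bool            # viewed only in an app with tooltips (no screenshots, prints, or PDFs)
    audience_knows_tooltips: bool  # the audience knows the tooltips exist
    precise_values_rare: bool      # precise values are needed only occasionally


def can_relax_precision(ctx: ViewingContext) -> bool:
    """Return True only when all three conditions hold."""
    return all((ctx.tooltips_only,
                ctx.audience_knows_tooltips,
                ctx.precise_values_rare))


if __name__ == "__main__":
    dashboard = ViewingContext(True, True, True)
    exported_to_pdf = ViewingContext(False, True, True)  # tooltips are lost in the PDF
    print(can_relax_precision(dashboard))        # True
    print(can_relax_precision(exported_to_pdf))  # False
```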

If we don’t know whether all three of these conditions will be true for a given graph, we don’t necessarily have to ramp up its size, add gridlines, etc. in order to increase its precision, though. If we have a little more screen or page real estate with which to work, another solution is to show a clean, compact, lower-precision version of our graph, but then add the textual values just below or to the side of it. If the audience requires a precise value for a given bar, point, etc. in our graph, it’s available just below or beside the graph.

Small graph and table
Graph with columns of text

If we think that, for example, our audience is going to be embedding this graph into a PDF for a management meeting (thus disabling the tooltips) and that higher precision will be required by the meeting attendees, this would be a reasonable solution. For some graphs, however, the set of textual values may end up being bigger than a higher-precision version of the graph, in which case the higher-precision graph may actually be more compact.

As with so many other data visualization design decisions, knowing how to balance precision versus other design considerations requires knowing your audience, what they’re going to be using your visualizations for, and—particularly in this case—what devices, applications or media they’ll be using to view the visualization.

Nick Desbarats