Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
May 2nd, 2013
The usefulness of understanding relationships within networks is becoming more apparent, so it is fortunate that our ability to explore and analyze networks by visualizing them is improving. Common examples of networks that analysts examine include connections between terrorists or connections between linked sites on the World Wide Web. While these networks in particular get a great deal of attention today, other more run-of-the-mill networks can be explored more insightfully as well, such as the connections between products that are often purchased together, which we’ve pursued as market-basket analysis for ages. The most common and typically most useful form of network visualization consists of nodes (things, such as people or products) and links (connections between things), displayed as a diagram in various arrangements. When networks are large, consisting of thousands or millions of nodes, node-link diagrams can become so overwhelmingly cluttered, they’re sometimes called “giant hairballs.” Consequently, those who study information visualization have been trying to develop ways to simplify and clarify these diagrams. A new approach described in a paper titled “Motif Simplification: Improving network visualization readability with fan, connector, and clique glyphs” (Proc. ACM CHI 2013, April 2013, 3247-3256) was recently introduced by Cody Dunne and Ben Shneiderman of the University of Maryland.
Here’s how Dunne and Shneiderman describe their approach in the paper’s abstract:
Analyzing networks involves understanding the complex relationships between entities, as well as any attributes they may have. The widely used node-link diagrams excel at this task, but many are difficult to extract meaning from because of the inherent complexity of the relationships and limited screen space. To help address this problem we introduce a technique called motif simplification, in which common patterns of nodes and links are replaced with compact and meaningful glyphs. Well-designed glyphs have several benefits: they (1) require less screen space and layout effort, (2) are easier to understand in the context of the network, (3) can reveal otherwise hidden relationships, and (4) preserve as much underlying information as possible.
In the paper’s introduction, they describe the problem more thoroughly:
Network visualizations are only useful to the degree they “effectively convey information to the people that use them…We believe that state of the art layout algorithms alone are insufficient to consistently produce understandable network visualizations.
One way forward is the use of aggregation, specifically by aggregating common network structures or subnetworks called motifs. Large, complex network visualizations often have motifs repeated throughout because of either the network structure or how the data was collected. Regardless of their cause, some frequently occurring motifs contain little information compared to the space they occupy in the visualization. Existing tools may highlight certain motifs, allow users to filter them out manually, or replace them with meta-nodes.
What they point out is that networks often consist of typical patterns of connection that exist in great numbers and when these patterns appear, even though they consist of many nodes and links, it isn’t necessary to see them individually in the diagram. In such cases, the complexity of many associated nodes and links can be displayed as a glyph. In this context, a glyph is a simple object—an icon of sorts—that represents a particular type of connection. They describe the approach that they tested as follows:
Many common network motifs present little meaningful information, yet can dominate much of the display space and obscure interesting topology. We believe that replacing these motifs with representative glyphs will create more effective visualizations as there will be far fewer nodes and edges [links] for layout algorithms and users to consider. We have chosen three motifs for our initial foray into motif simplification:
- A fan motif consists of a head node connected to leaf nodes with no other neighbors. As there may be hundreds of leaves, replacing all the leaves and their links to the head with a fan glyph can dramatically reduce the network size.
- A D-connector motif consists of functionally equivalent span nodes that solely link a set of D anchor nodes. Replacing span nodes and their links with a connector glyph can aid in connectivity comparisons.
- A D-clique motif consists of a set of D member nodes in which each pair is connected by at least one link. Cliques are common in biologic or similarity networks, where swapping for a clique glyph can highlight subgroup ties.
The three motifs are illustrated below using a standard node-link representation (from left to right, fan, D-connector, and D-clique):
Below are simple illustrations of the glyphs (the objects on the right of each example) that they designed to represent these motifs.
Fan(glyph on the right)
D-connector (glyph on the right)
D-clique (glyphs for 4, 5, and 6 member cliques below)
The following image illustrates how a network with a simple set of connections but many nodes and links could be simplified using simply D-clique glyphs:
And finally, here’s a more complex node-link diagram on the left displayed using glyphs on the right:
In the conclusion of the paper Dunne and Shneiderman write: “While users must learn the visual language of motifs and glyphs, there is a dramatic payoff in the usability and readability of the visualization.” From what I’ve seen, I’m confident that their conclusion is warranted.
In addition to the paper, which you can access using the link that I provided in the first paragraph above, you can also watch a video demonstration of this approach on YouTube.
If this approach interests you, there’s a way that you can play with it on your own for free:
We have implemented a reference implementation of motif simplification and made it publicly available as part of the NodeXL network analysis tool. NodeXL is a free and open source template for Microsoft Excel 2007/2010 that is tailored to provide powerful features while being easy to learn.
April 30th, 2013
I’ve written a great deal over the years about statistical narrative: telling stories with data. I often refer to the messages that we communicate with data as stories and talk about data analysis as the process of finding the stories that dwell in data. I describe one of the primary uses of data visualization as storytelling, but I’m a little uncomfortable when I do. Let me explain why.
A few days ago I was sent a link to an article by Saul Hymes, a medical doctor, titled “Give It Your Best Shot.” The subtitle, “A better narrative is required to counter the anti-vaccine movement’s fairy tales,” reveals the important concern that this article addresses. You’ve no doubt heard the claim that autism is caused by vaccinations. You’ve probably seen former centerfold Jenny McCarthy on TV warn against the risk of vaccinations as she tells the sad story of her autistic son, Evan. Perhaps you saw the Oprah Winfrey Show when McCarthy appeared as a guest and heard Oprah read a statement from the Centers for Disease Control (CDC) that “science has shown from multiple studies that there is no evidence that vaccines cause autism.” If so, you heard McCarthy reply, “My science is Evan, and he’s at home. That’s my science,” to which the audience responded with tearful applause. McCarthy made her case with a story.
The persuasive mechanism of stories is usually emotional, not rational. Stories ordinarily move us when they tap into our feelings, not our ability to reason. Storytelling with data, when true to its purpose, however, appeals to reason, asking people to think rationally and be moved primarily by what they come to understand, not by what they feel. What Saul Hymes admits in his article, much to his discomfort as a scientist and physician, is that he sometimes gives up in his attempt to persuade patients by presenting facts and switches to stories that do an end run around reason by targeting emotions. He does it, despite his scientific perspective and reliance on evidence and reason because it gets the job done, but he does so uncomfortably.
When I tell stories with data, I often include elements that people can connect with emotionally, but I try not to rely on the power of emotion as the primary mechanism of communication and persuasion. By using emotion as the primary mechanism, I would be tapping into a lesser-evolved part of humanity, and in so doing, miss an opportunity to help people learn to make better decisions through reason. I’m not saying that emotion isn’t important and that we’d be better off without it, but instead that emotion and reason have different strengths and serve different decision-making purposes. We become our better selves when we learn to recognize this difference and put reason in the driver’s seat when it’s needed. If I appealed to emotion to win my case when reason was needed, I would achieve the desired outcome, but not in a way that would help people make better decisions on their own in the future. Fighting feeling with feeling, even when done for the greater good, would seem like a betrayal of everything that I’m striving so hard to promote in my work.
We humans have evolved in a way that enables us to think rationally, which is an amazing gift. It is because of this ability that we can develop technologies that extend our reach and we can deal with one another in ways that make the world safer and more just. When emotion ties us together in compassion and mutual respect, we benefit from our pre-human ability to feel. When emotion, however, leads us to react in hateful ways that do harm, we must learn to shift into reason and rise above our emotional instincts.
By telling stories with data that rely on evidence and reason to persuade, we can help people learn to rely on this ability more naturally and habitually. If, however, we continue to replace evidence and reason with emotion rather than using it only to complement data, we encourage humanity to remain stuck, and the world will suffer. When we visualize data using means of representation that are inaccurate and difficult to perceive in an attempt to make them eye-catching and fun, we shift the means of communication and persuasion from reason to emotion. However, when we appeal to people’s emotions strictly to help them personally connect with information and care about it, and do so in a way that draws them into reasoned consideration of the information, not just feeling, we create a path to a brighter, saner future.
April 29th, 2013
In my recent article for the Visual Business Intelligence Newsletter, titled Building Insight with Bricks, I introduced “bricks” as a new way to display quantitative values geospatially (e.g., on a map), which in theory can be read and compared more quickly and precisely than bubbles. Here’s an example:
Since the publication of the article last week, my concern for a particular limitation of this approach has grown. I mentioned in the article that bricks, unlike bubbles, do not work when they overlap. Even though I recognized this deficiency from the start, I didn’t fully appreciate at the time how much this deficiency limits the usefulness of bricks. Comments from readers, however, have raised my awareness. I especially appreciate the response from Andy Cotgreave, who took the time to mock up examples of bricks vs. bubbles to illustrate the problem, and from Joe Mako, who challenged my assumption that geospatial displays don’t usually involve overlapping values.
While discussing this issue with Joe, I asserted that business uses of geospatial displays don’t typically involve overlapping values. Joe challenged my assertion, inviting me to defend it. As I began to construct my case, it gradually dawned on me that overlapping values are more prevalent than I imagined while designing bricks. This oversight occurred, not because I’m not familiar with broad and common uses of geospatial data visualization, but because I had narrowed my focus to a subset of use cases and concluded too swiftly that this subset was much larger than it actually is. I never stepped back to recognize and sufficiently test my assumption. Even an informal round of peer reviews involving some of the brightest minds in the field didn’t draw my attention to this oversight. I suffered from a blind spot while designing bricks that I never managed to correct.
Bricks are still useful; just not as broadly useful as I imagined and hoped. I failed to add as much value through the invention of bricks as I expected. I’m disappointed, but not discouraged in my effort. I was trying to solve a very real problem, and even though I’ve potentially solved it to a lesser degree than intended, I’ve succeeded more thoroughly perhaps in raising awareness about the problem. I’ll keep working on it, and I hope that you’ll join me. This is how science works. Our limited successes and even our total failures are useful and as such ought to be shared. Others can learn from our mistakes, but only if we make them known. I invite you—challenge you even—to succeed where I failed. If you do, I’ll be content that I contributed through my failure to your eventual success. After all, the benefit that our work delivers to the world is all that ultimately matters.
April 3rd, 2013
IEEE’s VisWeek Conference, the premier international conference on visualization, includes a sub-conference on Visual Analytics Science and Technology (VAST). VAST focuses on fundamental research contributions within visual analytics as well as applications of visual analytics, including applications in science, engineering, medicine, health, media, business, social interaction, and security and investigative analysis. One of the highlights of VAST is the annual challenge that invites researchers and other experts in the field to design visualization systems to solve a particular real-world analytics problem. This year’s challenge consists of three separate mini-challenges. I think that readers of this blog might find Mini-Challenge #2 particularly interesting. Here’s how the folks who run VAST describe it:
Mini-Challenge #2 tests your skills in visual design. The fictitious Big Enterprise is searching for a design for their future situation awareness display. The company’s intrepid network operations team will use this display to understand the health, security, and performance of their entire computer network. This challenge is also very different from previous VAST Challenges, because there is no data to process and no questions to answer. Instead, the challenge is to show off your design talents by producing a creative new design for situation awareness.
Much of my work focuses on the design of situation awareness displays, called dashboards. It is particularly challenging to display a great deal of information in a way that people can use to rapidly monitor what’s going on. If you’re involved in designing displays of this type, you might want to participate in this VAST challenge. You will not only have an opportunity to get feedback from some of the world’s top experts (including myself), but could also contribute to this important field of research by demonstrating design skills that are desperately needed but seldom demonstrated.
For more information about VAST Mini-Challenge #2, visit the conference website, where you can also register if you choose to participate.
April 1st, 2013
In response to my recent blog post about Tableau 8, largely a critique of packed bubble charts and word clouds, Chad Skelton of The Vancouver Sun wrote a rejoinder. In it he essentially endorsed my critique, but also argued that it is sometimes appropriate to use packed bubbles to present data to the general public, such as in a news publication, because “bar charts are kind of boring.”
Skelton’s assertion suggests that there is a hierarchy of interest among graphs, perhaps based on shapes and colors that are used to encode data. Can we place value-encoding objects such as rectangles (as in bars), lines, individual data points (such as dots), and circles (as in bubbles) on a continuum from boring at the low end of interest to eye-popping at the high end? Back in the 1980’s, William Cleveland and Robert McGill proposed a hierarchy of graphical methods based on empirical research, but theirs was a hierarchy of perceptibility—our ability to perceive the values represented by graphical objects easily and accurately. The utility of their hierarchy was obvious, because our ability to understand the information contained in a graph is directly tied to our ability to clearly and accurately perceive the value-encoding attributes (positions, lengths, areas, angles, slopes, color intensities, etc.). So, back to Skelton’s suggestion, is there a hierarchy of interest among graphical value-encoding methods, and, if so, does it trump perceptibility?
What is the opposite of boring that Skelton advocates? He provides a clue:
A lot of people who create data visualizations—whether reporters, non-profits or governments—are fighting tooth and nail to get people to pay attention to the data they’re presenting in an online world crowded with endless distractions. And when you’re trying to make someone take notice—especially if the subject is census data or transit figures—a little eye candy goes a long way.
Data visualizations aren’t just a way to present data. They’re often also the flashing billboard you need to get people to pay attention to the data in the first place.
Apparently, this quality of visual interest has nothing to do with the information that’s contained in a chart. Instead, in this case interest is a measure of someone’s willingness to look at a chart. The more eye-catching a chart is, the more interesting it is. It is meaningless to catch someone’s eye, however, if you fail to reveal something worth seeing. I have no problem with the fact that packed bubble charts are eye-catching; they trouble me because once they catch your eye they have little to say for themselves. A packed bubble chart is like a child that keeps screaming, “Look at me, look at me,” but just stands there with a silly grin on his face once you do. Unless you dearly love that child, the experience is just plain annoying.
The argument that a chart must exhibit eye-candy to catch the reader’s attention even when that is accomplished by displaying data in ineffective ways suffers from two fundamental problems:
- In a world of noise, screaming louder and louder is not an effective means of cutting through the noise and being heard. Screaming louder just creates more noise.
- It assumes that you cannot draw someone’s attention to a data display without using an inferior chart, one that is visually eye-catching but information-impoverished.
Regarding the first error, when people get tired of looking at lots of pretty-colored circles randomly arranged on a screen, what will we be forced to do next—make the bubbles constantly move around and spin? Regarding the second error, a graph that gets attention by displaying data in a manner that ineffectively informs is an unnecessary failure of design. Graphs can be designed to catch the eye and inform without compromise. Doing this, however, requires skill.
To make his case that packed bubble charts such as those introduced in Tableau 8 are useful to reporters such as him, Skelton shared the following example of a packed bubble chart that he published last year in The Vancouver Sun:
This chart addresses a potentially interesting topic to Vancouver’s residents, but it reveals very little. Only a few of the agency names are recognizable and only six of the bubbles include values. Had bars been used, the values wouldn’t need to be included, but there’s no way to decode values from the sizes of these bubbles. The size legend (0 through 10) on the bottom left cannot be used to interpret the values that the bubbles represent because it is one dimensional, based on length, which applies to the diameter of the bubbles, but the values are encoded by their areas, not their diameters. This chart is embarrassingly impoverished. If a reporter expressed information this miserably in words his job would be in jeopardy, but we give graphics a pass, treating them like decoration rather than content.
I wrote to Skelton and offered to demonstrate that a bar graph of this content need not be boring if he would kindly send me the information on which his packed bubbles chart was based. I requested all of the information that was available (apparently this information about public servant salaries resides in a publicly available database) about these agencies, salaries, etc., both current and past. Skelton graciously responded by sending the data, but only what he used to produce his own chart: the list of agencies and number of employees in each who earned the top 100 salaries. This information alone is of limited interest. As a Vancouver resident, I would want to know how this year was different from the past and not just the numbers of people, but also the amounts that they were paid. As Edward Tufte wrote long ago, “If the statistics are boring, then you’ve got the wrong numbers.” A chart of any type that contains this information alone will generate relatively little interest. Perhaps Skelton displayed it as packed bubbles rather than a bar graph to camouflage the fact that the information is rather limp. Dressing up a chart in glitter and spangles to generate interest that doesn’t exist in the information itself treats readers disrespectfully. In The Elements of Style, Strunk and White wrote: “No one can write decently who is distrustful of the reader’s intelligence, or whose attitude is patronizing.” I agree.
A chart that displays a rich set of interesting data will not be boring, even if it is a humble bar graph. To illustrate this fact, I began with the data that Skelton provided, and then added to it the kind of information that a journalist could include to engage the reader’s interest. (Please note that I fabricated much of the data in the display below to illustrate my point.)
This enriched set of information, displayed in a way that is easy to read, lends itself to more than a glance. It invites the reader to examine the story in depth.
Skelton is concerned with a significant challenge that journalists currently face:
For news organizations, this constant tension—between excitement and accuracy—is second nature. The most accurate way to portray a council meeting might be a photo of a bunch of bored looking seniors waiting in line to speak. But most newspapers would run the photo of the one, animated speaker waving their finger and shouting.
When did it become the job of journalism to manufacture interest and excitement that doesn’t exist and isn’t warranted? I know that news organizations are struggling to survive. I share Skelton’s concern that it is getting harder and harder to be heard in a world of increasing noise. Not every message is worth hearing, however, and when the message is worthwhile, cutting through the din in ways that rob the message of clarity and accuracy only adds to the noise.
How can reporters get people to read their stories and examine their graphs? I believe they can only do this in a lasting way by consistently providing content that is interesting, accurate, clear, and useful. If they do this often enough, they will become a trusted source. The New York Times is my primary source of news. I subscribe to this publication to show my support for expert journalism and to do what I can to keep it alive. I browse the articles on my iPad and read a few of them daily. How do I pick the articles that I read? I browse the titles, which clue me into the content. I don’t pick articles because they’re eye-catching. I pick those that are mind-catching.