Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to merit the larger venue.
For a selection of articles, white papers, and books, please visit
my library.
May 10th, 2013
This morning, in my personalized list of Google alerts, I spotted a link to a new video by PBS about data visualization. I’m a longtime, avid supporter of PBS, so I was hoping for something useful. I also knew, however, that PBS doesn’t always vet its content adequately (some of the self-help gurus that PBS features are laughable) and that it doesn’t always get the story right. So, I held my breath, hoping for the best when I followed the link to the PBS video “The Art of Data Visualization.”
My spirit rose when the video began with the words of Edward Tufte and his image filled the screen.
I nearly swooned as Tufte calmly and eloquently uttered statements such as “Style and aesthetics cannot rescue failed content” and “There are enormously beautiful visualizations, but it’s as a byproduct of the truth and the goodness of the information.” At last, I thought, a professionally made video that features the best of data visualization. Within seconds, however, I found that Tufte served only as the bookends of this video and that much of the content in between conflicted with his statements.
Here are a couple of samples:
And, of course, a video about data visualization is not complete without at least one of the infamous monstrosities created by David McCandless.
I can only imagine how Tufte must have felt when he saw the final product and discovered how his statements were contradicted by much of the other content that PBS chose to include. It is because of this possibility that I turn down invitations to participate in projects like this video that don’t allow contributors to control the content. Unless you have a contract that grants you the right to review and approve the final product, great harm can be done to your reputation and you can unwittingly participate in a project that undermines your work.
Wouldn’t it be wonderful if PBS or some other respected media provider decided to work closely with leaders in the field of data visualization to present it at its best and most useful? I’m betting that I could get many of my colleagues—several of the best and brightest in the field—to participate in this project with enthusiasm if we were given the right to work closely with the production team and then review and approve the final content. Perhaps we could even use Tufte’s portions of “The Art of Data Visualization,” but fill the middle with a consistent message about the true potential of data visualization to enlighten with beauty as “a byproduct of the truth and the goodness of the information.”
Take care,

May 10th, 2013
Presenting quantitative information is a specialized form of communication. Like all forms of communication, quantitative data presentation is most effective when we follow a few best practices, such as the following seven tenets.
- Know your data. Until you understand the stories that live in your data, you can’t begin to tell them.
- Know your audience. Unless you understand what matters to your audience, you won’t know what is of interest and use to them.
- Determine your message. Every dataset contains multiple stories. You can’t tell them all at once. Before you present quantitative information, you must determine the specific message or messages that you want to communicate. Start by writing a sentence or two or three to express the message before moving on to determine the ideal means of expression.
- Reduce the data to what’s needed to communicate the message. Pare the data down to the essence of what your audience must see to understand the message. What’s essential usually involves more than a simple set of primary values (e.g., monthly sales figures), for without context in the form of comparisons, numbers mean little. For example, monthly sales figures compared to target values or to values for the same months last year are more meaningful than sales figures alone.
- Determine the best means of expression. Some quantitative messages are best communicated with words, some with tables of numbers, some with graphs, and some with a combination. Some messages are best displayed in a bar graph, some in a line graph, some in a scatter plot, and so on. Knowing which form of expression works best for the message that you’re trying to present requires a little training in how our eyes and brains process visual information. The principles are easy to learn, but they aren’t intuitive. I wrote the book Show Me the Numbers, in part, to teach these principles.
- Design the display to communicate simply, clearly, and accurately. Include nothing that isn’t data unless it’s needed to support the data. Unnecessary color variation and visual effects, or even grid lines in a graph when they aren’t needed, will detract from the message. Non-data elements that are needed should only be visible enough to do their job and never so visible that they call attention to themselves. Non-data elements should sit politely in the background so the information stands out clearly in the foreground. If some information is more important to the message than other information, do something visual to feature it. For example, a brighter color or thicker stroke would make a particular line in a line graph stand out more than the others.
- Suggest a way to respond. Whenever possible, make it easy for your audience to respond with appropriate action by suggesting specific steps. Most quantitative messages aren’t presented merely to inform, but also to motivate a useful response.
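To make these tenets concrete, here is a minimal sketch in Python using the matplotlib library. The data, region names, and styling choices are invented for illustration; the point is the design pattern: context lines sit quietly in the background while the line that carries the message is featured with a stronger color and thicker stroke, and unneeded non-data elements are removed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly sales for four regions; the message concerns the West.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = {
    "North": [42, 44, 41, 45, 43, 46],
    "South": [38, 37, 39, 40, 41, 40],
    "East":  [50, 49, 51, 50, 52, 51],
    "West":  [30, 34, 39, 45, 52, 60],  # the story: rapid growth
}

fig, ax = plt.subplots()
for region, values in sales.items():
    if region == "West":
        # Feature the line that carries the message: stronger color, thicker stroke.
        ax.plot(months, values, color="#d62728", linewidth=3, label=region)
    else:
        # Context lines sit politely in the background.
        ax.plot(months, values, color="#bbbbbb", linewidth=1, label=region)

# Strip non-data elements that aren't needed to support the data.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.set_title("West region sales are growing rapidly")
ax.legend(frameon=False)
```

Note that the title states the message in a sentence, as suggested above, rather than merely labeling the data.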
Take care,

May 9th, 2013
A “straw man” is a flawed form of argument that occurs when one side attacks a position that isn’t actually held by the other side (the “straw man”) and then acts as though the other side’s position has been refuted. People usually construct straw men when they cannot legitimately refute an opponent’s position. As such, a straw man is a dishonest and fallacious form of argument, but one that can be persuasive when the audience is not aware of the facts.
I learned about straw men as an undergraduate majoring in communication studies. I loved the course that I took in argumentation and debate back then because I found the rules of logic elegant, interesting, and easy to understand. I vividly remember, however, that most of my classmates didn’t take so naturally to these principles and frequently struggled to make their case. I’m ashamed to admit that I took far too much pleasure in tying my opponents into logical knots and luring them into logical traps.
Since those bygone days of youth, I have expanded what I learned in college by keeping up with work in the fields of critical thinking and brain science. I am now familiar not only with the rules of rational argument but also with many causes of flawed thinking. I have found, to my great disappointment, that this is not common knowledge, even among scientists and analysts. I am no longer surprised when academics in the field of information visualization—doctoral students and professors—conduct studies that are flawed in obvious ways.
I was prompted to think about straw men recently when I encountered a couple on the Web that were apparently constructed to fault the work of people like me who teach data visualization best practices. The first appeared in a recent series of articles about data visualization on the Harvard Business Review’s (HBR) website. I was invited to contribute an article to this series, but unfortunately didn’t have the time. I wish I could have participated, however, to correct the portrayal of business-related data visualization as skewed toward elaborate infographics rather than the simple uses of quantitative graphics that make up around 99% of the data visualizations created in organizations. The straw man that I noticed was constructed by Amanda Cox of the New York Times. I greatly admire the data graphics of the New York Times, including Amanda’s work in particular. Cox is an articulate spokesperson for journalistic uses of data visualization. For this reason, I was surprised when I read the following interaction in HBR’s interview with Amanda (emphasis mine):
[HBR]: It seems like there’s more focus on trying to get data viz to go viral than to make it “matter.”
[Amanda Cox]: There’s a lot where not much actionable comes out of it. I don’t know if the ratio is different from the ratio of bad writing to good, or bad restaurant openings to good, but I think it’s an important idea to focus on. There’s a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy.
I appreciated almost everything that Amanda said except the two sentences that I’ve highlighted above, which appear to be a jab at data visualization practitioners who promote the use of simple graphs over some of the elaborate (but often ineffective) infographics that routinely appear on the Web. Amanda’s statement is a straw man. No one “argues that everything could be a bar chart.” Anyone who did would not only be robbing the world of joy but also of meaning. Bar graphs are one effective means of displaying data among several, and they are only appropriate for particular data sets and purposes. I’m not sure why Amanda felt compelled to insert this little goad of a comment in the interview. If she has an actual case to make, she can surely do better than this.
On April 17th, I encountered a similar straw man constructed by Nathan Yau in his blog (emphasis mine):
Data is an abstraction of something that happened in the real world. How people move. How they spend money. How a computer works. The tendency is to approach data and by default, visualization, as rigid facts stripped of joy, humor, conflict, and sadness—because that makes analysis easier. Visualization is easier when you can strip the data down to unwavering fact and then reduce the process to a set of unwavering rules.
The world is complex though. There are exceptions, limitations, and interactions that aren’t expressed explicitly through data. So we make inferences with uncertainty attached. We make an educated guess and then compare to the actual thing or stuff that was measured to see if the data and our findings make sense.
Data isn’t rigid so neither is visualization.
Are there rules? There are, just like there are in statistics. And you should learn them.
However, in statistics, you eventually learn that there’s more to analysis than hypothesis tests and normal distributions, and in visualization you eventually learn that there’s more to the process than efficient graphical perception and avoidance of all things round. Design matters, no doubt, but your understanding of the data matters much more.
I agree with everything that Nathan says here, but not with what he implies in the text that I’ve highlighted. His comment about “efficient graphical perception and avoidance of all things round” appears to be a direct reaction to my position, but one that he’s morphed into a straw man. No one argues that there isn’t more to data visualization than perceptual efficiency and circle avoidance. (I suspect that Yau’s phrase “all things round” refers to an article that I wrote in 2010, “Our Irresistible Fascination with All Things Circular.”) No one who promotes the importance of efficient and accurate graphical perception argues that design matters more than understanding. In fact, it is our concern that people understand data clearly, accurately, and as fully as possible that leads us to teach people how to present data graphically in ways that work for human perception and cognition. There is indeed much more to data visualization than a rigid set of design rules, which is why, when I teach design principles, I do so in a way that enables my students to understand how and why these principles work so they can apply, bend, and sometimes break the rules intelligently.
What’s ironic about Yau’s claim is that he often features as exemplary infographics that are beautiful or otherwise eye-catching but yield little understanding. Such examples can easily be found in his lists of the best data visualizations of the year. Given his training as a statistician, I’ve always found this puzzling.
Making data visualizations perceptible is not all there is, but it is certainly an essential requirement if we want people to understand what we’re trying to say. I’m sure that Cox and Yau agree, but they seem willing at times to sacrifice perceptual effectiveness for visual allure. When they do, understanding is diminished. There is no reason why perceptual effectiveness and visual allure cannot coexist. Leaders in the field of data visualization don’t always agree, but when we disagree and wish to state our case, we should build it on solid evidence and sound reason. Dismissive remarks and thinly veiled insinuations that aren’t accurate or backed by evidence don’t qualify as useful discourse.
Take care,

May 6th, 2013
When I fell in love with words as a young man, I developed a respect for publishers that was born mostly of fantasy. I imagined venerable institutions filled with people of great intellect, integrity, and respect for ideas. I’m sure many people who fit this description still work for publishers, but my personal experience has mostly involved those who couldn’t think their way out of a wet paper bag and apparently have no desire to try.
My most recent experience was with the academic publisher Taylor & Francis. They publish several academic journals, including one that I was asked to write for over a year ago: the “Journal of Computational and Graphical Statistics.” Specifically, I was asked to write a response to an academic paper entitled “Infovis and Statistical Graphics: Different Goals, Different Looks” by Andrew Gelman and Antony Unwin. When approached by Richard Levine of San Diego State University, the journal’s editor, I had grave reservations. Back in 2008 I wrote in this blog about an experience that I had with IEEE’s journal Computer Graphics and Applications. The article that I wrote in response to the editor’s request was pulled from publication at the very last minute, when without warning they sent me a contract that demanded the right to alter my work, without approval, however they wished. An author would be insane to grant this right. With this experience in mind, I expressed my concerns to Levine before agreeing to write for his statistics journal:
If the journal requires authors to transfer their copyrights to it, I’ll pass on the opportunity. I’m happy to grant exclusive rights of distribution to the journal, which is all that should matter. I never grant others the right to revise my work without permission, which is what can happen when copyrights are given away.
In response, Levine checked with Taylor & Francis and assured me that my terms were acceptable.
With this commitment in writing, I proceeded with confidence to write my response, which involved roughly two days of effort. Following the submission of my response in March of 2012, I began to navigate the arduous process that one inevitably faces when writing for an academic journal: rushed spurts of activity (peer reviews, copyedits, layout reviews, etc.) separated by months of inactivity. After all was complete, guess what Taylor & Francis asked me to sign? A standard contract that ignored our prior agreement. I quickly redlined the inappropriate sections of the contract and offered to sign it with my redlines intact. I didn’t hear back from them for a while, but finally received a new agreement that I assumed was a replacement for the original contract, which I signed and immediately returned. I was then informed that what I signed was an addendum to the original contract, not a replacement, and that I would still need to sign the original. I replied that I would gladly sign my redlined version of the original contract, as previously promised. At this point our correspondence was kicked upstairs to Eric Sampson, the Journals Manager at Taylor & Francis, who informed me that I must sign a copy of the original contract without redlines. You can imagine my dismay. I told Sampson that I never sign contracts that contain errors. He responded that Taylor & Francis did not have the right to publish my work unless I signed the original contract without alteration. Further, he said that I was preventing the journal from going to press and that they would therefore remove what I’d written from the journal. I couldn’t believe what I was hearing. I replied to Sampson with the following:
This is rather absurd. I agreed to sign the original agreement back when it was first sent to me provided that I could redline the parts that you agreed to negate in the addendum. In other words, there is no disagreement between us and you absolutely have the right to publish the article as originally promised. I abhor this kind of dysfunctional bureaucracy, which I’ve encountered far too often among publishers. I’ve spent a great deal of time working on this and I expect my article to be published as originally promised. Otherwise, why did I do all of this work? Before I wrote the article, I made it clear that the copyright would remain in my name and that I would not allow revisions to the article without my permission. I have honored my agreement; I would appreciate it if you would honor yours.
The following excerpts from Sampson reveal the extent to which Taylor & Francis is dysfunctional (un-italicized comments in brackets are mine):
We have done our absolute best to meet your requests. Our original agreement was that you would retain copyright provided we received a signed, unmodified license to publish. [Not true. I was never told when we forged our agreement that I would be required to sign an “unmodified license to publish”—a license that I had never seen.] When you objected to the license, T&F [Taylor & Francis] went above and beyond to create a special addendum to that license, which you signed, but the addendum is just that—the signed license to publish is still required, which you have refused to provide without significant modification. [The only modifications that I required were the removal or redlining of those sections that Taylor and Francis agreed would not apply to me.]
Our bureaucracy is required to make the publication process as smooth as possible, and we ask for copyright transfer so that, in the event of a copyright dispute, the ASA [American Statistical Association] and T&F protect authors’ rights instead of leaving them to fend for themselves. [So they’re just trying to help me? That’s what all this fuss is about? What an egregious lie!] The ASA publishes hundreds of articles a year. If we negotiated each and every one as we’ve done here, it would be all we do. [If they respected the rights of authors in their standard agreement, negotiations would seldom be necessary.]
The ASA and T&F have one of most permissive copyright agreements in publishing, and we take great pride in taking care of the authors who contribute to our journals. I’m sorry that you’ve found our efforts so lacking. We’ve genuinely done our best. [If this is their best effort, their routine efforts must be pitiful.]
At this point, as you might imagine, I was getting angry, so I wrote the following reply:
Contrary to your claim, you have not done your best. You are currently breaking an agreement that we made in the beginning. You waited until the last minute to send me a license that you knew was in conflict with my requirements. Now, despite the fact that we have agreed in principle to the terms, you are insisting that I sign a form that misstates those terms. Please explain to me how this makes any sense.
In all of our correspondence, Sampson wrote only one sentence in response to my request that he “explain…how this makes any sense.” Here it is: “I have no idea why we can’t just edit the license, but the society and publisher prefer the addendum approach.” He, the Journals Manager, did not know why the original contract could not be redlined or revised to remove errors. After days of consultation with his superiors, he sent a final written response that still provided no explanation. Again and again in our correspondence I pointed out that there was absolutely no legal requirement that the original contract be signed without redlines or revisions. I invited him to have his legal department explain otherwise, if they disagreed. No explanation was ever provided because no rational explanation exists.
Taylor & Francis is an academic publisher. I would expect that any publisher, and especially one that serves the academic community, could respond to a clear and reasonable question with an answer that is better than the traditional response of tyrants, “Because we said so.” How do they get away with this behavior when dealing with the academics who write papers for their journals? Academics must publish in journals to advance in their careers. When academic journals demand of authors the right to alter their work without permission, students and professors feel that they have no choice but to surrender their rights, because they have no other avenue for publication to which they can turn. I think it time for this to change.
Take care,

May 2nd, 2013
The usefulness of understanding relationships within networks is becoming more apparent, so it is fortunate that our ability to explore and analyze networks by visualizing them is improving. Common examples of networks that analysts examine include connections between terrorists or connections between linked sites on the World Wide Web. While these networks in particular get a great deal of attention today, other more run-of-the-mill networks can be explored more insightfully as well, such as the connections between products that are often purchased together, which we’ve pursued as market-basket analysis for ages. The most common and typically most useful form of network visualization consists of nodes (things, such as people or products) and links (connections between things), displayed as a diagram in various arrangements. When networks are large, consisting of thousands or millions of nodes, node-link diagrams can become so overwhelmingly cluttered that they’re sometimes called “giant hairballs.” Consequently, those who study information visualization have been trying to develop ways to simplify and clarify these diagrams. A new approach described in a paper titled “Motif Simplification: Improving network visualization readability with fan, connector, and clique glyphs” (Proc. ACM CHI 2013, April 2013, 3247-3256) was recently introduced by Cody Dunne and Ben Shneiderman of the University of Maryland.
Here’s how Dunne and Shneiderman describe their approach in the paper’s abstract:
Analyzing networks involves understanding the complex relationships between entities, as well as any attributes they may have. The widely used node-link diagrams excel at this task, but many are difficult to extract meaning from because of the inherent complexity of the relationships and limited screen space. To help address this problem we introduce a technique called motif simplification, in which common patterns of nodes and links are replaced with compact and meaningful glyphs. Well-designed glyphs have several benefits: they (1) require less screen space and layout effort, (2) are easier to understand in the context of the network, (3) can reveal otherwise hidden relationships, and (4) preserve as much underlying information as possible.
In the paper’s introduction, they describe the problem more thoroughly:
Network visualizations are only useful to the degree they “effectively convey information to the people that use them.”…We believe that state of the art layout algorithms alone are insufficient to consistently produce understandable network visualizations.
One way forward is the use of aggregation, specifically by aggregating common network structures or subnetworks called motifs. Large, complex network visualizations often have motifs repeated throughout because of either the network structure or how the data was collected. Regardless of their cause, some frequently occurring motifs contain little information compared to the space they occupy in the visualization. Existing tools may highlight certain motifs, allow users to filter them out manually, or replace them with meta-nodes.
What they point out is that networks often consist of typical patterns of connection that exist in great numbers and when these patterns appear, even though they consist of many nodes and links, it isn’t necessary to see them individually in the diagram. In such cases, the complexity of many associated nodes and links can be displayed as a glyph. In this context, a glyph is a simple object—an icon of sorts—that represents a particular type of connection. They describe the approach that they tested as follows:
Many common network motifs present little meaningful information, yet can dominate much of the display space and obscure interesting topology. We believe that replacing these motifs with representative glyphs will create more effective visualizations as there will be far fewer nodes and edges [links] for layout algorithms and users to consider. We have chosen three motifs for our initial foray into motif simplification:
- A fan motif consists of a head node connected to leaf nodes with no other neighbors. As there may be hundreds of leaves, replacing all the leaves and their links to the head with a fan glyph can dramatically reduce the network size.
- A D-connector motif consists of functionally equivalent span nodes that solely link a set of D anchor nodes. Replacing span nodes and their links with a connector glyph can aid in connectivity comparisons.
- A D-clique motif consists of a set of D member nodes in which each pair is connected by at least one link. Cliques are common in biologic or similarity networks, where swapping for a clique glyph can highlight subgroup ties.
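The fan motif, the simplest of the three, lends itself to a compact sketch. What follows is not the authors’ NodeXL implementation, only a minimal illustration of the idea in Python using the networkx library; the example graph, the `fan(...)` glyph naming, and the two-leaf threshold are my own assumptions.

```python
import networkx as nx

def find_fan_motifs(G):
    """Map each fan head to its leaves: degree-1 nodes whose only neighbor is the head."""
    fans = {}
    for node in G.nodes:
        if G.degree(node) == 1:
            head = next(iter(G.neighbors(node)))
            fans.setdefault(head, []).append(node)
    # Treat a head with at least two leaves as a fan worth collapsing.
    return {h: leaves for h, leaves in fans.items() if len(leaves) >= 2}

def collapse_fans(G):
    """Replace each fan's leaves with one glyph node that records the leaf count."""
    H = G.copy()
    for head, leaves in find_fan_motifs(G).items():
        H.remove_nodes_from(leaves)
        glyph = f"fan({head})"
        H.add_node(glyph, kind="fan", size=len(leaves))
        H.add_edge(head, glyph)
    return H

# A hub with five leaves, plus a small triangle of core nodes
G = nx.Graph()
G.add_edges_from(("hub", f"leaf{i}") for i in range(5))
G.add_edges_from([("hub", "a"), ("a", "b"), ("b", "hub")])
H = collapse_fans(G)
print(H.number_of_nodes(), H.number_of_edges())  # 8 nodes collapse to 4, 8 links to 4
```

Even in this toy example the diagram is halved; with fans of hundreds of leaves, the reduction is dramatic, which is the authors’ point.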
The three motifs are illustrated below using a standard node-link representation (from left to right, fan, D-connector, and D-clique):
Below are simple illustrations of the glyphs (the objects on the right of each example) that they designed to represent these motifs.

Fan (glyph on the right)

D-connector (glyph on the right)

D-clique (glyphs for 4, 5, and 6 member cliques below)
The following image illustrates how a network with a simple set of connections but many nodes and links could be simplified using D-clique glyphs:
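The space savings that clique glyphs offer are easy to quantify: because every pair of a D-clique’s members is connected, a D-clique contains D(D-1)/2 links, all of which disappear when the clique is swapped for a single glyph. A quick check of the arithmetic:

```python
def clique_links(d):
    """Number of links in a D-clique: every pair of its D members is connected."""
    return d * (d - 1) // 2

# Link counts for 4-, 5-, and 6-member cliques
savings = {d: clique_links(d) for d in (4, 5, 6)}
print(savings)  # {4: 6, 5: 10, 6: 15}
```

A 6-member clique alone accounts for 15 links, so even a handful of clique glyphs can declutter a diagram considerably.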
And finally, here’s a more complex node-link diagram on the left displayed using glyphs on the right:
In the conclusion of the paper Dunne and Shneiderman write: “While users must learn the visual language of motifs and glyphs, there is a dramatic payoff in the usability and readability of the visualization.” From what I’ve seen, I’m confident that their conclusion is warranted.
In addition to the paper, which you can access using the link that I provided in the first paragraph above, you can also watch a video demonstration of this approach on YouTube.
If this approach interests you, there’s a way that you can play with it on your own for free:
We have implemented a reference implementation of motif simplification and made it publicly available as part of the NodeXL network analysis tool. NodeXL is a free and open source template for Microsoft Excel 2007/2010 that is tailored to provide powerful features while being easy to learn.
Enjoy.
