From Giant Hairballs to Clear Patterns in Networks

The usefulness of understanding relationships within networks is becoming more apparent, so it is fortunate that our ability to explore and analyze networks by visualizing them is improving. Common examples of networks that analysts examine include connections between terrorists and connections between linked sites on the World Wide Web. While these networks get a great deal of attention today, more run-of-the-mill networks can be explored insightfully as well, such as the connections between products that are often purchased together, which we’ve pursued as market-basket analysis for ages. The most common and typically most useful form of network visualization is the node-link diagram, which consists of nodes (things, such as people or products) and links (connections between things) displayed in various arrangements. When networks are large, consisting of thousands or millions of nodes, node-link diagrams can become so overwhelmingly cluttered that they’re sometimes called “giant hairballs.” Consequently, those who study information visualization have been trying to develop ways to simplify and clarify these diagrams. Cody Dunne and Ben Shneiderman of the University of Maryland recently introduced a new approach in a paper titled “Motif Simplification: Improving network visualization readability with fan, connector, and clique glyphs” (Proc. ACM CHI 2013, April 2013, 3247-3256).

Here’s how Dunne and Shneiderman describe their approach in the paper’s abstract:

Analyzing networks involves understanding the complex relationships between entities, as well as any attributes they may have. The widely used node-link diagrams excel at this task, but many are difficult to extract meaning from because of the inherent complexity of the relationships and limited screen space. To help address this problem we introduce a technique called motif simplification, in which common patterns of nodes and links are replaced with compact and meaningful glyphs. Well-designed glyphs have several benefits: they (1) require less screen space and layout effort, (2) are easier to understand in the context of the network, (3) can reveal otherwise hidden relationships, and (4) preserve as much underlying information as possible.

In the paper’s introduction, they describe the problem more thoroughly:

Network visualizations are only useful to the degree they “effectively convey information to the people that use them” … We believe that state of the art layout algorithms alone are insufficient to consistently produce understandable network visualizations.

One way forward is the use of aggregation, specifically by aggregating common network structures or subnetworks called motifs. Large, complex network visualizations often have motifs repeated throughout because of either the network structure or how the data was collected. Regardless of their cause, some frequently occurring motifs contain little information compared to the space they occupy in the visualization. Existing tools may highlight certain motifs, allow users to filter them out manually, or replace them with meta-nodes.

Their point is that networks often contain typical patterns of connection that occur in great numbers. When these patterns appear, even though they consist of many nodes and links, it isn’t necessary to see each node and link individually in the diagram. In such cases, the many associated nodes and links can be displayed as a single glyph. In this context, a glyph is a simple object (an icon of sorts) that represents a particular type of connection. They describe the approach that they tested as follows:

Many common network motifs present little meaningful information, yet can dominate much of the display space and obscure interesting topology. We believe that replacing these motifs with representative glyphs will create more effective visualizations as there will be far fewer nodes and edges [links] for layout algorithms and users to consider. We have chosen three motifs for our initial foray into motif simplification:

  • A fan motif consists of a head node connected to leaf nodes with no other neighbors. As there may be hundreds of leaves, replacing all the leaves and their links to the head with a fan glyph can dramatically reduce the network size.
  • A D-connector motif consists of functionally equivalent span nodes that solely link a set of D anchor nodes. Replacing span nodes and their links with a connector glyph can aid in connectivity comparisons.
  • A D-clique motif consists of a set of D member nodes in which each pair is connected by at least one link. Cliques are common in biologic or similarity networks, where swapping for a clique glyph can highlight subgroup ties.
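To make these three definitions concrete, here is a minimal Python sketch of how each motif might be recognized in a small undirected network. It is my own illustration under simplifying assumptions, not the authors’ algorithm or NodeXL’s implementation: the function names (build_adjacency, find_fans, find_connectors, is_clique), the thresholds, and the sample network are all hypothetical, and the clique function only checks a candidate set of nodes rather than searching for cliques.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected (u, v) links into node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def find_fans(adj, min_leaves=2):
    """Fan: a head node plus leaves whose only neighbor is that head.

    min_leaves is an assumed threshold, not a value from the paper.
    """
    fans = defaultdict(list)
    for node, neighbors in adj.items():
        if len(neighbors) == 1:
            (head,) = neighbors
            fans[head].append(node)
    return {head: sorted(leaves) for head, leaves in fans.items()
            if len(leaves) >= min_leaves}


def find_connectors(adj, min_spans=2):
    """D-connector: span nodes that link exactly the same D >= 2 anchors.

    Nodes are grouped by identical neighbor sets, so spans in one group
    cannot be linked to each other. min_spans is an assumed threshold.
    """
    groups = defaultdict(list)
    for node, neighbors in adj.items():
        if len(neighbors) >= 2:
            groups[frozenset(neighbors)].append(node)
    return {anchors: sorted(spans) for anchors, spans in groups.items()
            if len(spans) >= min_spans}


def is_clique(adj, members):
    """D-clique check: every pair of the given members is linked."""
    return all(v in adj[u] for u, v in combinations(members, 2))


# Hypothetical sample network: one fan, one 2-connector, one 4-clique.
edges = [
    ("hub", "l1"), ("hub", "l2"), ("hub", "l3"), ("hub", "a"),
    ("a", "s1"), ("a", "s2"), ("b", "s1"), ("b", "s2"), ("b", "c1"),
    ("c1", "c2"), ("c1", "c3"), ("c1", "c4"),
    ("c2", "c3"), ("c2", "c4"), ("c3", "c4"),
]
adj = build_adjacency(edges)
fans = find_fans(adj)
connectors = find_connectors(adj)

print(fans)
print(connectors)
print(is_clique(adj, ["c1", "c2", "c3", "c4"]))
```

In this sample, the three leaves attached to "hub" form a fan, the span nodes "s1" and "s2" form a 2-connector between anchors "a" and "b", and "c1" through "c4" form a 4-clique. Replacing each group with one glyph is what shrinks the diagram.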

The three motifs are illustrated below using a standard node-link representation (from left to right, fan, D-connector, and D-clique):

Below are simple illustrations of the glyphs (the objects on the right of each example) that they designed to represent these motifs.

Fan (glyph on the right)

D-connector (glyph on the right)

D-clique (glyphs for 4, 5, and 6 member cliques below)

The following image illustrates how a network with a simple set of connections but many nodes and links could be simplified using only D-clique glyphs:

And finally, here’s a more complex node-link diagram on the left displayed using glyphs on the right:

In the conclusion of the paper Dunne and Shneiderman write: “While users must learn the visual language of motifs and glyphs, there is a dramatic payoff in the usability and readability of the visualization.” From what I’ve seen, I’m confident that their conclusion is warranted.

In addition to the paper, which you can access using the link that I provided in the first paragraph above, you can also watch a video demonstration of this approach on YouTube.

If this approach interests you, you can try it yourself for free. As the authors explain:

We have implemented a reference implementation of motif simplification and made it publicly available as part of the NodeXL network analysis tool. NodeXL is a free and open source template for Microsoft Excel 2007/2010 that is tailored to provide powerful features while being easy to learn.

