Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Federal CIO Dashboard: We Can and Should Do Better Than This

August 3rd, 2009

Over the past few months, the Obama Administration has worked to apply technology to our nation’s problems and opportunities. I applaud the efforts of our recently appointed Federal CIO, Vivek Kundra, to invest more wisely in technology and to make useful data more available, both within government and to the public. While welcoming and encouraging these efforts, we should also critique their effectiveness and speak up when they could be significantly improved. It is in this spirit of patriotism that I would like to point out flaws in the new Federal IT Dashboard, which is currently available in beta release. As someone who has designed a great many dashboards, I can say without reservation that the Federal IT Dashboard is about as useful in its current form as a typical business dashboard, and this isn’t a compliment. Others who have written about the Federal IT Dashboard in articles and blogs have offered nothing but praise. Although it’s tempting to shower nothing but praise on a child who has performed poorly in the past when he makes an effort to improve, if you really care, it’s important to supplement that encouragement with instruction. Kundra states on the website: “We tapped the brightest and most innovative minds from Federal agencies, Congress, independent oversight organizations, and the private sector as we built the IT Dashboard.” The project team apparently failed to tap anyone with expertise in quantitative data analysis and presentation, data visualization in particular. On the dashboard’s website, Kundra invites suggestions. I think it’s time for those of us who have the expertise that appears to be lacking in the dashboard’s design to lend a hand.

When we initially access the Federal IT Dashboard, here’s what appears on the site’s home page:

The pie chart and its three companion bars on the right automatically morph every few seconds to display a few measures of a different government agency’s IT projects. Unfortunately, for those of us who would actually like the time we spend on this page to produce something useful, neither the slices of the pie nor the segments of the bars are labeled, so we have no idea what we’re seeing. Perhaps the home page was meant to function only as an opening splash page of sorts, and we must go elsewhere for actual information.

Let’s select the Investments tab at the top and hope for something useful.

Aha! Here we see the pie chart and bars from before, but this time the parts are labeled. Now we’re getting somewhere. Well, actually, we’re not getting anywhere without a great deal of unnecessary effort. Why are the charts three-dimensional? Despite their unfortunate popularity, three-dimensional displays of two-dimensional data are not only superfluous but also undermine the simple tasks of graph perception and comprehension. As Edward Tufte would say, this is “chartjunk.” It breaks one of the basic rules of data presentation: “Do no harm.”

Those of us with expertise in quantitative data displays almost unanimously despise pie charts. The one thing they have going for them is the clear message that they’re displaying parts of a whole. It would help, however, if we could actually compare those parts by comparing the slices of the pie, but visual perception isn’t tuned to compare areas effectively. It is, however, highly tuned to compare the lengths of bars. Had the percentages of projects that fall into the three categories of “normal,” “needs attention,” and “significant concerns” (see the legend at the bottom) been displayed as three separate bars with a common starting position and labels to the left, rather than as a pie chart, we could have easily compared those percentages. As it is, to make sense of the pie chart we must keep referring to the legend and then read the numbers that appear next to each slice, because the pie doesn’t do the job on its own.

We face a similar problem when we try to use the three stacked bars to understand “project costs,” “schedules,” and “CIO evaluations,” because we can’t effectively compare segments of a bar that are arranged end to end. Three separate horizontal bars for each set of measures (for example, “Costs”), arranged one above the other with a common starting point, would be easy to compare.
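To make the remedy concrete, here’s a minimal sketch of the kind of graph I’m describing, written in Python with matplotlib and using made-up percentages (the dashboard’s actual figures aren’t reproduced here). The same pattern fixes both the pie and the stacked bars: separate horizontal bars, a shared zero baseline, and direct labels.

```python
import matplotlib.pyplot as plt

# Hypothetical percentages; the dashboard's real figures aren't reproduced here.
statuses = ["Normal", "Needs attention", "Significant concerns"]
percents = [62, 24, 14]

fig, ax = plt.subplots(figsize=(6, 2.2))
y = range(len(statuses))
ax.barh(y, percents, color="#4a7ebb")   # one color; length alone carries the value
ax.set_yticks(list(y))
ax.set_yticklabels(statuses)            # labels to the left, no legend needed
ax.invert_yaxis()                       # first category on top
ax.set_xlim(0, 100)
ax.set_xlabel("Percentage of projects")
for yi, pct in zip(y, percents):
    ax.text(pct + 1, yi, f"{pct}%", va="center")  # label each bar directly
fig.tight_layout()
plt.show()
```

Because every bar starts at zero, the eye can compare lengths directly: no legend lookups and no mental arithmetic.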

Even if the information were displayed using appropriate graphs, it would still be of little use, because we derive meaning from quantitative information primarily through comparisons, yet for any of these measures we can only compare values related to the three qualitative states of projects: “normal,” “needs attention,” and “significant concerns.” At any one moment we can see either all agencies combined or a single agency, but never multiple individual agencies, which prevents us from comparing them, and we can see only one point in time, which prevents us from comparing what’s going on now to the past to observe how things have changed.

If we wish to compare service groups and agencies, however, we can move to another page, which displays IT projects in the form of a treemap.

Using this treemap, we can roughly compare projects among different service groups by using the sizes of rectangles to compare one measure (total IT spending in this example) and the colors of rectangles to compare a second (% change in IT spending in this example). If the treemap were better designed, we could get a fairly good overview of how projects among service groups compare, but a couple of problems make it tough going. In the treemap above, projects are organized into four service groups: “Services for Citizens,” “Management of Government Resources,” “Service Types and Components,” and a truncated category that begins with “Support Delive…”. Unfortunately, if we want to identify individual projects in these categories, we must hover with the mouse over each in turn to get its name to appear in a tooltip window.

If we drill down into a particular service group by clicking it, we can see projects in that service group organized by agencies (“Defense and National Security,” “Health,” etc.).

Based on this view, however, can you actually see the boundaries that separate one agency from another? For some reason, the borders that separate them have become partly obscured. Eventually we can drill down to a level in the hierarchy where a treemap is no longer the best way to view the projects, because the few that remain could be more easily compared using one or more bar graphs, but this option isn’t available. And finally, when we’ve drilled down to the lowest level, a single project, the treemap view is entirely useless, as you can see below. The unlabeled big gray rectangle tells us only that spending on this project, whatever it is, didn’t change much from the previous year. Perhaps it didn’t even exist in the previous year.

Below the treemap in the bottom left corner, we have the ability to change the colors that are currently being used to display percentage change in IT spending, ranging from -10% (blue) to +10% (yellow). This ability is useful for ad hoc data analysis, when flexibility is needed to respond to unanticipated conditions, but in an analytical application like this one, which has been designed to display a specific set of measures for a specific set of purposes, it would make more sense to select a single color ramp that works well and resist complicating the dashboard with choices that aren’t necessary.
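For what it’s worth, pinning a single, well-designed diverging ramp to fixed endpoints takes only a few lines. This sketch (Python with matplotlib; the blue-to-red colormap is my own choice, not the dashboard’s blue-to-yellow) locks the scale to -10% through +10% so that a given color always means the same magnitude of change:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Pin the color scale to fixed endpoints so colors keep a stable meaning.
# "RdBu_r" (blue = decrease, red = increase) is my choice, not the dashboard's.
norm = mpl.colors.Normalize(vmin=-10, vmax=10)
cmap = mpl.colormaps["RdBu_r"]

# Render the ramp as a standalone horizontal colorbar, as a legend would appear.
fig, ax = plt.subplots(figsize=(5, 0.8))
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
             cax=ax, orientation="horizontal",
             label="% change in IT spending")
plt.show()
```

With the endpoints locked down, there’s nothing left for users to configure, which is exactly the point.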

If we wish to see how spending on federal IT projects has changed over the years, we can proceed to the Analysis section of the dashboard and select Trends. The first of two displays that are available for viewing time-series data is an animated bubble chart, which attempts to use the method popularized by Hans Rosling of www.gapminder.org.

This approach is at its strongest when it’s used to tell a story. When Rosling narrates what’s happening in the chart as the bubbles move around and change in value, pointing to what he wants us to see, the information comes alive. Animated bubble charts, however, are much less effective for exploring and making sense of data on our own. I doubt that Rosling uses this method to discover the stories; he uses it only to tell them once they’re known. We can’t attend to more than one bubble at a time as they move around, so we’re forced to run the animation over and over to try to get a sense of what’s going on. We can add trails to selected bubbles, which make it possible to review the full paths those bubbles have taken, but if trails are used for more than a few bubbles the chart quickly becomes too cluttered. Essentially, what I’m pointing out is that this is not the best way to display this information for exploration and analysis. A simpler display, such as one or more line graphs, would do the job more effectively. Perhaps you’re concerned that a line graph couldn’t display two quantitative variables at once, such as “Total IT Spending” and “Percent Change in IT Spending,” which appear in this bubble chart. Assuming that two quantitative variables ought to be compared as they change through time, two line graphs, one for each variable, arranged one above the other would handle this effectively. One of the fundamental problems with the bubble chart above, however, is that the two quantitative variables that appear in it really don’t need to be seen together. There is no correlation between total IT spending and percentage change in IT spending from year to year, so there’s no reason to complicate the display by viewing them together.
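Here’s a rough sketch of the alternative I’m describing: two line graphs sharing a time axis, one per variable. It’s written in Python with matplotlib, and the agencies and numbers are fabricated for illustration only.

```python
import matplotlib.pyplot as plt

# Fabricated spending figures for three hypothetical agencies ($ billions).
years = [2005, 2006, 2007, 2008, 2009]
spending = {
    "Agency A": [4.0, 4.3, 4.6, 5.1, 5.4],
    "Agency B": [2.1, 2.0, 2.4, 2.3, 2.6],
    "Agency C": [1.2, 1.5, 1.4, 1.8, 1.9],
}

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))
for agency, values in spending.items():
    ax1.plot(years, values, marker="o", label=agency)
    # Derive year-over-year % change from the same series.
    pct = [100 * (b - a) / a for a, b in zip(values, values[1:])]
    ax2.plot(years[1:], pct, marker="o", label=agency)

ax1.set_ylabel("Total IT spending ($B)")
ax2.set_ylabel("% change from prior year")
ax2.axhline(0, color="gray", linewidth=0.5)   # reference line at no change
ax1.legend(frameon=False)
fig.tight_layout()
plt.show()
```

Every agency’s full history is visible at once, at every point in time, with no animation to replay.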

Even if this animated bubble chart were a good visualization choice in this case, several problems in its design would undermine its usefulness. When I first looked at it, I was puzzled for a while about what “03. % Change in IT Spending” meant. I couldn’t understand the significance of “03. %…” It took a while to figure out that each variable that appears on the graph was numbered, beginning with “01.” and ending with “05.”, a numbering scheme that is completely meaningless and confusing.

Unlike the intuitive use of colors that we saw in the treemap, the rainbow of colors that appears in the bubble chart is ineffective. The order of the various hues as they change from red to blue is not intuitive. Take these colors and ask people to put them in order from high to low and you’ll get a variety of answers.

Also, the ability to switch the quantitative scales from linear to logarithmic certainly makes sense to people who have been trained in statistics, but it is confusing to most of the folks who would use this dashboard. For this reason, I believe this feature should be removed. While it is appropriate to include such functionality in a general-purpose data analysis tool, custom analytical applications like the Federal IT Dashboard should eliminate features that aren’t commonly useful and are potentially confusing, in an effort to keep the application simple. Even those who understand how to use a log scale don’t need it on this dashboard, because few of them would be satisfied using this bubble chart; they would rather download the data and explore it using a better analytical tool.

For those who recognize the limitations and flaws of the bubble chart, an alternative in the form of a bar graph is available. For our entertainment pleasure, when switching between the two, the bubbles morph into bars before our eyes and line themselves up along the horizontal axis.

The bar chart version is just plain silly. None of the bars are labeled until you click on them one at a time to make labels such as “Education (Dept of)” and “Homeland Security (Dept of)” appear. Knowing only the identity of the selected bars (the others remain unlabeled) and watching the bars move around as spending changes through time is eye-catching but almost totally meaningless. Once again, simple line graphs for comparing changing values for the selected items would do the job much better.

Because I wanted to learn something more useful about federal IT spending, I decided to take advantage of the data feeds that are provided, but once again I ran into a wall. Unfortunately, the information that can be downloaded is limited to a current snapshot, which includes three variables (total spending, new/upgrades spending, and maintenance spending) broken into three time-based categories: last year’s actual spending, the current year’s enacted spending, and next year’s budgeted spending. Time series aren’t available, nor is there a way to compare actual spending to plan. In other words, the comparisons that I would have found most meaningful couldn’t be made with the information that’s available.

I want to encourage Vivek Kundra to complement his fine intentions with more effective designs. There’s no need to duplicate the mistakes that most businesses still make when working with information. Data analysis and presentation best practices are not a mystery and aren’t difficult to learn. Several of us who know and care about this are available to help. I suspect that others would be willing, as I am, to assist free of charge. America can do better than this. We have a great opportunity to use information technology to make the world a better place. Let’s not miss it.

Take care,

An Excellent Primer on Geo-spatial Analysis

July 28th, 2009

In the past I’ve recommended two books on geo-spatial data analysis and presentation: Designing Better Maps by Cynthia Brewer and GIS Cartography: A Guide to Effective Map Design by Gretchen Peterson. Today I’d like to add a third to the list: The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns and Relationships by Andy Mitchell.

Although Mitchell’s book has been available since 1999, it was new to me when I recently purchased and read it. I was looking for a book that would serve as a good primer for folks who are just getting started with geo-spatial analysis, and this book does the job quite well. It assumes that you know little about geo-spatial analysis and lays out the basics clearly and simply. Mitchell outlines the book’s contents as follows:

In this book, we’ve identified the most common geographic analysis tasks people do every day in their jobs:

  • Mapping where things are
  • Mapping the most and least
  • Mapping density
  • Finding what’s inside
  • Finding what’s nearby
  • Mapping change

As someone who knows a great deal about data analysis and visualization in general but a limited amount about geo-spatial analysis in particular, I learned a lot from this book. It was useful to have several gaps in my knowledge filled in as Mitchell took me on a superbly organized and simply expressed journey through the fundamentals. If you’re new to GIS and want a good primer to start the journey on the right foot, I highly recommend this book.

The Global BusinessObjects Network Has Pie On Its Face

July 9th, 2009

This blog entry was written by Bryan Pierce of Perceptual Edge.

Here at Perceptual Edge, we like to show real-world examples of poor graph design to teach people what not to do, because knowing what to avoid and why it doesn’t work is an important step in the learning process. We often receive emails from people who follow this blog or have read Stephen’s books or articles who want to share examples that they’ve come across. The pie chart below is one such example:

This graph was produced by the Global BusinessObjects Network to promote the upcoming 2009 SAP BusinessObjects User Conference. It’s supposed to show what BusinessObjects products the attendees of last year’s conference used. Regular readers of Stephen’s work know that he dislikes pie charts because they don’t work as effectively as alternatives like bar graphs, so I won’t revisit the general problems with pie charts. (If you’re interested in more information about pie charts, Stephen wrote a full review detailing their significant problems and single, rarely-needed strength.) Unfortunately, the design of this graph is quite terrible, even by pie chart standards.

This graph is dysfunctional for two major reasons. First, only the large slices have been directly labeled. All of the small slices are labeled using a legend, but there are so many slices that it’s impossible to associate the colors in the legend with the slices in the pie, because many of the colors are nearly identical. Sure, people can just read the values from the legend and ignore the pie, but how is that better than a simple table?

The second major problem with this graph is this: pie charts are designed to display part-to-whole data, with each slice representing one discrete part of the whole and all of the slices adding up to 100%. For instance, you could show the breakdown of sales by region for a company. In the graph above, however, the slices add up to significantly more than 100%, because the categories aren’t mutually exclusive. For instance, 67% of attendees use BusinessObjects Web Intelligence, but many of those people use other BusinessObjects software too, so they’re counted several times. The end result is that the blue slice that represents BusinessObjects Web Intelligence has a value of 67%, but it takes up only about 15% of the space in the pie (working backward, the slice values must sum to roughly 450%, because 67 ÷ 450 ≈ 15%). The visual picture conveyed by the pie chart misrepresents the data.

Both of these problems could have been solved by using a horizontal bar graph. With a horizontal bar graph, all of the bars could be the same color and there would be no problem labeling each bar directly, which would address the first problem. And because bar graphs are more versatile than pie charts, useful for more than just part-to-whole relationships, it wouldn’t be confusing when the values added up to more than 100%, which solves the second problem.
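A minimal sketch of that fix, in Python with matplotlib: apart from the 67% Web Intelligence figure mentioned above, the products and percentages below are made up for illustration.

```python
import matplotlib.pyplot as plt

# % of attendees using each product. The categories aren't mutually exclusive,
# so the values can sum to more than 100% -- fine for bars, misleading in a pie.
# Only the 67% figure comes from the original graph; the rest are invented.
products = ["Web Intelligence", "Crystal Reports", "Xcelsius", "Desktop Intelligence"]
usage = [67, 52, 38, 30]

fig, ax = plt.subplots(figsize=(6, 2.5))
y = range(len(products))
ax.barh(y, usage, color="#4a7ebb")      # a single color; no legend to decode
ax.set_yticks(list(y))
ax.set_yticklabels(products)            # every bar labeled directly
ax.invert_yaxis()                       # largest value on top
ax.set_xlim(0, 100)
ax.set_xlabel("% of attendees using each product")
for yi, pct in zip(y, usage):
    ax.text(pct + 1, yi, f"{pct}%", va="center")
fig.tight_layout()
plt.show()
```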

At Perceptual Edge, we’ve seen plenty of graphs like this, and worse. But it always irks us when we see examples like this coming from the Business Intelligence industry, an industry that should know better. In this case, the Global BusinessObjects Network puts on a large conference with dozens of educational courses on BI-related subjects, including analytics and dashboards. How can they expect people to trust them with sophisticated visualization training when their simplest graphs are so dysfunctional?

-Bryan

Xcelsius Developers Debate the Merits of “Flashy vs. Few”

July 8th, 2009

I just ran across an interesting and thoughtful blog post at EverythingXcelsius.com by Ryan Goodman, the founder of Centigon Solutions, titled “Ryan Goodman’s Take on ‘Flashy vs. Few.’” In it, Ryan responds to a lengthy discussion that has been brewing for a while among Xcelsius developers about the flashy features of that product versus the best practices of dashboard design that I teach. If the tension between making dashboards flashy and making them effective interests you, I think you’ll find Ryan’s thoughts and the ensuing discussion worthwhile.

Here’s the URL: http://everythingxcelsius.com/2009/07/ryan-goodmans-take-on-flashy-vs-few.html

Take care,

At Last, a Scientific Approach to Infographics

June 24th, 2009

If you’ve been reading this blog regularly for a while, you know that I occasionally bemoan the sad state of most information graphics (infographics). Most of the folks who produce infographics lack guidelines based on solid research. In their attempt to inform, describe, or instruct, most of the infographics that I’ve seen fail, many of them miserably. I’m thrilled to announce, however, that a new book is now available that takes a great step toward providing the guidelines that are needed for the production of effective infographics.

If you were to browse the books in my library, you would soon discover that it’s easy to tell which ones I like the most: they’re the ones with a large number of pen marks in them, mostly lines marking important passages, along with occasional checks, asterisks, and annotations. If you flipped through my new copy of Visual Language for Designers: Principles for Creating Graphics that People Understand by Connie Malamed, you would see lines and notes on almost every page. Its contents are important, interesting, spot on, and beautifully expressed.

Visual Language for Designers: Principles for Creating Graphics that People Understand,
Connie Malamed, Rockport Publishers, Inc., 2009

Malamed is a cognitive scientist, artist, and educator. As such, she recognizes the need for infographics to be designed with an understanding of what actually works, based on empirical research. She proposes design principles that have emerged from an understanding of how the eyes and mind function, drawn from research in the fields of visual communication and graphic design, learning theory and instructional design, cognitive psychology and neuroscience, and information visualization. If the folks who produce infographics read this book and follow the scientifically-based principles that it teaches, they will move the field of infographics to a new level of usefulness.

This book is a great foundation on which to build more specialized principles for the design of effective infographics. To extend and deepen these guidelines beyond the general principles that Malamed has synthesized from related fields, the field now needs research that focuses on infographics in particular. Organizations that claim expertise in infographics and “visual thinking” should encourage this research, in part by reaching out to universities and other research organizations.

Visual Language for Designers is affordably priced: only $40 for a large hardbound book, printed in color. For this we can thank Rockport, the publisher of a growing body of work on graphics. Unfortunately, the paper that Rockport chose is a bit too glossy, which causes light to reflect off the pages into the reader’s eyes, making clear viewing of the images difficult at times. Paper with a matte finish would work better, a practice that Rockport should consider.

The only downside of the book’s contents is that some of the examples of infographics fail to effectively illustrate Malamed’s informative text. Malamed chose to illustrate each concept and principle using existing information graphics that were produced by others. Although this approach works most of the time, it sometimes fails to illustrate her points as clearly and specifically as possible. In some cases this might be because good examples don’t exist. In others, the examples include too much visual complexity to clearly feature the point that Malamed is trying to illustrate. In several cases she could have illustrated her points more effectively by creating her own illustrations, specifically designed for the task. Several examples in the book fail in minor ways, and a few fail altogether. Despite those that fall short, however, many are wonderful examples of well-designed infographics. For instance, the well-crafted work of Nigel Holmes is prominently featured in several examples.

It’s interesting that Malamed used examples only to illustrate the right way to design infographics. I believe the book’s ability to instruct would have benefited from the use of poor examples as well: examples of the common mistakes that are often made, which undermine effectiveness. Knowing the mistakes to avoid, understanding why they don’t work, and learning to recognize them in actual practice is a big part of learning effective design. Mistakes can be found in several of the examples that are included, but Malamed misses the opportunity to point them out.

If you design infographics, by all means buy this book, read it, put its principles into practice, and keep it handy for occasional review. Malamed has provided a wonderful resource for infographic design that is sorely needed. I suspect that Visual Language for Designers will become a classic. If it doesn’t, the field of infographics may continue to produce a great many ineffective displays.

Take care,