Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Microsoft vs. Oracle business intelligence – Does Dundas make a difference?

July 10th, 2007

Today I ran across a story published in Australian IT, a web-based news site, entitled “SQL Service put Oracle on Notice,” by Barbara Gengler (July 10, 2007). In it, Gengler pitted the growing business intelligence (BI) capabilities of Microsoft against Oracle, citing the acquisition of Dundas’ data visualization product for SQL Server Reporting Services as a significant boon for Microsoft. I’m not prepared to compare these two behemoths’ BI capabilities (both lack much of what I consider vital), but I can’t resist stating that the acquisition of Dundas’ so-called data visualization capabilities doesn’t count for much. In fact, in my opinion, the inclusion of Dundas in SQL Server Reporting Services is in many respects a setback for Microsoft.

Rather than demonstrating even the slightest understanding of data visualization, the folks who made the decision to acquire Dundas’ software have reinforced my opinion that they still haven’t got a clue. Typical of most producers of visual display widgets, Dundas offers a vast library of charts that look like the work of engineers who sit around saying “Hey, look at what I can make a chart do…isn’t this cool?!” They forgot to include designers who understand that the goal of charts is to communicate data clearly, efficiently, and accurately, not to scream, “Forget the data, look at how cute I am.”

Here are two examples:

Dundas Chart
Dundas Dashboard

I want more from a software company with the resources of Microsoft. I would love to see Microsoft advance BI to a new level by introducing thoughtful and innovative data visualization capabilities. There are some very talented people at Microsoft Research who have the ability to do this, but I have yet to see any evidence of their influence in Microsoft’s products. Rather than banking on its ubiquitous presence and influence throughout the world as an assured fast track to BI dominance, how about demonstrating some innovation and thoughtful work? If Microsoft understood data visualization and took pride in its work, the addition of Dundas’ charts to SQL Server Reporting Services would be seen as a source of embarrassment rather than featured news in its BI marketing campaign.

Take care,

Signature

Visual Statistics — A worthwhile new book, but one that is definitely for statisticians

July 2nd, 2007

I returned late last week from nearly three weeks of work in Europe, which ended with a two-day workshop that I taught for the Swiss Statistical Society. In a majestic valley in the Swiss Alps, we spent our days talking about how these talented statistical analysts could enhance their work by learning to communicate their findings more clearly and by using their eyes to supplement abstract statistical techniques. Later this year at their annual conference, they will hear a keynote presentation from Michael Friendly, Ph.D., a professor in the Department of Psychology at York University in Toronto, Canada. Among his many talents, Friendly is a trained statistician and an aficionado of visual techniques for statistical analysis. Along with two other authors, Forrest W. Young, Ph.D., of the University of North Carolina (recently deceased), and Pedro M. Valero-Mora, Ph.D., of the University of Valencia in Spain, Friendly has written a new book on the topic entitled Visual Statistics: Seeing Data with Dynamic Interactive Graphics (John Wiley & Sons, Inc., 2006). Always eager to find new sources of insight into data visualization, especially as it applies to analysis, I read the book during my recent stay in Europe.

I don’t intend to review the book comprehensively in this brief blog post, but I would like to comment on its potential usefulness for my primary audience, which consists largely of business people who work with data but lack advanced statistical training. When I began reading the introduction, I was encouraged that this might be a book I could recommend to this audience. The authors’ message rang true to my experience and seemed to share my goals:

Statistical data analysis provides the most powerful tools for understanding data, but the systems currently available for statistical analysis are based on a 40-year-old computing model, and have become much too complex. What we need is a simpler way of using these powerful analysis tools.

Visual statistics is a simpler way. Its dynamic interactive graphics are in fact an interface to these time-proven statistical analysis tools, an interface that presents the results of the hidden tools in a way that helps ensure that our intuitive visual understanding is commensurate with the mathematical statistics under the surface. Thus, visual statistics eases and strengthens the way we understand data and, therefore, eases and strengthens our scientific understanding of the world around us.

It is our aim to communicate the intrigue of statistical detective work and the satisfaction and excitement of statistical discovery, by emphasizing visual intuition without resorting to mathematical callesthenics [sic]…Seldom is there mention of populations, samples, hypothesis tests, and probability levels…This book is written for readers without strong mathematical or statistical background, those who are afraid of mathematics or who judge their mathematical skills to be inadequate; those who have had negative experiences with statistics or mathematics, and those who have not recently exercised their math or stats skills. Parts I, II, and III are for you.

The book seems to consist only of Parts I, II, and III, so I interpret the final statement to mean that non-statisticians should find the whole book non-intimidating and accessible. What I discovered in reading the book, however, is that, despite how useful it might be as a primer in visual analysis for statisticians, it is steeped in the concepts and language of statistics, and lacks the explanations that would be needed by non-statisticians to make use of the material. I have no doubt that the authors attempted to reach out to non-statisticians. I suspect, however, that they are too immersed in an academic statistical mindset to recognize when they are using terms and discussing concepts that are unfamiliar to the uninitiated. Terms such as Box-Cox transformation, Euclidean space, kernel density curve, p-value, and Pearson’s chi square are par for the course. Early in chapter 2, which provides some actual data sets and analytical challenges that are used throughout the book, the reader is already faced with material like the following:

The spreadplot (a kind of multiplot visualization that is introduced in chapter 4) for the initial model, (GPE)(M) is shown in Figure 2.9 (on the following two pages). This model fits very poorly, of course (G² = 107, df = 7, p < 0.001). The G² measure is a badness-of-fit measure. Low values are good, high values are bad. The empty model, reported here, has a very large value of G², meaning the fit is very poor, which, of course, it must be, since it has no terms. The hypothesis test, when rejected, as is the case here, indicates that the model does not fit the data.

At this point, as someone whose statistical knowledge can fit comfortably in a thimble, my eyes began to glaze over. Please don’t misunderstand me. I am not saying that this is not a good book. I suspect that this is a very important book for statisticians, because it introduces them to the power of visual analysis, which most statisticians under-appreciate. This just isn’t a book for non-statisticians.

One more observation that I want to make about this book is one that applies to many books on data visualization: the value of books on this topic is dramatically undermined when they are not printed in color. I felt bad for the authors when they bemoaned this unfortunate decision by the publisher to save costs by printing the book in black and white:

Unfortunately, mosaic displays are best viewed in color, and we are forced to use black and white. (We do the best we can, but to be honest, the black-and-white versions…do not do justice to the mosaic displays. If you can view this online, please do; it will help).

It wasn’t only the mosaic displays that would have benefited from color. Perhaps the authors already had their contract in place with John Wiley & Sons, Inc., before they realized that color was not an option, and then found that they had no power to change this. If you ever plan to write a book about data visualization, get an up-front guarantee from the publisher that the book will be printed in color, or you’ll end up having to make sad disclaimers to your readers like the one above.

Take care,

Signature

Microsoft Excel’s Idea of Visual Data Analysis

June 3rd, 2007

No software product is used more than Microsoft Excel for the analysis and presentation of quantitative data. While its use is pervasive, and it does some things very well, its charting functionality is rather sad. The charts feature dazzling visual effects that are perhaps useful for marketing, but using them to present data effectively takes a discouraging amount of effort. It is fair to say that you can use Excel to present data effectively, but only within severe limits and only if you’re willing to work around its problems. To say it can be used for visual data analysis, however, is a stretch that exceeds its reach. To date, Excel is at best an infant in the world of visual data analysis, barely able to roll over.

Those of you who are familiar with visual data analysis and what good software does to support it will find Microsoft’s notion of visual data analysis entertaining. I invite you to watch Microsoft’s demo and let me know what you think of it. I suspect that the folks at Microsoft Research who understand information visualization must look at this and cringe.

Take care,

Signature

Business Objects Insight – A mind grind and waste of time

May 21st, 2007

You’ve got to hand it to the marketing folks at Business Objects: they’ve got balls. They don’t hesitate to make claims that are backed up by nothing but illusion. With the introduction of their new website called Business Objects Insight, however, they’ve taken marketing chutzpah to a whole new level. Want to solve the world’s great problems? Welcome to Business Objects Insight, the “world’s first mind grid,” the only site that provides “tools for data visualization, data collaboration, and a platform to publish challenges to the online community.” The challenges take on great problems of the modern world, such as global warming. Ignoring the fact that they are not the only site that does this (I’ll tell you about Many Eyes in a moment), let’s look at what they’re actually providing.

Data visualization: What they call data visualization is really just Crystal Xcelsius, their product that makes the analysis and presentation of data look like a video game and work about as effectively as a eunuch in heat.

Data collaboration: I can’t tell that any collaborative functionality has been built into the site, other than a blog and the fact that people can display their Xcelsius applications there and others can look at and use them. As far as data collaboration goes, this is rather anemic.

Platform for challenges: This isn’t really a feature; it’s the declared purpose of the site. Participants are being challenged to develop data visualizations using Xcelsius that are designed to solve major world problems. Why should people make the effort to save the world, and why should they channel their world-saving talent into learning and using Xcelsius to do so? Because Business Objects is going to pay a heart-stopping million dollars to the creators of the best world-saving applications (or, more precisely, “up to a million dollars,” which, if you think about it, could mean nothing at all).

This strikes me as a thinly veiled marketing scheme to sell more copies of Xcelsius under the guise of solving world problems. Business Objects’ founder and Chairman Bernard Liautaud declares:

Today the world becomes more intelligent. While there are a number of sites dedicated to aggregating and analyzing data, Insight is unique in providing members with tools for data visualization, data collaboration, and a platform to publish challenges to the online community. Our goal is to change the way problems get solved, to work on issues that have a global impact, and to challenge the conventions and paradigms of online communities.

Wow, this is quite a claim. If only Business Objects had the know-how and technology to do it. Until they actually develop or hire some expertise in the field of data visualization, they should stop claiming that they are using visualization methods to tackle even the simplest problems, let alone the great problems that plague our world. And until they have tools that provide effective visualization functionality, rather than the child’s toy of a product called Xcelsius, they should stick to selling data reporting tools that depend on the conventional paradigm of purely text-based displays.

If you’re interested in seeing a site that effectively uses data visualization as a means for people to exchange information and insights related to world problems, and does so in a way that supports true collaboration, take a look at Many Eyes, which was developed by IBM Research. This site succeeds where Business Objects Insight does not because it was designed by people who are experts in data visualization and data collaboration. Although the folks at Many Eyes are not making any grand claims about saving the world, they are providing a platform that could actually be used to support this effort.

What’s so sad about this is that there are real problems in the world that need solving, but Business Objects Insight, with its dysfunctional tools, will only waste people’s time, frittering away well-intentioned efforts and potentially good ideas that could be better applied elsewhere. If Business Objects really wants to help solve the problems of the world, why not throw their weight behind a data visualization and collaboration site that really works? Perhaps they have an ulterior goal.

Take care,

Signature

Business Objects Insight screenshot

Dental work by road workers with jackhammers — Dashboard design gone awry

May 10th, 2007

All right, I’m not really going to write about having your dental work done by road workers with construction equipment, but I am going to write about something that is just as painful and absurd: information displays that are designed by software engineers who know nothing about design.

A few days ago a press release was published by the George S. May International Company to announce its dashboard solutions for “small to mid-sized” companies. Here’s a quote from the press release:

We have found that two major obstacles stand in the way of business owners managing their companies more effectively. One is the difficulty in understanding the data they have. The second is difficulty in determining the cause-and-effect relationships among the different data. Management Dashboards helps business owners overcome these obstacles.

While this accurately describes two common problems in business today, I don’t agree that the dashboards that George S. May offers do much to solve them. Like most dashboard providers, this company’s solutions communicate information poorly. Effective dashboards result from a combination of good technology and good design. These dashboards look like they’ve been designed by technologists who sit in their dimly lit cubicles all day banging out code, isolated from the world of people. Dashboards are a medium of communication. To work effectively, they must be designed to present the information that people need to do their jobs in a way that is clear, accurate, and efficient.

It makes me sad and even a little angry when software and service companies advertise information solutions that work this poorly, because it isn’t that difficult to learn how to do this right. To illustrate this point, I asked Bryan Pierce, the Operations Manager at Perceptual Edge who has been working with me since last December, to critique one of the dashboards that are featured by George S. May. Before coming to Perceptual Edge, Bryan had no experience with data visualization, and because his work doesn’t require him to be an expert in this field, what he has learned he has picked up mostly indirectly, by reviewing my articles, books, and blogs. In a short time, he has developed the skills that a company such as George S. May could use to produce solutions that really work. The rest of this blog entry was written by Bryan to illustrate how easily the visual design skills that are needed to dramatically improve dashboards can be learned.

Take care,

Signature

 


My name is Bryan Pierce. I am not a rocket scientist or a brain surgeon, nor do I need to be to understand and apply the principles of good information dashboard design. For the last six months, I have worked with Stephen Few at Perceptual Edge, handling the day-to-day operations. Using the skills I have picked up in that time, I am critiquing and providing recommendations for the improvement of a dashboard I recently found online, which was created by the George S. May International Company (http://www.gsmdashboards.com/): 

Prior to working with Stephen, I had no exposure to information dashboards; I wasn’t even familiar with the term. Just after Stephen offered me this job, I decided to read Information Dashboard Design so that I’d have a better understanding of Stephen’s work. Over the past few months, I’ve also read most of his articles and blog posts that address the subject. With the exception of a few conversations we’ve had on the subject, everything I know about effective dashboard design can be learned from Information Dashboard Design or http://www.perceptualedge.com/.

My discussion of this dashboard’s problems is broken into sections. First, I’ll discuss the overall problems, and then I’ll point out some of the problems that are specific to individual components.

Overall Problems:

  • Layout: Each component on the dashboard fits into an equally sized “box,” which scales when the window is resized. All of the components resize along with their containing boxes, except the table, which does not scale. Depending on your resolution, this can cause the dashboard to be unbalanced, as in the screenshot above, where some of the items are unnecessarily large, while the table is almost illegibly small. Besides poor scaling, the dashboard’s layout is hindered by its rigid grid. As mentioned before, each component fits into an equally sized box, even though the components probably shouldn’t all be the same size. For instance, the heatmap (bottom left) has a much higher data density than the upper bar graph and should probably be allowed more space, yet the dashboard’s grid system gives them the same amount of “real estate.” More thought should have been given to the size and placement of each of these components, based on the nature of the data and its intended use.
  • Fill Color: Fill color is used to separate the dashboard into four sections. This makes it unnecessarily difficult for your eyes to track between the differently colored columns. In this case, white space alone would probably have been enough to delineate the sections (with a proper layout); if not, very thin, light gray lines could have been used. In the instances where it is necessary to use fill color to separate sections, a very light color is all that is needed.
  • Contrast: The contrast of the graphs compared to the background colors varies significantly. For instance, the heatmap and the gauge use bright colors on a dark background, so they are the most visually salient objects on the entire dashboard. But, are they really that much more important than everything else? Now look at the table and the upper bar graph. They both use blues that are very similar to the background color. As such, they fall away into the background. While a good design can use differences in contrast to direct our eyes, in this case, I think these differences are arbitrary.
  • Lack of Context: Many of the graphs are hard to decipher due to insufficient explanatory text. For instance, the “Average Loan Size” bar graph would be easier to understand if it said what units it was being measured in (e.g., U.S. dollars or thousands of U.S. dollars). In some cases, such as in the pie charts, the missing information can be obtained from a pop-up legend, by clicking the small eyeball icon to the bottom-right of the charts. However, many of these graphs could have easily been put in context through clearer titles and labels, making the pop-up legend unnecessary. Also, even with the assistance of the legend, some of the graphs are still indecipherable. For instance, notice that both the gauge and the table display the “loan count,” but represent drastically different values. In context, it’s likely that both of these values would make sense. Unfortunately, that context has not been provided.
  • Vertically-Oriented and Angled Text: The line graph, the two bar graphs, and the Pareto chart (top right) use vertically-oriented text for their axis titles. The bar graphs and the Pareto chart also use angled text for some of their labels. Vertically-oriented or angled text is harder to read than horizontally-oriented text and should not be used if it can be avoided. On this dashboard, the vertical axis titles could easily be moved to the top of the axis and horizontally-oriented, while the labels could all be oriented horizontally without moving them at all (although some of the labels in the Pareto chart would need to be split onto two lines).
  • Unnecessary Precision: Graphs are used to show the shape of data, to compare magnitudes, spot exceptions, etc. If exact values are necessary, a table works best. As such, it’s usually not necessary to show the actual values of bars; a scale along the axis will likely provide sufficient precision. In this dashboard, the numbers have been written directly on the bars and pie slices, and in many cases they have been written to two decimal places of precision. This clutters the graphs and distracts us from the shape of the data. On the rare occasions when the exact values and the shape or magnitude of the values are both necessary, a table and graph should be used in conjunction. It’s less distracting and more efficient to look up values on a table that is below or next to the graph than it is when they’re integrated.
  • Use of Color Gradients: It’s a rare occasion when the use of a gradient in a dashboard actually serves to enhance its usability. Most often, color gradients are used in a misguided attempt to make a dashboard more visually interesting. At best, this is useless decoration; at worst (such as when a gradient is used in the plot area of a graph), it can actually cause optical illusions that can adversely affect perception of the data. In this dashboard, gradients are used to decorate many of the graph and axis titles. This does nothing to enhance communication and only serves to give these titles unnecessary visual salience.

The Heatmap:

  • A heatmap is a poor choice for the display of time-series data. In addition to the actual values, which a heatmap can only display in a very rudimentary manner, based on color, it’s often useful to see the shape of change through time. If a line graph were used instead of a heatmap, it would be much more enlightening. For instance, in June, the heatmap shows us that for all but one day, the amount of loans funded was considered “Poor.” If a line graph had been used, we could see whether the amount of funded loans is trending upwards, downwards, or remaining flat, whether the loan amount is fluctuating significantly between days or remaining fairly steady, etc. Reference lines could be used to signify the division between “Good,” “Fair,” and “Poor” performance.
  • The horizontal axis of the heatmap is inadequately labeled and very confusing. The axis label says that each number represents the “Date,” but last time I checked, February had more than 19 days. After working at it, I was able to decipher the meaning of the days. Each number represents a business day for a given month in the year 2005. In addition to ignoring the weekends, the heatmap also ignores certain holidays. For instance, February only had 19 work days if you exclude Presidents’ Day. As you can see, the problem with this is that nobody thinks in terms of work days. You don’t think, “Today is the 13th work day of the month.” You think “Today is the 17th day of the month.” The heatmap’s design would have been much more effective if every day of the month was included and non-business days were simply left blank or “grayed out” in some manner.
  • Color is used poorly in the heatmap. The use of similar-intensity reds and greens together makes the heatmap useless to the 10% of men and 1% of women who are colorblind. Additionally, by only encoding three different values (“Good,” “Fair,” and “Poor”) we lose out on some of the depth the heatmap could have provided. For instance, currently, a single loan could mean the difference between a day being considered Fair or Good. If a divergent color scale were used (that is, one that uses different intensities of two different colors to encode the data), the heatmap would provide much more insight. For instance, red could be used for poor loan days and blue could be used for good loan days. Days with extremely high or low loan volumes would show up as bright red or blue, average days would appear gray, and everything else would fall somewhere in between. While we still wouldn’t know exact values (color doesn’t work for this), we would have a better idea of just how “good,” “poor,” or even “fair,” each day was.
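The divergent scale described above can be sketched in a few lines of Python. This is a minimal sketch, not George S. May’s implementation or any charting library’s API: the function names, the RGB anchor colors, and the choice of a single “average day” midpoint with fixed low and high bounds are all hypothetical assumptions for illustration. A value at the midpoint maps to gray, and the color grows more intensely red or blue as the value moves toward the low or high bound.

```python
# Minimal sketch of a divergent (two-hue) color scale.
# Assumption: the anchor colors and the low/mid/high bounds are hypothetical.

RED = (202, 0, 32)      # hypothetical color for the poorest days
GRAY = (200, 200, 200)  # neutral color for average days
BLUE = (5, 113, 176)    # hypothetical color for the best days


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def divergent_color(value, low, mid, high):
    """Map a value to RGB: low -> RED, mid -> GRAY, high -> BLUE.

    Values outside [low, high] are clamped to the end colors.
    """
    if value <= mid:
        span = mid - low
        t = 0.0 if span == 0 else (mid - max(value, low)) / span
        return lerp_color(GRAY, RED, t)
    span = high - mid
    t = 0.0 if span == 0 else (min(value, high) - mid) / span
    return lerp_color(GRAY, BLUE, t)


# Hypothetical daily loan counts, with 10 treated as an average day.
for count in (0, 5, 10, 15, 20):
    print(count, divergent_color(count, low=0, mid=10, high=20))
```

Because gray sits in the middle, a day one loan above or below average looks nearly neutral instead of flipping between “Fair” and “Good,” while extreme days stand out in saturated red or blue.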

The Gauge:

  • The primary problem with the gauge in the second column is, well, that it’s a gauge. Gauges are “all the rage” on information dashboards; unfortunately, they take the dashboard metaphor too far. One of the strengths of gauges on real automobile dashboards is that they are an easy way for a mechanical device to show change through motion. For the needle to change, its base only needs to rotate, instead of physically moving from one place to another. However, computerized dashboards need not share the same physical constraints that real-world gauges do. On a computer screen, the circular shape of the gauge only wastes space and makes it unnecessarily difficult to read. Additionally, it’s rare that the data on a dashboard is updated so frequently that motion will actually be used to show change. This dashboard is no exception. The “real-time” gauge represents the “loan count” for a given month. You would want to know this number, but would you really watch the gauge to see how fast it changed, the way you look at the speedometer in your car? No. The information contained in the gauge could be displayed more efficiently in a variety of ways that would also save room. One of the best ways to display this information is through the use of a bullet graph, which Stephen developed specifically as an effective, compact replacement for gauges.

The Upper Bar Graph:

  • Can you name the two months that follow May? June and July, right? When you think of them, you never think “July and June,” because that is not their order in the year. Anytime time-series data is shown on a graph, it should always be sorted from earliest to latest and never any other way. Unfortunately, in an attempt to rank the months by “Average Loan Size,” the creators of this dashboard put July before June, as seen in the upper bar graph. They shouldn’t have.
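For anyone generating such a graph in code, the fix is to sort by the calendar rather than by the measure. Here is a minimal Python sketch; the month names are standard, but the (month, value) pairs and the `chronological` helper are hypothetical placeholders, not the dashboard’s actual data or code.

```python
# Sort time-series rows by calendar order, never by their values.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def chronological(rows):
    """Return (month_name, value) pairs sorted from earliest to latest month."""
    return sorted(rows, key=lambda row: MONTHS.index(row[0]))


# Hypothetical rows ranked by value, as on the dashboard (July before June).
ranked_by_value = [("July", 1.9), ("June", 1.7), ("May", 1.5)]
print(chronological(ranked_by_value))
```

The sort key ignores the values entirely, so the bars always appear from earliest to latest regardless of their magnitudes.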

The Pareto Chart:

  • The graph in the top right corner is called a Pareto chart. In this type of graph, the bars indicate the magnitude of individual items, while the line indicates the cumulative totals of those items from left to right. For instance, the line starts at the top of the first bar and then as we move to the second bar, the line displays the combined total of the first and second bars. Once we reach the right-most bar, the line equals the total of all items. The problem here is not the Pareto chart itself, but the scale on the left. The scale on the right expresses each bar’s ACH Pull as a percentage of the whole (which is why the line ends at 100% on this scale), but it is not clear what the left scale represents. Also, the unequal precision used on the left scale, with numbers containing anywhere from 0 to 2 decimal places, makes it more difficult to read than necessary.
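The arithmetic behind the Pareto line is simple enough to sketch. This minimal Python version assumes the conventional Pareto ordering (largest bar first) and uses hypothetical category names and values; the `pareto` function is an illustration, not part of any charting product. For each bar it returns the bar’s own value and the running total as a percentage of the grand total, which is why the last point always lands on 100%.

```python
def pareto(values):
    """Return (label, value, cumulative_pct) rows, largest value first.

    The bars plot `value` against a scale in the data's own units; the line
    plots `cumulative_pct` against a 0-100% scale, so it ends at 100%.
    """
    rows = sorted(values.items(), key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in rows)
    running = 0
    result = []
    for label, value in rows:
        running += value
        result.append((label, value, 100.0 * running / total))
    return result


# Hypothetical categories and values, for illustration only.
for row in pareto({"A": 30, "B": 50, "C": 20}):
    print(row)
```

Labeling the left axis with the data’s units, and using the same number of decimal places on every tick, would address both of the scale problems described above.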

The Pie Charts:

  • Pie charts should never be used. Research has found that people have a much harder time accurately judging 2-D areas, such as slices of a pie, than they have comparing lengths, such as the lengths of bars. These pie charts also exhibit another problem that most other pie charts do not. Because both dollar values and percentages are provided on the pies, it’s natural to assume that comparisons can be made between the slices of two different pies. However, this can only be done with the percentages, not with the dollar values. For instance, look at the blue portion of the pies that represent February and March. Although February’s blue slice is larger and represents a larger percentage of that month’s total, the dollar value that it represents is significantly less than the dollar value for March. This problem and the problem of pie charts in general could be avoided if this were redesigned as a bar graph. Each pie could be replaced with a pair of bars, and all of the bars could be placed on a single axis. The vertical axis scale could be in dollars. This design would be more compact than six pie charts, and it would make comparisons between the magnitudes of two bars in a single month or between two bars in different months accurate and efficient. If the exact percentages were necessary (which they probably wouldn’t be, given the higher accuracy of magnitude comparisons with bar graphs), they could be included in a small table below the graph.
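The trap in comparing slices across pies comes down to one multiplication: a slice’s dollar value is its share times that pie’s total. The minimal Python sketch below uses hypothetical totals and shares (not the dashboard’s figures) and a hypothetical `shares_to_dollars` helper to show how a larger share can still be fewer dollars. It flattens everything into (month, category, dollars) rows that could be drawn as bars on a single dollar axis.

```python
# Hypothetical monthly totals (dollars) and category shares (whole percents).
totals = {"February": 100_000, "March": 200_000}
shares = {
    "February": {"blue slice": 60, "other": 40},
    "March": {"blue slice": 40, "other": 60},
}


def shares_to_dollars(totals, shares):
    """Convert each month's percentage shares into (month, category, dollars) rows.

    Every row becomes one bar on a single dollar-denominated axis, so any two
    bars, within a month or across months, can be compared directly.
    """
    rows = []
    for month, total in totals.items():
        for category, pct in shares[month].items():
            rows.append((month, category, total * pct / 100))
    return rows


for row in shares_to_dollars(totals, shares):
    print(row)
# February's blue slice is the larger share (60% vs. 40%)
# but the smaller amount ($60,000 vs. $80,000).
```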

There are other tweaks and polishes that could be made to the dashboard; I have only discussed the most egregious problems. So, why do so many companies make dashboards that have so many design problems? I don’t know. Maybe they are unaware of the problems or perhaps they just don’t care. I can tell you, however, that it’s not difficult to learn the principles of good dashboard design. I have picked them up through a few hours of reading. Once they’re explained to you, they make sense and eventually seem intuitive. It only takes a little effort to learn how to create dashboards that are as effective as they should be.