Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Tableau Veers from the Path

March 13th, 2013

I’ve seen it happen many times, but it never ceases to sadden me. An organization starts off with a clear vision and an impervious commitment to excellence, but as it grows, the vision blurs and excellence gets diluted through a series of compromises. Software companies are often founded by a few people with a great idea, and their beginnings are magical. They shine as beacons, lighting the way, but as they grow, what was once clear becomes clouded, what was once firm becomes flaccid, and what was once promising becomes just one more example of business as usual. The prominent business intelligence (BI) software companies of today have become too big to easily change course in necessary ways and too focused on quick wins to ever make the sacrifices that would be needed to do so. Once upon a time, however, these companies were vibrant, filled with the exuberance and promise of youth. It’s easy for a few people to share an inspiring vision, but it is difficult for that vision to remain pure when the organization grows to 50, 100, 1,000, or more people. The demands of payroll and release schedules make it easier and easier to justify compromises and to chase near-sighted wins. Add to these challenges the demands of taking a company public and the alchemy seldom produces gold.

What does this have to do with Tableau? I believe that this wonderful company, which I have uniquely appreciated and respected, is losing the clear vision of its youth. Even though Tableau distinguished itself by a courageous commitment to best practices, which I believe is why it has done so well, it now seems to be competing with the big guys by joining in their folly. Tableau seems to have forsaken the road less travelled of “elegance through simplicity” for the well-trodden super-highway of “more and sexier is better.”

Tableau has a special place in my heart. Not long after starting Perceptual Edge, I discovered Tableau in its original release and wrote the first independent review of Tableau 1. I was thrilled, for in Tableau I found a BI software company that shared my vision of visual data exploration and analysis done well. Since then I’ve used Tableau, along with Spotfire, Panopticon, and SAS JMP, to illustrate good data visualization functionality in my courses and lectures. Until recently, I assumed that Tableau, of all these vendors, would be the one most likely to continue its tenacious commitment to best practices. However, what I’ve seen in Tableau 8, due to be released soon, has broken my heart. Tableau is now introducing visualizations that are analytically impoverished. Tableau’s vision has become blurred.

I recently received an email promoting the merits of Tableau 8. It included a link to more information, and when I clicked on it, this is what I read:

“Crave More Bling?” I couldn’t believe my eyes. Could I have clicked on a link to SAP Business Objects by mistake? This is not the Tableau that I know and respect. As it turns out, someone in Tableau’s Marketing Department thought twice about the term “bling” and removed it before my screams reached Seattle, but in truth, whoever called this bling was just being honest; some of the items in this list of new visualizations are nothing but fluff.

I won’t write a full review of Tableau 8 here. Despite the problems that I’m focusing on, this version of the software includes many worthwhile and well-designed features. For the time being, it will remain one of the best visual data exploration and analysis tools on the market, but I’m concerned that its current direction does not bode well for Tableau’s future. To express my concern, I’ll focus primarily on three new visualizations that are being added in Tableau 8 and why, in two cases, they should have never been added and, in one case, how its design fails in a fundamental way.

Word Clouds

Back in 2008 my friend Marti Hearst, who teaches information visualization and search technologies at U.C. Berkeley, wrote a guest article for my newsletter about word clouds. In the article, Marti described some of the fundamental flaws of word clouds, which she referred to in the article as tag clouds, because these visualizations were always based on HTML tags at the time.

I was confused about tag clouds in part because they are clearly problematic from a perceptual cognition point of view. For one thing, there is no visual flow to the layout. Graphic designers, as well as painters of landscapes, know that a good visual design guides the eye through the work, providing an intuitive starting point and visual cues that gently suggest a visual path.

By contrast, with tag clouds, the eye zigs and zags across the view, coming to rest on a large tag, flitting away again in an erratic direction until it finds another large tag, with perhaps a quick glance at a medium-sized tag along the way. Small tags are little more than annoying speed bumps along the path.

In most visualizations, physical proximity is an important visual cue to indicate meaningful relationships. But in a tag cloud, tags that are semantically similar do not necessarily occur near one another, because the tags are organized in alphabetical order. Furthermore, if the paragraph is resized, then the locations of tags re-arrange. If tag A was above B initially, after resizing, they might end up on the same line but far apart.

Tag clouds also make it difficult to see which topics appear in a set of tags. For example, in the image below, it’s hard to see which operating systems are talked about versus which ones are omitted. Intuitively, to me, it seemed that an ordinary word list would be better for getting the gist of a set of tags because it would be more readable.

Since Marti wrote this article, what was once reserved for HTML tags has become a popular way to display words from many contexts, such as books and speeches. Here’s a word cloud that Tableau is currently featuring on its website to showcase this new addition to Tableau 8:

What this tells me is that the candidates said the following words quite a bit: “people,” “going,” “governor,” “president,” “government,” “we’ve,” “make,” “more,” along with a few others that are legible. These individual words without context are not very enlightening. “More” what? “Going” where?

Filters have been added to this word cloud for selecting words spoken by Obama, Romney, or both candidates and for removing words that were spoken fewer than a specified number of times. Combining a word cloud with filters gives it an appearance of analytical usefulness, but the appearance is deceiving. A word cloud is as useful for data analysis and presentation as a cheap umbrella is for staying dry in a hurricane. Assuming that an analysis of these words in isolation from their context is useful, a horizontal bar graph would have displayed them far better. Bars would provide what the word cloud cannot, a relative representation of the values in a way that our brains can perceive. Words differ in length, so in a word cloud a long word that was spoken 100 times would appear much more salient than a short word that appeared the same number of times. You might wonder, “What if there are too many words for a horizontal bar graph?” In that case, another one of Tableau’s new visualizations—a treemap—could handle the job more effectively. More about treemaps later.
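To make the length confound concrete, here is a rough sketch in Python. The sizing rule is my own hypothetical approximation of how word clouds typically work (font size proportional to frequency), not Tableau’s rendering code, but it shows why a cloud’s ink misleads while a bar’s length does not:

```python
# Hypothetical sizing rule: font size proportional to frequency, so the ink a
# word occupies grows with frequency *and* character count.
def word_cloud_area(word: str, freq: int, scale: float = 1.0) -> float:
    font_size = scale * freq             # rendered height of the word
    width = font_size * 0.6 * len(word)  # rough width: ~0.6 em per character
    return font_size * width             # total ink the word occupies

def bar_length(word: str, freq: int, scale: float = 1.0) -> float:
    # A horizontal bar encodes frequency alone; the label sits beside the bar.
    return scale * freq

# Two words spoken equally often:
print(word_cloud_area("government", 100) / word_cloud_area("tax", 100))  # 3.33...
print(bar_length("government", 100) == bar_length("tax", 100))           # True
```

At equal frequency, “government” gets more than three times the ink of “tax” purely because it has more letters, while the two bars would be identical in length.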

Word clouds are fun, but they lack analytical merit. When did Tableau, which was originally developed for visual analysis, become a tool for creating impoverished infographics? Did they add this feature to satisfy one of their prominent UK customers, the Guardian? Whatever the reason, with the addition of word clouds, how many of Tableau’s customers will waste their time trying to analyze data using this ineffective form of display?

Packed Bubbles

Bubbles have their place in the lexicon of visual language, but only when encoding values using the sizes of circles is the best choice available because the most effective means—2-D position (e.g., data points along a line in a line graph) and length (e.g., bars in a bar graph)—cannot be used. This is the case when we display quantitative values on a map, because 2-D position is already being used to represent geographical location and bars cannot be aligned for easy comparison because they cannot share a common baseline when they’re geographically positioned. Bubbles are also useful when, in a scatter plot, which uses horizontal position to represent one variable and vertical position to represent another, we also want to make rough comparisons among the values of a third variable in the form of a bubble plot. Using bubbles by themselves, however, is never the best way to display values, but this is what you’ll soon be able to do with Tableau 8. Here’s a simple example that displays sales per country:

How are sales in South Africa? Yes, it’s there in the list; it just isn’t labeled. We can solve this omission by forcing all of the labels to appear, as follows:

Now, can you find South Africa? Given enough time, you can spot it at the top. What is the value of sales in South Africa? Nothing about the bubble reveals this, but we can hover over the bubble to access its value. We could do this for all the bubbles if we don’t mind taking forever to see what a bar graph would reveal to an approximate degree automatically.

How much greater are sales in the United States than Argentina? Come on, give it a try. Finding it difficult? Try it now using the bar graph below.

We can now see that sales in the U.S. were approximately nine times greater than sales in Argentina.
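The perceptual problem here is easy to quantify. When a bubble’s area honestly encodes its value, the radius grows only with the square root of the value, so large differences get visually flattened. A short sketch, using hypothetical sales numbers that match the ninefold difference above:

```python
import math

# If a bubble's *area* encodes its value (the honest choice; scaling the
# radius directly would exaggerate differences), the radius grows only with
# the square root of the value.
def bubble_radius(value: float, scale: float = 1.0) -> float:
    return scale * math.sqrt(value / math.pi)

us, argentina = 9.0, 1.0  # hypothetical sales in arbitrary units: a 9x difference
print(bubble_radius(us) / bubble_radius(argentina))  # 3.0
# A bar's length is the value itself, so the same comparison reads directly as 9x.
```

A ninefold difference in sales appears as only a threefold difference in radius, which is exactly why the bar graph makes the comparison obvious and the bubbles do not.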

What if there are too many values to display in a bar graph without having to scroll? Let’s look at a larger set of data—sales to 3,356 customers—first using bubbles.

Cool! Assuming they’re all there, that’s a lot of bubbles. What do these bubbles tell us? Some customers buy a lot and some buy little and a bunch buy amounts in between. Anything else? No, that’s pretty much it. And why is only the one bubble that represents Jim Hunt labeled? As we’ll see in a moment, it isn’t the largest.

I can put 3,356 horizontal bars on the screen at once, but they will appear as thin, indistinguishable lines. Also, there won’t be any room for the customer names, but those names don’t appear in the bubble display either, so that’s not a problem. Let’s see how it looks.

This isn’t ideal by any means, but it provides a more informative overview than the bubbles. For example, we can easily see that approximately 10% of the customers made purchases totaling more than $35,000, approximately 70% of them made purchases totaling less than $5,000, and approximately half of them made purchases totaling more than $2,500. We could continue citing similar observations. If we wanted to see and compare individual customers, a sorted, scrolling version of a normal bar graph like the one below would often work when comparing similar values, and we could filter the data to compare specific customers that don’t simultaneously appear on the screen.

When I first learned that packed bubbles would be included in Tableau 8, I immediately sent an email to Chris Stolte, the Chief Development Officer, Pat Hanrahan, the CTO, and Jock Mackinlay, the Director of Visual Analytics. These guys are all respected leaders in the field of information visualization and friends of mine. “Why are you doing this?” I asked with an only slightly veiled sense of disgust.

Chris provided the answers, which I’m sure all make sense to him, but don’t make sense to me. Our perspectives are different. Chris lives and breathes Tableau almost every minute of his life. To the same degree, I live and breathe data visualization, independent of particular products. I believe that a feature should be added to software only if it meets the following requirement: it is the best way to do something that really matters. I have the sense that Chris uses a different yardstick. He believes that packed bubbles add value, which he explained as follows:

There were two driving reasons for us building Bubble Charts into the product:

  • Incremental construction of views. The first was that we want to create an application where you can incrementally build up visualizations of your data and easily change from one view to another. This is the journey we have been on since the day we wrote the first line of code for Tableau. The vision is that you should be able to easily explore the space of possible visualization of your data to find just the right view to answer your question and to tell the stories in your data. As you place your data on the canvas, you should always be creating useful visual presentations of your data and getting immediate and useful feedback. We have done this well but there have always been places in the flow where users create views that feel broken or that don’t present the user’s data in a natural way. To really help people feel comfortable, I passionately feel that we need to invest in making sure every step is a reasonable view of their data to create a feeling of “safe exploration”. Supporting this flow is what drove us to invest a lot in “best practice defaults”. It is also what led to introducing Bubble Charts (and other charts) in v8.
  • Visualization of networks, relationships, and paths. We want to extend Tableau to include visual presentations of data that support answering questions about networks, relationships, and paths. This includes node-link diagrams, adjacency matrices, Sankey diagrams, etc. This introduces the need for additional encodings and visualizations—and flows.

Here’s how I responded to Chris:

  • Incremental construction of views. A visualization that represents data in a way that doesn’t support a better understanding should not be in the flow. Packed bubbles fall into this category. The user should always be directed to different views that are useful and optimally informative. Sticking an ineffective visualization between two effective visualizations, which does nothing but serve as a transition between them, adds no value. It will waste time. More concerning is the fact that people will not use packed bubbles as a mere transition but as a visualization that has value in and of itself. Even if you could demonstrate a case when packed bubbles were actually useful, which I did not find, the value that they add would still need to significantly outweigh the harm that they will cause if you support them.
  • Visualization of networks, relationships, and paths. This is not a justification for packed bubbles. What you must do architecturally to support network displays, etc. is not a valid argument for exposing people to the steps that you took to get there. In the past, you have exposed people to unnecessary requirements that got in their way merely because of architectural constraints. For example, forcing people to use two quantitative scales to combine two types of marks in a single chart (e.g., bars and lines) was unnecessary and harmful to the interface and the user experience.

When I visited Tableau’s website to find an example of packed bubbles, this is what I found:

A small set of bubbles has been combined with a map and two bar graphs. Do the bubbles provide an effective way to display the causes of fires? Hardly. Obviously, a bar graph would handle the task effectively, but these bubbles cannot. Tableau is not only enabling people to display data in this ineffective way, but by showcasing this example they are encouraging the practice. Damn it, when did Tableau decide to nudge people in useless and potentially harmful directions?

The Marketing Department is even featuring bubbles as the new face of Tableau, as you can see below:

Maybe this is what it took to impress Gartner enough to elevate Tableau to the leaders’ section of the BI Magic Quadrant. What will it take for Gartner to grant them a more prominent position than Microsoft—spinning 3-D pie charts that sing? Once a software company starts down the path of adding silly features to a product, it becomes nearly impossible to remove them. Just ask the folks responsible for the charting features of Excel.

Treemaps

Many months ago, when I first learned that Tableau would be adding treemaps, I welcomed the news, but cautioned them to implement them well and to only nudge people to use them when appropriate. Ben Shneiderman created treemaps to display large numbers of values that exceed the number that could be displayed more simply and effectively using a bar graph. Here’s an example that appears in my book Now You See It that was developed using Panopticon:

What we see here is the stock market. Each small rectangle is an individual stock. The larger rectangles group stocks into sectors of the market (financial, healthcare, etc.). Imagine that the size of each rectangle represents the price of the stock today and the color represents its change in price since yesterday (blues for gains and reds for losses). If it could be done, I’d rather view this information in a bar graph, but if I need an overview that includes all of these stocks, there are far too many values to put in front of my eyes at once using bars. A treemap is called a space-filling display, because it takes full advantage of the available space. A treemap displays parts of a whole and does so in a way that handles hierarchies. In this example, stocks are displayed as a three-level hierarchy: the entire market (level 1), individual sectors of the market (level 2), and individual stocks (level 3). Rectangles within rectangles are used to separate the groups.
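For readers curious how a treemap actually carves up its space, here is a minimal layout in Python. This is a sketch of the simple “slice-and-dice” technique, not Panopticon’s or Tableau’s algorithm (production treemaps typically use the “squarified” tiling to keep rectangles closer to square), and the three-level market hierarchy is hypothetical:

```python
# Minimal slice-and-dice treemap layout: each level of the hierarchy splits its
# rectangle proportionally to its values, alternating split direction.
def treemap(items, x, y, w, h, horizontal=True):
    """items: list of (label, value) leaves or (label, [children]) groups.
    Returns a list of (label, (x, y, w, h)) leaf rectangles whose areas
    are proportional to their values."""
    def total(node):
        _, v = node
        return sum(total(c) for c in v) if isinstance(v, list) else v

    rects, grand, offset = [], sum(total(i) for i in items), 0.0
    for label, v in items:
        share = total((label, v)) / grand
        if horizontal:
            cell = (x + offset * w, y, share * w, h)   # split left to right
        else:
            cell = (x, y + offset * h, w, share * h)   # split top to bottom
        offset += share
        if isinstance(v, list):
            # A group gets its cell subdivided, with the split direction flipped.
            rects += treemap(v, *cell, horizontal=not horizontal)
        else:
            rects.append((label, cell))
    return rects

# Hypothetical three-level hierarchy: market -> sector -> stock
market = [("Financial", [("BankA", 30), ("BankB", 10)]),
          ("Healthcare", [("PharmaC", 60)])]
layout = treemap(market, 0, 0, 100, 100)
```

Because groups are laid out as rectangles within rectangles, the nesting of sectors and stocks is preserved automatically, which is precisely the grouping behavior one expects a treemap implementation to deliver.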

A treemap belongs among Tableau’s visualizations, but I expected the implementation to be more thorough and reserved for large sets of values. If we look for examples on Tableau’s website, here’s one that we find:

This is a relatively small set of values. Could they be better displayed using a different graph? You bet. A simple bar graph would do the job. Good data visualization software should never encourage us to display small sets of data as a treemap—never! Here’s another example that’s featured on Tableau’s website:

Imagine how much better this would work using two horizontal bar graphs, side-by-side, with the items in each sorted in the same order.

Besides the fact that we’re being encouraged to use treemaps when they aren’t appropriate, Tableau’s treemap itself suffers from a fundamental flaw: it cannot effectively display values in groups. I’ll illustrate. Imagine that we want to compare sales and profits per customer, so we begin with the following treemap.

The rectangles’ sizes represent sales and their colors represent profits (green for profits and red for losses, which you can see assuming you’re not color blind). This is a decent start, except for the fact that the labeling seems entirely arbitrary. Let’s continue by breaking customers into continents to show a hierarchy (customers by region), which treemaps were designed to support. When I double-clicked the Continent field, this is what I saw:

Hold on…what just happened? When I added continents, rather than organizing the customers into their respective regions within a single treemap, the visualization automatically switched to separate treemaps of a sort displayed within bars, but little that is useful can be done with this view. The individual rectangles for customers are too small, and without labels too anonymous, to be of much use. Unlike packed bubbles, however, I can imagine occasions when these treemap bars could be enlightening, but I’m concerned that they’ll mostly get used when other forms of display would work better.

When I questioned the folks at Tableau about this, I was told that groups could be formed within a single treemap by dropping a categorical field of data on either the Details or Labels attributes in what Tableau calls the Marks card, but I did this and the treemap remained unchanged. After further correspondence, I learned that in addition to dropping the categorical field onto the Marks card, you must make sure that it appears in the list prior to other fields that are lower in the hierarchy. By placing Continent on the list before Customer Name, the treemap was rearranged as follows:

This is a step in the right direction, though something is definitely amiss with the labeling, and it still isn’t the view that I expected and wanted. As you can see, the continents are not clearly delineated. To delineate groups clearly, we must relinquish the quantitative value represented by variations in colors (in this case profit) so that discrete colors can be used to represent the categorical groups (in this case continent), resulting in this view:

When I asked Jock about this lack of standard treemap functionality, he replied:

We focused in Tableau 8 on the mapping of data fields to graphic properties rather than the formatting techniques for showing nesting found in some treemap implementations such as border width, spacing, “cushioned” rendering, etc.

I’ve seen many implementations of treemaps, but this is the first one I’ve seen that doesn’t allow values to be easily and clearly grouped.

I’m concerned that Tableau is becoming more and more like other software vendors that prioritize product release schedules over quality. This isn’t the only sign that I’ve seen. When brushing and linking (coordinated highlighting) was introduced, it couldn’t support proportional brushing, which in my opinion is essential. It still cannot. When extensive formatting capabilities were introduced, the initial interface was designed as a complicated panel with pages and pages of options, much like an old-fashioned dialog box; I can never remember where the option that I need resides. They realized that the interface sucked before it was released, but there wasn’t enough time to fix it, and it remains to this day. When they introduced a way to combine different types of marks (bars, lines, data points, etc.) in a single chart, it required a second quantitative scale and axis, even though one is rarely needed. Why? Because this was the easiest way to deliver the functionality based on the underlying architecture of the software. The convenience of the development team trumped our needs as users. This approach involves unnecessary steps and a confusing interface, but it has not been fixed.

For years I’ve been encouraging my friends at Tableau to add a box plot to its library of visualizations—a form of display that is fundamental to a data analyst’s needs—but it has never risen high enough in the list, while word clouds and packed bubbles, two useless forms of display, have now been added. Obviously, the guys at Tableau have a different perspective and set of priorities than I do. I believe that mine correlates more closely with the needs of data analysts.

The Story of Show Me

In an early version of Tableau, a great little feature called Show Me was introduced. Show Me is always available, with a list of chart types, to assist us in selecting appropriate forms of display. Based on the data we’ve selected, Show Me recommends charts that might be useful.

I have always loved Show Me, but my love for it has waned as it’s gradually grown over-complicated and lost sight of its purpose. Rather than suggesting charts that are useful, somewhere along the line Show Me began instead to suggest chart types that are possible. In the beginning, Show Me was simple and elegant, but this is no longer the case. How Show Me has changed represents in microcosm the larger problems that I’ve observed in Tableau as a whole. It is difficult to add features to a product without over-complicating the interface. Good interface design isn’t easy, but it’s necessary.

Below on the right is how Show Me will look in Tableau 8:

It includes 23 types of charts from which to choose. In order, starting at the upper left, the list includes the following:

  1. Text table
  2. Symbol map
  3. Filled map
  4. Heat map
  5. Highlight table
  6. Treemap
  7. Horizontal bar
  8. Stacked bar
  9. Side-by-side bars
  10. Lines (continuous)
  11. Lines (discrete)
  12. Dual lines
  13. Area chart (continuous)
  14. Area chart (discrete)
  15. Pie chart
  16. Scatter plot
  17. Circle view
  18. Side-by-side circles
  19. Dual combination
  20. Bullet graph
  21. Gantt
  22. Packed bubbles
  23. Histogram

Show Me has grown to include far too many choices, which undermines its ability to guide us. It has become bloated, awkward, and confusing. I believe that the following shorter list of chart types would serve its purpose more effectively:

  1. Text table
  2. Bar: When selected, either a vertical or horizontal bar graph could be automatically selected based on factors such as the length of labels and the number of values. Regular bars could be easily transformed into stacked bars by dropping a categorical field of data onto the color attribute, so there’s no reason to include stacked bars as a separate choice in Show Me.
  3. Line: Only one type of line graph needs to appear in Show Me: the type that displays a series of values continuously, without breaks in the line. On rare occasions when what Show Me calls a discrete line—one that has breaks in it—is useful, this could be made available as a simple formatting option. What Show Me calls dual lines is merely a line graph with two quantitative scales. This also could be handled as a simple formatting option, not just for line graphs, but for bar graphs and area charts as well.
  4. Area chart: Similar to a line graph above, only one kind of area chart is needed in Show Me: the one that displays values continuously, without breaks.
  5. Dot plot: This is the conventional name for what Show Me currently calls a circle view and side-by-side circles.
  6. Scatter plot: Dropping a quantitative field of data on the size attribute turns a scatter plot into a bubble plot.
  7. Histogram: In appearance, this is a bar graph, but it possesses special functionality that is needed to display frequency distributions, and as such it deserves to be on the list.
  8. Box plot: A visual analysis product without a box plot is missing something essential.
  9. Geographical map: Symbols would serve as the default mark, but color fills would be readily available as an option.
  10. Heat map: This is a heat map arranged as a matrix of columns and rows. What is currently named a heat map in Show Me is misnamed, for it uses symbols that primarily vary by size, not color. That functionality is already readily available whenever symbols such as circles or squares are being used, which is the default when a dot plot is selected, by dropping a quantitative variable on the size attribute.
  11. Treemap
  12. Gantt chart

This list reduces our choices from 23 to 12. Not bad. Each item on the list actually qualifies as a different type of chart; none are mere variations of another chart on the list with different formatting. Navigating these choices would be so much simpler.
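The principle behind this shorter list, recommend what is useful rather than what is merely possible, can be sketched as a few explicit rules. The field counts and rankings below are my own hypothetical illustration of the idea, not Tableau’s actual Show Me logic:

```python
# A sketch of "recommend useful, not merely possible": rank a handful of
# effective chart types from the shape of the user's field selection.
# (Hypothetical rules for illustration only.)
def show_me(n_categorical: int, n_quantitative: int, n_dates: int = 0):
    if n_dates >= 1 and n_quantitative >= 1:
        return ["line", "bar"]                    # values over time
    if n_categorical >= 1 and n_quantitative == 1:
        return ["bar", "dot plot", "text table"]  # values per category
    if n_quantitative == 2:
        return ["scatter plot"]                   # relationship between measures
    if n_quantitative == 1 and n_categorical == 0:
        return ["histogram", "box plot"]          # a single distribution
    return ["text table"]                         # fall back to the raw numbers

print(show_me(n_categorical=1, n_quantitative=1))  # ['bar', 'dot plot', 'text table']
```

The point of the sketch is what it leaves out: a selection of one categorical and one quantitative field never surfaces a pie chart, a packed bubble chart, or a word cloud, because none of them is ever the best answer to that question.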

Notice that I left out the pie chart. When the guys at Tableau first added the pie chart to the product, they explained to me that they were doing so only because pie charts are sometimes useful on geographical maps to divide bubbles into parts of a whole. For this sole purpose, there is no reason to include the pie chart in Show Me. Bubbles on maps can easily be subdivided like pies into parts by dropping a categorical field of data onto the color attribute. By including the pie chart in Show Me, people are encouraged to use it when other forms of display, usually a bar graph, would work more effectively.

Also notice that there is no chart type in my list that serves as a combination of marks (e.g., bars and lines). It shouldn’t be necessary to list separate chart types for every possible combination of marks. A good interface would allow us to select a series of values in a graph and change its mark to any type that’s appropriate.

And finally, notice that I’ve removed from the list my own invention, the bullet graph. I did this because a bullet graph can be treated as a variation of a bar graph or a dot plot, which could be easily invoked as a formatting option.

In Show Me, too many charting options are suggested as viable views, often when they’re ineffective. Here’s Show Me again:

The following chart types are highlighted in Show Me because they supposedly provide useful ways to view the two fields of data that I’ve selected: Market and Sales:

  • Text table
  • Heat map
  • Highlight table
  • Treemap
  • Horizontal bar
  • Stacked bar
  • Pie chart
  • Circle views
  • Packed bubbles

Let’s see which views are actually useful based on the data that I’ve selected. We’ve already seen the text table, which is definitely useful.

Here’s what Show Me is calling a heat map:

This isn’t actually a heat map, as I mentioned before, because colors are not being used to encode the values; sizes are. Rarely would we choose to display these four sales values using the sizes of squares.

Here’s a highlight table:

This is slightly more useful as a way to add a visual representation (color) to a text table, but other forms of display will usually work better.

Here’s a treemap:

It would never make sense to display a small set of values such as this as a treemap.

Here’s a horizontal bar graph:

Unless we need precise values, as provided in the text table, this is clearly the most effective way to display these values. Fortunately, this is the chart that’s featured as the best choice by Show Me.

Here’s a stacked bar graph:

Try to compare the stacked market values. To compare them easily and accurately, we would definitely want a regular bar graph. The only time this stacked bar would suffice is when we primarily want to know total sales with only a rough sense of how it’s divided into markets.

Here’s a pie chart:

Try comparing the slices or determining individual values. This is never going to serve as a useful view of these values.

Here’s a circle view:

Only the bar graph does a better job than this circle view (a.k.a. strip plot). Using the positions of the circles to encode their values works well for our brains.

What’s left? Oh yeah, the new packed bubbles chart. Here it is:

Thank God Tableau decided to add this chart to its Show Me recommendations. Just imagine the enlightening views this will provide! (Yes, I’m being sarcastic.)

Why Is This Happening?

What’s causing Tableau to compromise the integrity of its product? Given the circumstances, here’s a list of the possibilities that I would usually consider:

  • Has their mission to go public led them to add ineffective but eye-popping sparkle to the product to entice investors?
  • Has their marketing plan led them to curry favor with analysts who don’t understand analytics in general or data visualization in particular, such as Gartner and Forrester, by adding features that might appeal to them?
  • Have they been tempted by the possibility of big deals with prominent companies to add senseless features to win those deals?
  • Is the development organization, which has now become huge, inclined to add features merely because they are easy to implement based on the underlying architecture or to design them in hobbled ways because the best design cannot be implemented without changing the architecture?
  • Have they lost touch with their roots in academic research and no longer remember how the human brain works?
  • Have they gradually become like most other software organizations, limited to an engineering perspective and driven by sales, as opposed to one that is balanced by a commitment to design and inspired by opportunities to provide the best possible product?
  • Have they become slaves to product release schedules even when that means that new features will be implemented poorly?
  • Have they become so immersed in the product and out of touch with those who rely on it that they’ve forgotten what really matters?

These are the questions that I would ask about any vendor in these circumstances, but they aren’t questions that I want to consider when trying to understand Tableau. Although no organization is immune to losses in effectiveness as it grows, I’ve always believed that if any software company could stand firm in its vision and commitments, Tableau was that company. The truth is, I really don’t know what’s driving their decisions to compromise the integrity of the product and its usefulness to the world. Even for the folks at Tableau, I suspect that most of the reasons behind their choices are unconscious. I suspect that most of the poor design choices stem from the fact that when people become immersed in a product, their perspective becomes myopic and skewed and they can no longer see the product through the eyes of its users or the guiding lens of best practices.

What criteria should be used to guide product development? A product should never be designed merely to do what’s easy…to do what’s fun for the developer…to do what can be done in the allotted time…to do what will produce the highest revenues…or even to do what people are asking for most. There’s nothing wrong with design that is easy, fun, profitable, or any of the other benefits in this list, but these should not serve as the primary drivers of product design and development.

Here’s a simple guideline for good design: “Design the product to do what’s most useful in the best way possible.”

What’s “most useful”? The fact that customers are asking for something in particular does not in and of itself make it useful. People sometimes want things that are far from useful, or they want things to be designed in ways that don’t actually work. Anything that helps us solve real-world problems is useful. Problems are not created equal, however. Solving some problems is more important than solving others. The value of a solution should be tied to its potential for making the world a better place. It’s not that word clouds and packed bubbles are far down the list of problem-solving features for a visual data analysis product; they aren’t on the list at all.

Great products are never those that try to please everyone and do everything. Rather, they begin with a clear, focused, and coherent vision, and they persistently do what’s necessary to express that vision as effectively as possible without detour or compromise. Based on what I’m seeing, I fear that Tableau is trying to make its product do everything one can imagine doing with data. This can’t be done. It shouldn’t be done. Making the attempt will result in a product that is complicated beyond usability. It is reasonable for Tableau to make it easy for analysts to share the results of their work with others in polished ways by providing basic data presentation functionality, but it doesn’t make sense for the existing product to become a comprehensive platform for the development of infographics. Such a tool would require functionality similar in complexity to Adobe Illustrator, very little of which would ever be needed by a data analyst. If Tableau wants to compete with products such as D3, which provides a flexible programming environment for the development of interactive, web-based infographics, it would make sense to develop a separate tool that integrates nicely with the existing product. Sometimes good design involves setting practical limits and sticking to them.

I believe that Tableau is increasingly veering from the path. It pains me to say so. It’s all right for a company’s vision to evolve, and I’m sure that Tableau’s has in meaningful ways, but what I’ve described doesn’t qualify as progress. If you’re one of Tableau’s customers and you share my opinion, I urge you to make your concerns known. Please join me in reminding the good folks at Tableau that they are better than some of the design choices that they’re currently making.

Take care,

O’Reilly Media Has Lost Its Soul

March 11th, 2013

Several years ago I was courted by O’Reilly Media. After the success of my first book, Show Me the Numbers, the folks at O’Reilly were interested in publishing my second. I responded cautiously at first, because my prior self-publishing experience was extremely positive. Analytics Press, the publisher of Show Me the Numbers and Now You See It, is owned by my friend Jonathan Koomey; by working with Jon I was able to manage the entire process myself from beginning to end without interference—essentially self-publishing. Nevertheless, I wanted to see if a large publisher such as O’Reilly could deliver on its promise to add value to the process. After two months of negotiating a contract to guarantee my right of approval over design decisions (book layout, paper quality, binding quality, printing quality, etc.), I agreed to work with O’Reilly to publish Information Dashboard Design. This is the story of that collaboration.

I worked with two good editors at O’Reilly who made the book production process navigable whenever rough seas arose, which occurred more than once. The acquisitions editor, Steve Weiss, and my primary editor, Colleen Wheeler, both possessed integrity and practical minds. Working with O’Reilly took much longer than my self-publishing experience had—bureaucracy tended to get in the way—but the book was finally shipped off to the printer and we all breathed a sigh of relief. While waiting for the finished product, I eagerly imagined how O’Reilly’s marketing and distribution prowess would extend the reach of my work. I was hoping that it would do so significantly, because I was paying for it dearly, allowing O’Reilly to retain all but a small fraction of net sales.

When the book was finally published, I soon discovered the sad reality of working with a large publisher: unless you’re a celebrity, publishers do nothing that you can’t do on your own just as well or better for a fraction of the cost. What did O’Reilly do to publicize my book? They listed it on their website. That’s it. Information Dashboard Design has, with rare exceptions, been the bestselling book about data visualization since it was originally published back in 2006, but O’Reilly doesn’t even include it on its list of data visualization books. What did O’Reilly do to put my book within reach of readers? They worked with the same distributors and retailers, such as Ingram and Amazon, that I could have worked with directly. So, if O’Reilly didn’t promote the book and didn’t get it into sales channels that I couldn’t reach myself, what did they do? What they did was screw up and break our contract time after time.

Approximately two years after the book was published, O’Reilly sub-contracted with an unapproved printer and allowed them to produce my book using thin, cheap paper in direct violation of our contract. Not only did this cause images to bleed through the pages, which is unacceptable for any book, let alone one about design, but it caused the book to be so much thinner when bound that the artwork on the spine—an example of my bullet graph—was cut in half. I discovered this while distributing books to my students at the end of a workshop. I was mortified, and then I became angry. After weeks of heated negotiations, which appeared doomed to failure, there seemed to be no alternative but to launch a legal battle to force O’Reilly to honor the contract. In a final desperate effort to resolve matters, I wrote to the founder, Tim O’Reilly, who no longer served as the publisher but remained at the helm of the company. Thankfully, Tim possessed what the publisher who replaced him, Laurie Petrycki, seemed to lack: integrity and good business sense. Within a day or two, much to my relief, the matter was resolved. The remaining defective copies of the book were destroyed, a new printing was set in motion, and an amendment to the contract was written to prevent this from happening again, or in the event that it did, to make sure that O’Reilly surrendered its rights to the book, posthaste.

One provision of the amended contract required that O’Reilly allow me to review and approve final printer’s proofs before each new printing to make sure that no unauthorized and harmful changes were introduced. Since that time, they have remembered to honor this agreement only once. On one occasion I discovered that my book was being sold in a Kindle edition without my approval, also in direct violation of the contract. This Kindle edition automatically reformatted the book, ignoring the careful page layout that I had worked long and hard to produce, and it also reduced the colors, which were critical, to monotones of gray. On another occasion they forgot to ship a large order to a conference where I was teaching, which left over 100 students without their books until we could collect each student’s mailing address and ship the books individually.

I doubt that any breaches of contract were willful. The root of the problem, however, was O’Reilly’s lack of concern for me and my rights as an author. The authors who write the books that keep O’Reilly in business are mere cogs in the wheels of O’Reilly’s churn-‘em-out business model. It is in part because so many publishers treat their authors poorly that the traditional publishing model is dying. Alternatives to working with a traditional publisher would not be so attractive if authors were respected in the manner that they deserve. Rather than adapting to a business model that will work in the modern world, O’Reilly’s publishing wing has dug in—an act of insanity.

Two years ago when I started planning a second edition of Information Dashboard Design, because my editor Colleen Wheeler was still at O’Reilly (she has since moved on), I agreed to let her pitch the benefits of publishing the second edition through them, despite their many errors. Colleen did her best, but our discussions made it clear to me, and I believe to her as well, that I would be a fool to work with O’Reilly again. Several weeks ago I contacted Steve Weiss of O’Reilly to remind him that I would be publishing a second edition of the book but wouldn’t be working with O’Reilly. I wanted to make sure that the path was clear of any effort by O’Reilly to oppose my decision. Eventually, I was routed to the publisher, Laurie Petrycki, who informed me that I could not publish a second edition of the book except through O’Reilly. She insisted that our contract locked me forever into working with O’Reilly when writing anything that was derived from Information Dashboard Design. I interpreted the meaning of a “derived work” differently, but instead of debating semantics, I decided to cut through the murkiness by pointing out that O’Reilly had broken our agreement on several occasions and was therefore contractually obligated to surrender its rights to my book. At that point, Petrycki turned matters over to her legal team, which put into motion a series of maneuvers that were designed to waste time and discourage opposition.

I generously offered to resolve matters simply, peacefully, and to both parties’ advantage. I agreed to sign a release that O’Reilly provided, absolving them of fault, if they would continue selling the first edition of Information Dashboard Design until the second edition was published in July of this year. This would serve their interests, allowing them to earn additional revenue and maintain good will with me, but more importantly, it would serve the interests of potential readers. How did O’Reilly respond? They stopped production of the book immediately and have refused to continue selling it. This was an act of pure spite: the reaction of a petulant child. No wonder authors are increasingly looking for alternatives to established publishers like O’Reilly.

O’Reilly Media—the publishing wing at least—appears to have lost its soul. I have no doubt that Tim O’Reilly founded the company with a great vision and high respect for authors. I don’t know when things changed, but it’s obvious that they have. It’s hard to value anything that O’Reilly Media is doing today, including its conferences, when its publishing wing is this dysfunctional. Nevertheless, I would never tell people who are looking for useful content to avoid O’Reilly’s books; that would deny readers useful content and deny authors the revenues and audience they deserve. What I will do without hesitation, however, is encourage authors who might be considering O’Reilly Media as a publisher to look elsewhere.

Perhaps other publishers are equally soulless. I suspect that many are. If you’re looking for a publisher and can’t find one with integrity that offers real value and is willing to commit to it contractually, I would encourage you to do what I’ve done: skip the middleman—a book-mill that does the least possible for 85% to 90% of the revenues—and self-publish. This is increasingly what authors are doing, and for good reason. If there remains even a glimmer of decency within O’Reilly’s management (Tim, are you still there?), I hope they wake up and show some care and intelligence before the disease that is rotting their core fouls the memory of O’Reilly Media forevermore.

Take care,

Naked Statistics

February 15th, 2013

I spend most of my time teaching people the basic concepts and skills of data visualization. You can’t learn data visualization by memorizing a set of rules. You must understand why things work the way they do. Stated differently, you must understand data visualization on a conceptual level. To engage in data sensemaking, which is central to data visualization, you must be able to think statistically. This doesn’t mean that you must learn advanced mathematics, nor can you do this work merely by learning how to use software to calculate correlation coefficients and p-values. Thinking statistically involves an understanding of the basic concepts of statistics. Most books about statistics, even introductory textbooks, do a horrid job of teaching these concepts in a way that makes sense. People who are immersed in statistics often have a hard time remembering how to talk to people who are not. Because statistical thinking is so central to the work that I do, I am constantly on the lookout for books that teach the essential concepts of statistics well. I am happy to announce that I’ve just found the book that does this better than any other that I’ve seen: Naked Statistics: Stripping the Dread from the Data, by Charles Wheelan (W. W. Norton & Company, 2013).

This is a marvelous book. If you’re a fan of the classic How to Lie with Statistics by Darrell Huff (1954), as I am, this is its rightful heir. Wheelan teaches public policy and economics at Dartmouth College and is best known for a similar book written several years ago titled Naked Economics. In Naked Statistics, he selects the most important and relevant statistical concepts that everyone should understand, especially those who work with data, and explains them in clear, entertaining, and practical terms. He wrote this book specifically to help people think statistically. He shows how statistics can be used to improve our understanding of the world. He demonstrates that statistical concepts are easy to understand when they’re explained well.

If you grow faint at the sight of mathematical notation, as I do, this is the book for you. Wheelan is in a good position to empathize with us. He hated calculus in high school. Why?

In hindsight, I now recognize that it wasn’t math that bothered me in calculus class; it was that no one ever saw fit to explain the point of it. If you’re not fascinated by the elegance of formulas alone—which I am most emphatically not—then it is just a lot of tedious and mechanistic formulas, at least the way it was taught to me.

The paradox of statistics is that they are everywhere—from batting averages to presidential polls—but the discipline itself has a reputation for being uninteresting and inaccessible. Many statistics books and classes are overly laden with math and jargon. Believe me, the technical details are crucial (and interesting)—but it’s just Greek if you don’t understand the intuition. And you may not even care about the intuition if you’re not convinced that there is any reason to learn it.

Wheelan explains most of the statistical formulas in chapter appendices so you can easily skip them if you wish. Statistical terms are introduced throughout, but only when necessary and always in ways that make perfect sense. The real-world examples that Wheelan uses to explain statistics are all interesting and familiar.

Because statistics and the scientific method are intimately intertwined, Wheelan talks a great deal about social science and how researchers use statistics to test hypotheses and validate findings. He even explains the “null hypothesis” in a way that will make you sit up and pay attention, which is quite a feat. If you think this isn’t relevant to you because you’re a business data analyst, think again. Even in business we should be formulating and testing hypotheses rather than making judgments based on intuition alone.

What is the point of learning statistics?

  • To summarize huge quantities of data.
  • To make better decisions.
  • To answer important social questions.
  • To recognize patterns that can refine how we do everything from selling diapers to catching criminals.
  • To catch cheaters and prosecute criminals.
  • To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations.
  • And to spot the scoundrels who use these very same powerful tools for nefarious ends.

If you read this book, you’ll come to understand statistical concepts and methods such as regression analysis and probability as never before. I have many statistics books in my library. I’ve read them all. Not until I read Naked Statistics, however, did some of the concepts that I’ve contemplated dozens of times come to life for me. Even if you’re a statistician, you should read this book. It could help you explain what you do to others in a way that they can finally understand and appreciate.

In the concluding chapter, Wheelan says:

Statistics is more important than ever before because we have more meaningful opportunities to make use of data. Yet the formulas will not tell us which uses of data are appropriate and which are not. Math cannot supplant judgment.

This book was written to improve human judgment through better statistical thinking. In the final sentence, Wheelan says something that is almost word-for-word what I often say at the end of lectures and classes: “Go forth and use data wisely and well!” I heartily recommend this book.

Take care,

Big Data Disaster

February 11th, 2013

Access to lots of fast-moving data without the thoughtful and ethical involvement of human beings spells disaster. A poignant example of this is the use of so-called Big Data by the three major credit agencies in America. Experian, Equifax, and TransUnion have done great harm to the lives of millions of Americans through their irresponsible use of data. According to a new FTC report published today, mistakes exist on the credit reports of 20% of Americans. Credit reports are used to grant or deny access to loans and other services. Not only are these agencies getting the facts wrong far too often, but they are making it virtually impossible for people to get these mistakes corrected.

If you want to understand the horror faced by millions of Americans, watch the report that was aired by 60 Minutes last night. It will chill you to the bone and make you angry. Credit agencies are hiding behind walls of Big Data, using it as an impenetrable barrier to the fair treatment that people deserve. The algorithms that these agencies use for credit scoring are shrouded in mystery, hidden from public scrutiny.

Data, no matter what its size, speed, or source, must pass through the hands and minds of thoughtful and ethical people. People who understand data and are committed to using it ethically must be part of the process; otherwise, Big Data can become an oppressor much like Big Brother, holding senseless sway over our lives. You can’t fight data! You can only appeal to human beings. When human beings are removed from the process, when human brains and empathy are circumvented, to whom will you turn for reason and justice?

Take care,

A Pie in the Face for Information Visualization Research

February 5th, 2013

Even though several research studies over the years have sought to compare the relative effectiveness of pie charts vs. bar graphs, only for one task have bar graphs failed to outperform pie charts. The one potential advantage of pie charts was identified in a study by Spence and Lewandowsky titled “Displaying Proportions and Percentages” (Applied Cognitive Psychology, Vol. 5, 1991). This study has probably been cited more often than any other to support the pie chart’s worth. I suspect that most of these citations, however, were made by researchers who never actually read the original paper, so they tend to give pie charts more than their due. In all fields of research, not just information visualization, studies are routinely cited that weren’t actually read, resulting in misrepresentations of the original work’s findings. According to a study by Mikhail Simkin and Vwani Roychowdhury, only about 20 percent of scientists who cite an article have actually read the paper (“Read Before You Cite!”, Complex Systems, 14, 2003). In most cases, researchers have only read comments in secondary sources about the studies that they cite—sources that were often written by others who also relied on secondary sources. This is one of the ways that errors proliferate and sometimes become common knowledge, even in scientific circles.

Few researchers bother to mention that the study by Spence and Lewandowsky robbed bar graphs of their quantitative scales. Perhaps, because pie charts lack quantitative scales, Spence and Lewandowsky felt that scales should be removed from the bar graphs to even the playing field. In fact, a pie chart has an implied scale that goes from 0% to 100% in a circle around the perimeter of the pie, but it is never shown because it isn’t helpful. By removing the scales from bar graphs, however, their study failed to measure the effectiveness of bar graphs as actually used.

Nevertheless, even when hamstrung in this way, bar graphs performed better than pie charts for every task except comparisons of summed parts. Imagine a pie chart with four slices, labeled A through D, and a bar graph with four bars, one for each of the same values. Now imagine the following task: either compare the sum of slices A and B to the sum of slices C and D to determine which is greater or perform the same comparison using the corresponding bars in the bar graph. The study found that test subjects could estimate the sums of two slices and compare them to the sums of another two slices more effectively than they could estimate and compare the combined lengths of bars. This isn’t surprising, but even this one advantage of pie charts might not have been found had the bar graphs possessed their scales.

Comparing the lengths of two bars that share a common baseline is handled by the visual cortex of the brain in a preattentive manner that is fast and as precise a comparison as visual perception supports. Comparing the sizes or angles of pie slices is also handled by the visual cortex, but not as precisely and usually not as quickly either, because we typically strive for a level of precision that the pie chart doesn’t support, which slows us down. Decoding the value represented by a slice of pie requires us to estimate the percentage of the circle that belongs to the slice, which is difficult. Decoding the value represented by a bar involves a straightforward lookup: we compare the end of the bar to the nearest value along the scale. When a bar graph is properly designed, we can perform this task quickly, easily, and precisely.

The fundamental superiority of bar graphs over pie charts is rooted in a fact of visual perception: we can compare the 2-D positions of objects (such as the ends of bars) or their lengths (especially when they share a common baseline), more easily and precisely than we can compare the sizes or angles of pie slices. When people like Edward Tufte, William Cleveland, Naomi Robbins, and I express disdain for pie charts, it is for this reason and this reason alone. We love circles as much as anyone, but we don’t worship them and we don’t expect from them what they can’t provide.
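To make the decoding difference concrete, here is a minimal sketch in Python. The values are invented for illustration (they are not taken from any chart above), and the point is the asymmetry in what the reader must do: a pie slice must be mentally converted from an angle back into a value, while a bar’s endpoint is a direct lookup against the scale.

```python
# Hypothetical values (invented for illustration, not taken from any
# chart above). The same four numbers, decoded two ways.
values = {"A": 40, "B": 30, "C": 20, "D": 10}
total = sum(values.values())

# Pie decoding: the reader must estimate what fraction of 360 degrees
# each wedge spans, then convert that angle back into a value.
angles = {key: 360 * v / total for key, v in values.items()}

# Bar decoding: the reader simply looks up each bar's endpoint against
# the quantitative scale -- no proportion estimation required.
bar_endpoints = dict(values)

print(angles)         # slice A spans 144.0 degrees of the circle
print(bar_endpoints)  # bar A ends at 40 on the scale
```

The conversion the code performs explicitly is exactly the estimation that pie charts push onto the reader’s visual system, which is why precision suffers.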

Despite the perceptual problems associated with pie charts, which are well established, every once in a while some new study or book comes along and suggests that the experts have been wrong all along. Even when these studies are utterly absurd and completely unfounded, lovers of pie charts, especially software vendors that promote silly, ineffective data visualization practices, celebrate them: “Mission accomplished! We have proven the worth of our beloved pie.” To quote the conclusion of a recent journal article: “The pie is a communication chart par excellence…pies are from Venus, bars are from Mars” (Charles Wesley Ervin, “Pie charts in financial communications,” Information Design Journal, 19:3, 2011). People love circles, there’s no doubt about it, but they are rarely useful for displaying quantitative information.

A few days ago I discovered the latest paper that gives an undeserved thumbs up for the pie chart: “Using fNIRS Brain Sensing to Evaluate Information Visualization Interfaces,” written by students and faculty at Tufts University. I discovered this paper when reading a blog post that cited my negative opinion of pie charts and then pointed to this paper as potential evidence of my error. This paper has been accepted for presentation at CHI 2013 in Paris later this year and is already available in published form. This study is misdesigned, misinterpreted, and misrepresented. I wish I could say that this is an anomaly, but sadly, I cannot. If you are not intimately acquainted with academic research, you might assume that most of it is well done, and that getting published is a sure sign of credibility. This is far from true. Bad research gets published in every field, but in the field of information visualization, it sometimes even wins awards.

The following bold statement appears in the paper’s abstract (emphasis mine):

In this paper, we use the classic comparison of bar graphs and pie charts to test the viability of fNIRS [functional near-infrared spectroscopy] for measuring the impact of a visual design on the brain. Our results demonstrate that we can indeed measure this impact, and furthermore measurements indicate that there are not universal differences in bar graphs and pie charts.

In fact, this study demonstrates nothing of the kind. It does not meaningfully measure the impact of visual design on the brain, and it definitely does not indicate anything universal or even otherwise about differences between bar graphs and pie charts. The primary problem with this study is the fact that it did not simulate any of the actual tasks that people perform when using bar graphs and pie charts. This is only apparent, however, if you read beyond the abstract.

I’ll describe the tasks that test subjects performed. See if you can identify the problem. Subjects performed multiple series of tasks. Each time they were shown 11 slides in sequence, lasting 3.7 seconds per slide. Each slide in a particular series displayed either a single bar graph or pie chart. Each chart displayed multiple bars or slices. Among them, one bar or slice was marked with a large black dot and one with a small red dot. The subject was required to compare the length of the bar or size of the slice marked with the red dot to the length of the bar or slice of the pie marked with the black dot on the previous slide. Items marked with black dots always represented values that were greater than those marked with red dots. The subject’s task for each slide was to estimate how much larger the item marked with the black dot on the previous slide was compared to the item marked with the red dot on the current slide, to the nearest 10%. In other words, they would indicate that it was approximately 10% greater, 20% greater, 30% greater, etc., which they did by pressing an appropriate key on a keyboard. After making this choice, they then had to quickly look at the item marked with the black dot in the current slide before the 3.7 seconds were up so they could remember it when the next slide appeared and they were required to compare it to the item marked with the red dot there. The figure below shows an example of three slides in an eleven-slide series, in this case consisting of pie charts:

Think about this task. Is this what we do when we compare values in bar graphs or pie charts? It isn’t. What’s different from our actual use of these charts? The things that subjects compared were never simultaneously visible.

When we use a chart to compare either slices or bars, we almost always compare values within a single chart. The values are right there near one another, which allows the visual cortex of the brain to handle the comparison. On less frequent occasions when we compare values that reside in separate charts, we always put those charts in front of our eyes at the same time, such as in a trellis display. This is a fundamental practice of data visualization. Why? Because, if the things that we need to compare are not simultaneously visible, we must rely on working memory, which is extremely limited. Work is transferred from the visual cortex to working memory—from our strength to our weakness—which is just plain dumb.

The designers of this study created a task that was handled by working memory because they wanted to demonstrate the usefulness of fNIRS technology for data visualization research and fNIRS can only measure neural activity in the prefrontal cortex, not the visual cortex. They created an unrealistic, artificial task. In doing so, they created something to measure in the prefrontal cortex, but it had nothing to do with a realistic use of charts.

This study was not actually designed to compare the effectiveness of bar graphs vs. pie charts, yet it makes the claim that “there are not universal differences in bar graphs and pie charts.” Instead, this study was designed to demonstrate a use for fNIRS technology in the field of data visualization research. It failed to achieve the latter and should have made no claims regarding the former.

Only one potentially meaningful finding should have been claimed by this study: a positive correlation between test subjects’ subjective sense of difficulty associated with the use of bar graphs vs. pie charts and hemoglobin oxygenation levels in the prefrontal cortex. Subjects who felt that bar graphs were more difficult exhibited higher levels of oxygenation when using bar graphs. Those who felt that pie charts were more difficult exhibited higher levels of oxygenation when using pie charts. This tells us nothing about the relative effectiveness of bar graphs vs. pie charts. Subjects’ preferences for one type of chart over the other might have been a predisposition, but predispositions were not tested. Whether or not a predisposition existed, we don’t know if test subjects’ sense of difficulty and higher levels of oxygenation have any relationship to the effectiveness of the charts. What the experiment found is that working memory performed equally well (or equally poorly) regardless of the chart that was used.

This and other studies done at Tufts University interpret higher levels of hemoglobin oxygenation in the prefrontal cortex as "cognitive load," by which they imply "cognitive overload." A negative connotation is assumed. Measuring hemoglobin oxygenation levels in the prefrontal cortex may be a valid measure of brain activity, but we have no reason to believe that this activity is necessarily negative. Perhaps high levels of activity correlate to greater insights rather than counterproductive overload. In truth, oxygenation levels probably indicate neural activity of many types: some positive and some negative. To date, we don't know how to discriminate between them.

The use of neuroimaging such as fNIRS in HCI studies is still in its infancy. fNIRS may be useful, but we must be careful to read no more into these measures than our current understanding can actually support. Using fNIRS to interpret neural activity is a bit like using temperature readings inside a building to determine the specific activities that are going on within, even though we are separated from those activities by a solid, opaque wall.

The authors of this study indicated the need for caution, but notice how they failed to heed this concern (emphasis mine):

During the course of this paper, we have been intentionally ambiguous about assigning a specific cognitive state to our fNIRS readings. The brain is extremely complex and it is dangerous to make unsubstantiated claims about functionality. However, for fNIRS to be a useful tool in the evaluation of visual design, there also needs to be an understanding of what cognitive processes fNIRS signals may represent. In our experiment, we have reason to believe that the signals we recorded correlate with levels of mental demand.

Notice the reasoning here. We can’t assign specific cognitive states to fNIRS readings, but these readings are useless to us unless we can assign specific states to them, so we’re going to do so. After the disclaimer, they went on to declare:

Our findings suggest that fNIRS can be used to monitor differences in brain activity that derive exclusively from visual design. We find that levels of deoxygenated hemoglobin in the prefrontal cortex (PFC) differ during interaction with bar graphs and pie charts. However, there are not categorical differences between the two graphs. Instead, changes in deoxygenated hemoglobin correlated with the type of display that participants believed was more difficult.

“Differences in brain activity that derive exclusively from visual design”? What they actually found were differences related to subjective feelings of difficulty and oxygenation levels associated with those feelings, which they assumed were “derived exclusively from visual design.” It is entirely possible, however, that those subjective feelings were derived from dispositions regarding bar graphs vs. pie charts that did not grow out of differences in visual design.

Because fNIRS can only measure activity in the prefrontal cortex, not the visual cortex, the authors acknowledge that it is only potentially useful for measuring more complex tasks that involve the prefrontal cortex.

We find that fNIRS can provide insight on the impact of visual design during interaction with difficult, analytical tasks, but is less suited for simple, perceptual comparisons.

Even this statement contains an error. Remembering the size of a slice or bar so it can be compared to another slice or bar later is indeed a difficult task because of working memory’s limitations, but is it an analytical task? Does it require reasoning? It is entirely a task of memory. The prefrontal cortex handles many tasks, but we cannot currently use fNIRS to specifically measure analytical tasks because it cannot discriminate among different neural activities.

Research studies like this should prompt us to ask several questions, including:

  1. How can students earn PhDs while focusing on information visualization without first learning the fundamental skills required of the discipline (best practices of graph design, the basic tenets of the scientific method, an understanding of visual perception and cognition, and critical thinking)?
  2. Do the professors who participate in these studies and the reviewers who approve them also lack these skills?
  3. Do the professors who advise these students review these studies carefully?
  4. Why aren’t researchers in information visualization asked, based on feedback from reviewers, to go back and correct their work before it is approved for publication?

I am not writing about this particular study because it is extraordinarily bad, but merely because its claims address topics of interest to me. This paper is typically bad. The problems that we see in it arise from deeper problems that are both endemic and systemic. Papers get published and awards are given when studies exhibit novelty or make controversial claims. A study that tests a hypothesis that turns out to be false is rarely published, even though it is still informative. A study that tries to replicate a past study to confirm or refute its findings is considered boring and is thus avoided. Students in doctoral programs are encouraged to find something sexy. Sometimes this takes the form of studies that supposedly challenge long-established best practices. When you’re a young up-and-comer, it’s exhilarating to take a leader in the field down a peg or two.

What academics sometimes forget, however, is that their work affects the world. People trust their findings and make decisions based on them. When studies make erroneous claims, they do harm. Research should be reviewed more rigorously for the merits of its content. We need fact checkers: not after the fact, such as this review that I’m writing, but prior to publication. Students should receive corrective guidance during the course of their research rather than being subjected to reviews like this one after publication. The bar must be raised, but that won’t happen until academics themselves become willing to speak up.

Take care,