A few days ago I noticed a blog post by Boris Evelson of Forrester Research titled “How to Differentiate Advanced Data Visualization Solutions.” Forrester is one of the leading IT research and advice companies. Along with its larger rival, Gartner, it serves as a trusted adviser to thousands of organizations, helping them make decisions about all aspects of information technology. Although it’s convenient for Chief Information Officers (CIOs) to subscribe to a single service for all the advice they need, is this approach reliable? It depends on whether we actually get advice from someone who has the expertise we’re missing. Far too often when relying on these services, however, we get advice from people whose range of topics is too broad to manage knowledgeably. We sometimes find ourselves being advised by someone who understands less about the topic than we do. If you’re looking for advice about data visualization products, based on what I read in Forrester’s blog, I suggest that you look elsewhere.
Evelson provided a list of features that he believes we should look for when shopping for an advanced data visualization solution. Unfortunately, his list looks as if it was constructed by visiting the websites of several vendors that claim to offer data visualization solutions and then collating the features that they offer. I expect more from an advisory service that people pay good money for. We can’t trust most vendors that sell data visualization software to tell us what we should expect from a good product. It is in their interest to promote the features that they offer, and only those features, whether they’re worthwhile or not. In fact, most vendors that offer so-called data visualization solutions know little about data visualization.
Another problem with Evelson’s advice is that it isn’t clear what he means by “advanced data visualization solutions.” What distinguishes advanced solutions from the others? Of the few features on his list that actually characterize an effective data visualization solution (most of his list misses the mark, as I’ll show in a moment), none go beyond the basic functionality that should exist in every data visualization solution, not just those that are “advanced.”
Evelson has offered the kind of analysis and advice that we get from people who dabble in data visualization, rather than those who have taken the time to develop, not just shallow talking points, but an understanding of what’s really needed and what really works.
Let’s take a look at each feature on Evelson’s list in the order presented and evaluate its worth.
Feature #1: “If it’s a thin client does it have Web2.0 RIA (Rich Internet Application) functionality (Flash, Flex, Silverlight, etc)?”
Response: This is a feature that only an IT guy with myopia could appreciate, not someone who actually analyzes and presents data. When evaluating software, we care about functionality and usability, not about the specific technology that delivers it. If we’re exploring and analyzing data via the Web, what matters is that interactions are smooth, efficient, easy, and seamless. How this is accomplished technically doesn’t matter.
Feature #2: “In addition to standard bar, column, line, pie charts, etc how many other chart types does the vendor offer? Some advanced examples include heat maps, bubble charts, funnel graphs, histograms, pareto chats, spider / radar diagrams, and others?”
Response: So it’s the number of chart types that matters? What constitutes a chart type? Do useless chart types count? This is a lot like giving high marks to the software programs with the most lines of programming code, as if that were a measure of quality and usefulness. What matters is that a data visualization solution supports the types of charts that do what we need and that they work really well. Many data visualization products could be dramatically improved by removing many of the silly charts that they offer rather than by adding more to the collection.
Feature #3: “Can the data be visualized via gadgets/widgets like temperature gauges, clocks, meters, street lights, etc?”
Response: Is Evelson serious? Should vendors get points for providing silly, dysfunctional display gadgets? Most of the gauges, clocks, meters, and street lights that many so-called data visualization products provide are worthless. Anyone who understands data visualization knows this to be true. This is what Evelson looks for in “advanced” data visualization solutions?
Feature #4: “Can you mash up your data with geospatial data and perform analysis based on visualisation of maps, routes, architectural layouts, etc?”
Response: While the ability to view and interact with data geo-spatially is critical, most of the “mash-ups” that vendors enable are horribly designed, and thus of little use. Throwing quantitative data onto a Google map doesn’t qualify as effective data visualization. Google maps (and other similar services) were not designed as platforms for quantitative display, but instead as sources for directions (“How do I get from here to there?”). Good geo-spatial data visualization uses maps that are designed to feature quantitative data only within the context of geo-spatial information that adds meaning to the data. What’s also important is that geo-spatial displays can be combined on the screen simultaneously with other forms of data visualization (for example, bar graphs, line graphs, tables, and so on) to provide a fuller view of the data than geography alone.
Feature #5: “Can you have multiple dynamically linked visualization panels? It’s close to impossible to analyze more than 3 dimensions (xyz) on a single panel. So when you need to analyze >3 dimensions you need multiple panels, each with 1-3 dimensions, all dynamically linked so that you can see how changing one affects another.”
Response: This is probably the clearest description on Evelson’s list of a feature that is actually useful and indeed critical. Whether the separate views of the data set appear in separate panels or not isn’t important, however. What’s important is the ability to visualize the data in multiple ways–that is, from multiple perspectives on the screen at once. Only then can we construct a comprehensive view and spot relationships, which would be impossible if we were forced to examine each view independently, one at a time.
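To make the idea of dynamically linked views concrete, here is a minimal sketch in Python of what "linked" means underneath the display: a selection made in one view (say, brushing a range of points in a scatterplot) is shared with another view (say, a bar graph), which then recomputes itself for the selected records only. The data, field names, and the functions `brush` and `revenue_by_region` are all hypothetical and invented for illustration; this is not how any particular product works.

```python
from collections import Counter

# Hypothetical customer records shared by every view on the screen.
customers = [
    {"id": 1, "region": "East", "contacts": 3, "revenue": 120},
    {"id": 2, "region": "West", "contacts": 9, "revenue": 700},
    {"id": 3, "region": "East", "contacts": 8, "revenue": 650},
    {"id": 4, "region": "West", "contacts": 2, "revenue": 90},
]

def brush(records, key, low, high):
    """Return the ids of records whose value for `key` falls in [low, high].

    This stands in for dragging a selection box across one view.
    """
    return {r["id"] for r in records if low <= r[key] <= high}

def revenue_by_region(records, selected_ids):
    """Recompute a second view (revenue per region) for the selected records only."""
    totals = Counter()
    for r in records:
        if r["id"] in selected_ids:
            totals[r["region"]] += r["revenue"]
    return dict(totals)

# Brushing customers with 8 to 10 sales contacts in the scatterplot...
selected = brush(customers, "contacts", 8, 10)
# ...updates the bar graph to show only those customers' revenue.
linked_bars = revenue_by_region(customers, selected)
```

The point of the sketch is only that every view reads from the same records and the same selection, which is what lets a change in one view show up in the others.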
Feature #6: “Animations. Clicking through 100s of time periods to perform time series analysis may be impractical. So can you animate/automate that time period journey / analysis?”
Response: So far, researchers have only found a limited role for animation in data visualization, especially for data analysis. When Hans Rosling of GapMinder uses bubble plots to tell a story, such as the correlation between literacy and fertility throughout the world and how it has changed through time, bubbles (one per country) that move to display change through time work because he is narrating–telling us where to look and what it means. Research has shown, however, that these same animated bubble plots are of limited use for data analysis. We simply cannot watch all those bubbles as they follow their independent trajectories through the plot. To compare the paths that two bubbles have taken through time by means of animation, we must mark the path with trails that provide static representations of the bubbles’ journeys. Too many software vendors are providing animations that are nothing more than cute tricks to entertain, rather than useful visualizations. We should run from any vendor that has actually taken the time to make the pointers on their silly gas gauges wobble back and forth for several seconds until they eventually stop moving and point to the value that we need.
Feature #7: “3 dimensional charts. Can you have a 3rd dimension, such as a size of a bubble on an XY axis?”
Response: Simply asking a vendor if his products support 3-D displays is the wrong question. 3-D pie charts, bar graphs, and line graphs are almost never useful. Most implementations of 3-D in so-called data visualization products are either entirely gratuitous and thus distracting, or far too difficult to read. The example that Evelson gave, however–the ability to add a third quantitative variable to a scatterplot by varying the size of the data points–is actually useful, assuming the vendor designs this feature properly. That’s a big assumption.
Feature #8: “Can you have microcharts (aka trellis) — a two dimensional chart embedded in each row or cell on a grid?”
Response: Evelson is onto something here, but he seems a bit confused about the terms. “Microcharts” is the name of an Excel add-in product from Bonavista Systems. A microchart is a small chart, such as a sparkline or a bullet graph, which conveys rich information in a small amount of space, such as in a single spreadsheet cell. A “trellis” display, what Edward Tufte has been calling “small multiples” for many years, is something quite different. It is a series of charts that breaks a data set into logical subsets, each with the same quantitative scale, arranged within eye span on a single screen or page, for the purpose of making comparisons between the charts. For example, if the correlation between the number of sales contacts and sales revenues for 500 customers and 20 separate products would be too cluttered and complex to display in a single scatterplot, we might be able to solve this problem by creating a trellis display of 20 scatterplots, one per product.
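The two defining properties of a trellis display described above, one chart per logical subset and the same quantitative scale in every chart, can be sketched in a few lines of Python. This is only an illustration of the data preparation, not a charting implementation; the function `trellis_panels`, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict

def trellis_panels(records, group_key, x_key, y_key):
    """Split records into one panel per group and compute axis ranges shared by all panels."""
    panels = defaultdict(list)
    for rec in records:
        panels[rec[group_key]].append((rec[x_key], rec[y_key]))

    # The shared ranges come from the whole data set, not from each subset,
    # so that every panel is drawn on the same scale and can be compared.
    xs = [rec[x_key] for rec in records]
    ys = [rec[y_key] for rec in records]
    shared_scale = {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}
    return dict(panels), shared_scale

# Hypothetical data: sales contacts and revenue per customer, for two products.
records = [
    {"product": "A", "contacts": 2, "revenue": 100.0},
    {"product": "A", "contacts": 5, "revenue": 260.0},
    {"product": "B", "contacts": 1, "revenue": 40.0},
    {"product": "B", "contacts": 8, "revenue": 900.0},
]

panels, shared_scale = trellis_panels(records, "product", "contacts", "revenue")
```

Each entry in `panels` would become one small scatterplot, and `shared_scale` would be applied to all of them. Scaling each panel to its own data instead would make the charts look similar even when the products differ greatly, which defeats the purpose of placing them side by side.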
Feature #9: “Can you do contextual or gestural (not instrumented, not pushing buttons, or clicking on tabs) manipulation of visualization objects, as in video games or iPhone like interface?”
Response: Evelson might be getting at something useful here, but he hasn’t distinguished the gratuitous video game-like interactions that have become all too common in many so-called data visualization products from useful interactions that are needed to uncover meanings that live in our data, which only a few products actually support. For data exploration and analysis, it’s quite useful to interact with visualizations of data directly to change the nature of the display in pursuit of meaning, such as to sort or filter data. For instance, rather than using a separate control or dialog box to remove outliers in a scatterplot, it’s useful to be able to grab them with the mouse (or with your finger on a touch screen) and simply throw them away.
Feature #10: “Is the data that is being analyzed
a) Pulled on demand from source applications?
b) Stored in an intermediary DBMS
c) Stored in memory? This last one has a distinct advantage of being much more flexible. For example, you can instantaneously reuse element as a fact or a dimension, or you can build aggregates or hierarchies on the fly.”
Response: What really matters is not where the information is stored, but how easily, flexibly, and rapidly we can access and interact with the data that we need. How this is accomplished technically needn’t concern us as long as it works.
Feature #11: “Is there a BAM-like operational monitoring functionality where data can be fed into the visualization in real time?”
Response: When real-time data updates are needed, this is a useful feature, but few data visualization solutions require real-time updates.
Feature #12: “In addition to historical analysis, does visualization incorporate predictive analytics components?”
Response: This is indeed useful, but what many vendors call “predictive analytics” is neither predictive nor analytical. Rather than simply asking vendors if they support predictive analytics (you will never get a “No” answer to this question), we should ask questions such as: “Can the software be used to build effective predictive models (that is, those that are statistically robust) that allow us to not only determine the probability of particular results under particular conditions, but also to see, understand, and therefore reason about the interactions between variables that contribute to that result?”
Feature #13: “Portal integration. If you have to deliver these visualizations via a portal (SharePoint, etc) do these tools have out of the box portal integration or do you need to customize.”
Response: Generic portal integration isn’t important. If you use a particular portal product and you need the analytics tools to integrate with it, then this specific requirement might be useful to you. This should not, however, be a reason to reject an otherwise effective data visualization solution. There are so few good solutions to choose from today that you shouldn’t let someone in your IT department turn away the one that’s useful to you because it doesn’t integrate neatly into your organization’s portal.
At the end of his list of features, Evelson asked, “What did I miss?” I appreciate his openness to suggestions. More than what he missed, however, I’m concerned about the features that he included that are either unimportant or that in some cases actually undermine data visualization.
Fundamentally, Evelson missed the opportunity to assess the effectiveness of data visualization solutions. Lists of features–even good ones–fail to do this. Another fundamental problem is that his list lumps all data visualization solutions together, as if every purpose for which data visualization might be used requires the same functionality. This is far from the truth. Uses of visualization for monitoring, analysis, or communication, although they share much in common, require many distinct features as well. When shopping for data visualization software, you must first know what you plan to accomplish with it and then determine the features that are specifically required for that purpose. Unless you’re planning to use a single tool for all purposes, you won’t need everything that a data visualization solution could possibly offer.
Evelson is but one of many people that organizations erroneously trust for critical advice. Regarding data visualization, he lacks the expertise that’s required and legitimately expected. Anyone who sets himself up as an adviser–especially one that organizations pay for dearly–ought to develop deep expertise in the subject matter. Before we can shop effectively for technology, we must first shop effectively for reliable sources of advice.