In September, I wrote a rather scathing review of a product called Lyza from a new business intelligence (BI) vendor named LyzaSoft. Part of my criticism was that LyzaSoft erroneously claimed that Lyza qualifies as data analysis and data visualization software. A month later, a good friend and respected colleague, Colin White, took issue with my opinion of Lyza. Thus began an email exchange between us and several other leaders in the field of BI. In this exchange, Colin noticed that we all seemed to use the terms “data analysis” and “data visualization” differently, so he asked each of us to define them. Here are the definitions that I contributed to the discussion:
Data analysis: Data sense-making. The process of discovering and understanding the meanings of data. (Not to be confused with preliminary steps taken to prepare data for the process of analysis.)
Data visualization: The use of visual representations to explore, make sense of, and communicate data. As such, data visualization is a core and usually essential means to perform data analysis, and then, once the meanings have been discovered and understood, to communicate those meanings to others.
On December 17th, Colin wrote about this in an article titled “Business Intelligence Data Analysis and Visualization: What’s in a Name?” Colin did a nice job of summarizing the discussion, but I believe that the conclusions that he reached miss the mark and are typical of most traditional BI professionals.
Here are Colin’s concluding opinions:
At a detailed level, two questions dominate the discussion:
- Are data transformation and integration different from data analysis? There are many examples of applications that retrieve data from multiple sources, restructure and aggregate it, and then load the results into a data warehouse. Similarly, data federation and data streaming technologies allow users not only to do dynamic in-motion data transformation and integration, but also data aggregation and summarization. These are all examples of processes that perform some level of data analysis. The ability to clearly delineate data transformation from data analysis is fast disappearing, and to say data transformation is completely different from data analysis makes no sense.
- Is data presented for presentation purposes only a form of data visualization? The mere fact that some of the comments got into semantic debates about what is data and what is information, and about whether a user is actually analyzing the results or not, suggests that a more pragmatic viewpoint is required. From my perspective, if data or information is presented to a user in a format that aids decision making, then that constitutes data visualization.
At a more macro level, it is important to define the role of a so-called expert or specialist. Our job is to help people understand and use new and evolving technologies and products for business benefit. As such, we need to use clear definitions and terminology that aids in this understanding. However, it is important that we accept that other people may have different definitions, and we need to find common ground. Defending our positions at all costs does not aid the industry. We also have to accept that business users may employ technology and use some terms in a completely different way, and it is important to adjust our positions and explanations accordingly. Unless we do that, business intelligence will continue to be usable only by the small subset of users that employ it today.
I’ll come back to Colin’s position in a moment, but first, I’d like to provide some context for what I’m going to argue. The BI industry has done a wonderful job of providing technologies that enable us to collect, cleanse, and store huge warehouses of data. We now have enormous reservoirs of data available to us, but most people are drowning in them, unable to do the only thing that really matters: actually use the information to achieve the understanding that’s needed to make good decisions. This is predominantly a human task.
The technologies that are needed to help us make sense of data must be built on a clear understanding of what people must do to understand data and of the perceptual and cognitive processes involved in the effort. In other words, the solutions that are needed require a human focus, not the technology focus that has produced the tools that we use to collect, cleanse, and store data. I believe most of the people who have done great work to enable the BI achievements in building a solid data infrastructure are locked in a technology mindset from which they can’t escape and from which they rarely even recognize that they should escape. Almost every vendor that is currently offering real solutions for data sense-making (a rather small group) has emerged from outside the BI industry. Some have been working for years as statistical analysis vendors, and most others are spin-offs of information visualization research at universities. None of the major BI vendors seems to understand data analytics at all. I don’t think this is for lack of interest or effort, but because they are focused on technology, an engineering focus, rather than on the human beings who use technology, a social science and design focus. I believe that the discussion that Colin, I, and others in the industry had about data analysis and data visualization illustrates this situation.
Contrary to LyzaSoft’s claim that businesspeople use the term data analysis for the entire end-to-end process of working with data (you can read their position in Colin’s article, which he refers to as “The Vendor’s Position”), I’ve found that the people who actually work in business and elsewhere to make sense of data know that the tasks of collecting, cleansing, aggregating, and storing data are different from data analysis. The former tasks precede and support the process of data analysis by making data accessible and reliable, but they aren’t data analysis itself. These folks would much rather have the IT department build a good data warehouse for them so they aren’t bothered by having to prepare the data and can spend their time actually analyzing it. This distinction between data preparation and data analysis is not just a matter of semantics. Until vendors understand this difference, they will continue to produce so-called data analysis products that don’t work. In contrast, vendors such as Tableau, Spotfire, Advizor Solutions, Panopticon, Visual I|O, and SAS—examples of those who haven’t emerged from within the BI industry—already get this.
Now that buyers of BI software are turning their focus to the actual use of data—to data sense-making and communication—it’s tempting and all too convenient for BI vendors such as LyzaSoft to call what they do “data analysis.” This murky use of the term not only renders it vague, confusing, and for all practical purposes useless, it also prolongs the state of affairs that has given rise to our current desire for data analytics: the fact that BI vendors have failed to provide useful tools for data sense-making and communication. These tools, which we desperately need to make better decisions, have always been the central, but failed, promise of business intelligence.
The opinion that Colin expresses in response to the second issue concerns me: “From my perspective, if data or information is presented to a user in a format that aids decision making, then that constitutes data visualization.” I certainly agree that the goal is to achieve understanding and support decision making, but not every way of doing this is data visualization, and not everything that would like to call itself data visualization deserves the name. Information can be presented in various ways, just as it can be verbally communicated in various languages; each medium of data presentation (the spoken word, the written word, and visual representations of various types) has its strengths and weaknesses, its appropriate applications, and its rules for effective use. Saying that every presentation that aids decision making is data visualization is not a useful definition. In fact, it’s an example of what I warned against in our email discussion. Here’s what I said, as quoted in Colin’s article:
Confusion regarding terms such as data analysis and data visualization exists in the BI community because little effort has been made to sufficiently define them. Our industry tolerates a freewheeling, define-it-as-you-wish attitude toward these and other terms to the detriment of our customers. In the academic world, which I keep one foot in, a greater effort is made to define the terms to provide the shared meanings that are required to communicate, yet even in academia it gets a bit murky at times. I believe that terms are inadequately defined in the BI community in part because ours is an industry that has largely been defined for marketing purposes, rather than as a rational discipline. It serves the interests of software vendors to keep the terms vague.
I agree that we must be open to one another’s ideas and definitions, but I believe that the goal of this openness, once we have thought long and hard, should be to narrow, not expand, our use of these terms. As it is today, these terms are barely useful because they are defined too loosely, broadly, and inconsistently. Expanding the definitions will only add to the problem.
I’ll conclude this blog post as Colin ended his article, with the following question and invitation: “What do you think?”