Data Visualization and Analysis—BI’s Blind Spots
In September, I wrote a rather scathing review of a product called Lyza from a new business intelligence (BI) vendor named LyzaSoft. Part of my criticism was that LyzaSoft erroneously claimed that Lyza qualifies as data analysis and data visualization software. A month later, a good friend and respected colleague, Colin White, took issue with my opinion of Lyza. Thus began an email exchange between us and several other leaders in the field of BI. In this exchange, Colin noticed that we all seemed to use the terms “data analysis” and “data visualization” differently, so he asked each of us to define them. Here are the definitions that I contributed to the discussion:
Data analysis
Data sense-making. The process of discovering and understanding the meanings of data. (Not to be confused with preliminary steps taken to prepare data for the process of analysis.)
Data visualization
The use of visual representations to explore, make sense of, and communicate data. As such, data visualization is a core and usually essential means to perform data analysis, and then, once the meanings have been discovered and understood, to communicate those meanings to others.
On December 17th, Colin wrote about this in an article titled “Business Intelligence Data Analysis and Visualization: What’s in a Name?” Colin did a nice job of summarizing the discussion, but I believe that the conclusions that he reached miss the mark and are typical of most traditional BI professionals.
Here are Colin’s concluding opinions:
At a detailed level, two questions dominate the discussion:
- Are data transformation and integration different from data analysis? There are many examples of applications that retrieve data from multiple sources, restructure and aggregate it, and then load the results into a data warehouse. Similarly, data federation and data streaming technologies allow users not only to do dynamic in-motion data transformation and integration, but also data aggregation and summarization. These are all examples of processes that perform some level of data analysis. The ability to clearly delineate data transformation from data analysis is fast disappearing, and to say data transformation is completely different from data analysis makes no sense.
- Is data presented for presentation purposes only a form of data visualization? The mere fact that some of the comments got into semantic debates about what is data and what is information, and about whether a user is actually analyzing the results or not, suggests that a more pragmatic viewpoint is required. From my perspective, if data or information is presented to a user in a format that aids decision making, then that constitutes data visualization.
At a more macro level, it is important to define the role of a so-called expert or specialist. Our job is to help people understand and use new and evolving technologies and products for business benefit. As such, we need to use clear definitions and terminology that aids in this understanding. However, it is important that we accept that other people may have different definitions, and we need to find common ground. Defending our positions at all costs does not aid the industry. We also have to accept that business users may employ technology and use some terms in a completely different way, and it is important to adjust our positions and explanations accordingly. Unless we do that, business intelligence will continue to be usable only by the small subset of users that employ it today.
I’ll come back to Colin’s position in a moment, but first, I’d like to provide some context for what I’m going to argue. The BI industry has done a wonderful job of providing technologies that enable us to collect, cleanse, and store huge warehouses of data. We now have enormous reservoirs of data available to us, but most people are drowning in them, unable to do the only thing that really matters: actually use the information to achieve the understanding that’s needed to make good decisions. This is predominantly a human task.
The technologies that are needed to help us make sense of data must be built on a clear understanding of what people must do to understand data and the perceptual and cognitive processes involved in the effort. In other words, the solutions that are needed require a human focus, not the technology focus that has produced the tools that we use to collect, cleanse, and store data. I believe most of the people who have done great work to enable the BI achievements in building a solid data infrastructure are locked in a technology mindset from which they can’t escape and rarely even recognize that they should escape. Almost every vendor that is currently offering real solutions for data sense-making—a rather small group—has emerged from outside the BI industry. Some have been working for years as statistical analysis vendors and most others are spin-offs of information visualization research at universities. None of the major BI vendors seem to understand data analytics at all. I don’t think this is for lack of interest or effort, but because they are focused on technology, an engineering focus, rather than the human beings who use technology, a social science and design focus. I believe that the discussion that Colin, I, and others in the industry had about data analysis and data visualization illustrates this situation.
Contrary to LyzaSoft’s claim that businesspeople use the term data analysis for the entire end-to-end process of working with data (you can read their position in Colin’s article, which he refers to as “The Vendor’s Position”), I’ve found that the people who actually work in business and elsewhere to make sense of data know that the tasks of collecting, cleansing, aggregating, and storing data are different from data analysis. The former tasks precede and support the process of data analysis by making data accessible and reliable, but they aren’t data analysis itself. These folks would much rather have the IT department build a good data warehouse for them so they aren’t bothered by having to prepare the data and can spend their time actually analyzing it. This distinction between data preparation and data analysis is not just a matter of semantics. Until vendors understand this difference, they will continue to produce so-called data analysis products that don’t work. In contrast, vendors such as Tableau, Spotfire, Advizor Solutions, Panopticon, Visual I|O, and SAS—examples of those who haven’t emerged from within the BI industry—already get this.
Now that buyers of BI software are turning their focus to the actual use of data—to data sense-making and communication—it’s tempting and all too convenient for BI vendors such as LyzaSoft to call what they do “data analysis.” This murky use of the term not only renders it vague, confusing, and for all practical purposes useless, it also prolongs the state of affairs that has given rise to our current desire for data analytics: the fact that BI vendors have failed to provide useful tools for data sense-making and communication. These tools, which we desperately need to make better decisions, have always been the central, but failed, promise of business intelligence.
The opinion that Colin expresses in response to the second issue concerns me: “From my perspective, if data or information is presented to a user in a format that aids decision making, then that constitutes data visualization.” I certainly agree that the goal is to achieve understanding and support decision making, but not every way of doing this is data visualization, and not everything that would like to call itself data visualization deserves the name. Information can be presented in various ways, just as it can be verbally communicated in various languages; each medium of data presentation (the spoken word, the written word, and visual representations of various types) has its strengths and weaknesses, its appropriate applications, and its rules for effective use. Saying that every presentation that aids decision making is data visualization is not a useful definition. In fact, it’s an example of what I warned against in our email discussion. Here’s what I said, as quoted in Colin’s article:
Confusion regarding terms such as data analysis and data visualization exists in the BI community because little effort has been made to sufficiently define them. Our industry tolerates a freewheeling, define-it-as-you-wish attitude toward these and other terms to the detriment of our customers. In the academic world, which I keep one foot in, a greater effort is made to define the terms to provide the shared meanings that are required to communicate, yet even in academia it gets a bit murky at times. I believe that terms are inadequately defined in the BI community in part because ours is an industry that has largely been defined for marketing purposes, rather than as a rational discipline. It serves the interests of software vendors to keep the terms vague.
I agree that we must be open to one another’s ideas and definitions, but I believe the goal of this openness, after thinking long and hard, is to narrow, not expand, our use of these terms. As it is today, these terms are barely useful because they are defined too loosely, broadly and inconsistently. Expanding the definitions will only add to the problem.
I’ll conclude this blog post as Colin ended his article, with the following question and invitation: “What do you think?”
11 Comments on “Data Visualization and Analysis—BI’s Blind Spots”
I must admit I was disappointed with Steve’s reply. It simply regurgitates his earlier position and doesn’t really add anything to the discussion. It also largely ignores the input from other folks and the main thrust of the argument in my article. I also don’t like the vendors being brought into the discussion again. It detracts from the discussion and raises questions about the motivation behind the opinions being expressed.
One thing Steve’s reply did help me with is possibly explaining why some of his perspectives are different from mine. His definition of BI is quite different from mine. He has a somewhat outdated view of BI that shares the common misconception that BI and data warehousing are closely linked and one and the same. This is not true in today’s modern decision-making world. This misconception is why Claudia Imhoff and I recently introduced the idea of decision intelligence to demonstrate how the world of BI is changing.
Colin,
I have not ignored the opinions expressed by you or others; I have disagreed with them and explained the nature of my disagreement. You say that you “don’t like vendors being brought into the discussion again,” but a significant portion of your article involved stating a vendor’s position. The only difference is that I named the vendor, which is appropriate and adds to the discussion. No one should participate in discussions of this nature anonymously. People’s motivations should be taken into account when evaluating their arguments.
You say that my definition of BI is different from yours and outdated, but you haven’t provided your definition, nor did I provide a definition in my comments, so there’s no way of knowing if and how our definitions differ. I believe that the term BI was originally coined and promoted as a way to breathe new life into the data warehousing industry, which had grown stagnant. Although there is no consensus regarding the definition of “business intelligence,” which I believe is a problem, essentially I define it as “technologies, practices, and activities that enable businesses (and organizations of all types) to use information to achieve the level of understanding that’s needed to pursue their objectives based on intelligent decisions.” This is the focus of my work. Is my understanding of business intelligence antiquated? I assume that you and Claudia have coined the new term “decision intelligence” either because you feel that the term “business intelligence” lacks clarity or an emphasis on what’s needed, or that the BI industry has failed to deliver what’s needed, so it’s time to come up with a new term to push things in the right direction.
I’ve worked in the business intelligence industry for many years, just as you and Claudia have. I’ve grown frustrated with the degree to which, after all these years, this industry continues to throw technologies at customers that are often poorly designed, largely ineffective, and rarely provide people with the tools they need to make sense of data and therefore use it to achieve their goals. In the last few years, I’ve been encouraged by the efforts of a new breed of vendors that address this need. I cringe whenever I see a new product hit the market from the narrow technology-centric, human-ignorant mindset that does nothing to address the real needs that people who work with data face.
This discussion began after I critiqued LyzaSoft’s new product Lyza, exposing the fact that many of its marketing claims were false and that its attempts to provide data analysis and data visualization functionality, which were featured in their claims, were poorly designed and ineffective. At no point during the discussion that has since ensued has anyone responded to the substance of my claims by attempting to show how my assessment was incorrect. Instead, you and apparently LyzaSoft (based on the quotes that you included in your article) have argued that I should open myself to definitions of “data analysis” and “data visualization” that in my opinion rob these terms of all useful meaning by assigning to them every single activity that people engage in related to data. As I’ve said from the beginning, Lyza might do things that are useful and effective outside of the realm of data analysis and data visualization. For the sake of the folks who work at LyzaSoft and anyone who buys the product, I hope it does.
I responded to your article here in my blog to invite all who are interested to offer ideas and engage with one another around those ideas. You’ve suggested that you have fresh ideas about business intelligence (what you call “decision intelligence”) that relate to this discussion. I invite you to share them so I and others can respond.
Stephen rightly says that “the only thing that really matters” is “us[ing] the information to achieve the understanding that’s needed to make good decisions. This is predominantly a human task.”
I’d add that there is something crucial between understanding and decision, namely deliberation. Most decisions of any significance are made by “weighing up” the arguments for and against various options. Those arguments are often grounded in data. The role of data analysis is to transform data into useful arguments. The understanding generated by analysis informs deliberation. If data analysis is a “higher level” activity than data visualisation, then analytically-informed argumentation is higher level than data analysis.
Further, just as data can be visualised, so can deliberation. That is, we can visually display the structure of complex, data-grounded cases.
For some related discussion see http://blog.austhink.com/blogcisive/2008/06/the-missing-i-in-bi/
I like your definition of data analysis as data sense-making, Stephen. I assume you don’t mean to exclude all kinds of data transformations from the task of data analysis. There are things you need to do to prepare data for analysis, but I also think that data transformations sometimes are an integral part of the analysis. In fact, one could argue that BI solutions sometimes do too much “preparation” of the data before it’s given to the analyst and that they thereby end up restricting the analyst and limit the kinds of questions that the analyst can ask. In that sense I think it is indeed difficult to “clearly delineate data transformation from data analysis” as Colin points out.
There are some interesting papers on sense-making in the human-computer interaction literature. I am thinking in particular of a paper by Russell, Stefik, Pirolli and Card (The cost structure of sense-making. Proceedings of INTERCHI ’93; 1993 April 24-29; Amsterdam; the Netherlands. Amsterdam: IOS Press; 1993; 269-276.). The authors describe the process of sense-making as an iterative search for a representation of the problem that reduces the information processing cost. Ideally, you’d like to end up with a representation of the problem that makes the answer obvious. The authors argue that this view of sense-making has implications for the integrated design of user interfaces, representational tools and information retrieval systems.
I’ve noticed that tools that try to provide an integrated user experience for both data retrieval and data representation fall into two broad classes:
1) Process-centric tools allow the analyst to manipulate a representation of the analysis process until she arrives at an output that produces good results. This is the approach taken by tools such as Spotfire Miner, SPSS Clementine, SAS Enterprise Miner, Pipeline Pilot, etc. It seems to me that Lyza falls into this camp as well. A benefit of the process-centric approach is that it’s self-documenting and it’s easy to see how the final process could be automated. The downside is that the UI is somewhat indirect. Instead of manipulating the data directly, you are describing how data should be manipulated. This may appeal more to engineers and software programmers (a common programmer joke is that a programmer would rather write programs that write programs that write programs than write programs) than to the business community at large, although I haven’t seen any specific research to support this claim in the context of data analysis.
2) Data-centric tools allow the analyst to manipulate representations of the data directly until she arrives at a conclusion. This is the approach taken by Spotfire Professional, Tableau, Excel, etc. The benefit is that the analyst works on the problem more directly and the solution may be obvious when a well-designed tool is used. However, the downside is that it is not always easy to see how you arrived at the end result.
Whatever the shortcomings of the tools mentioned above, it would be interesting to see which approach works best for what kinds of users and what kinds of analysis tasks. I’m not prepared to make a judgement one way or another yet, although I do suspect that the answer is going to be “it depends”.
Tobias,
Thanks for the thoughtful response. Below are my responses to some of your specific points, which I’ve repeated as quotes:
“I assume you don’t mean to exclude all kinds of data transformations from the task of data analysis.”
Yes, you assume correctly that I make a distinction between the technology-centric tasks of extracting, cleansing, integrating, transforming, and loading data that are done to prepare data for analysis and the data transformations that must often be done during the course of analysis to do such things as create new calculated fields and ad hoc groupings.
“One could argue that BI solutions sometimes do too much ‘preparation’ of the data before it’s given to the analyst and that they thereby end up restricting the analyst and limit the kinds of questions that the analyst can ask.”
What you point out here is often a problem with data warehouses and other means of preparing data for analysis, including the semantic layers that products such as Business Objects support. When data can only be accessed through these limited and sometimes ineffectively designed access points, analysts can’t get to what they need.
“Process-centric tools allow the analyst to manipulate a representation of the analysis process until she arrives at an output that produces good results. This is the approach taken by tools such as Spotfire Miner, SPSS Clementine, SAS Enterprise Miner, Pipeline Pilot, etc.”
Although I’m not familiar with all of these tools, those that I am familiar with are quite different from Lyza in that they actually provide the means to interact with data analytically. Lyza’s primitive, poorly-designed charts and PowerPoint-like presentation aren’t useful for data exploration and sense-making.
“The downside [of data-centric analysis tools] is that it is not always easy to see how you arrived at the end result.”
This is an important point that is currently being worked on by some of the good data analysis vendors that I mentioned. For instance, I saw a wonderful presentation by Jeff Heer at InfoVis 2008 about ways that good visual analysis tools can become self-documenting, based on research that he did for Tableau. As you point out, there is a significant difference between interactive analysis tools that document a free-flowing analytical process and programmer-oriented tools that allow you to determine and describe “how data should be manipulated.” A related approach, which is quite useful, can be seen in analysis tools that allow free-flowing, interactive analysis, but then also let analysts create analytical applications (that is, predefined analytical processes that can be used by others for routine analysis that always involves a defined series of steps).
Perhaps using the word “definition” with respect to our views on BI was misleading. It’s just that throughout your blog reply you constantly use the word BI when you really mean data warehousing. You confuse the two terms, which suggests you have this outdated view that BI and data warehousing are the same thing. They are not. The constant association of BI with data warehousing is why Claudia and I use the term decision intelligence instead. Our objective is to break the connection between BI and data warehousing, and we hope people such as yourself will stop perpetuating this misconception.
I view you as specializing in visual data analysis. This is a combined subset of the separate fields of data analysis and data visualization, which have a much broader scope. I think not seeing this distinction is why you fail to see and understand where other folks such as me are coming from. I will write another blog post on this. A good example here is the comment from Tobias, which says, “I assume you don’t mean to exclude all kinds of data transformations from the task of data analysis.” The answer to this question is that you do exclude all forms of data transformation (in data preparation) from data analysis. This is one of the key areas where you and I disagree.
I think another point Tobias made is also very important. He talked about process-centric tools. This is a growing field of BI. One problem with your outdated view of BI is that data warehousing is primarily data-centric. As we move toward the use of operational BI, we need to have a more process-centric view of our business. Data-centric approaches don’t work in this environment. More on this to come. I hope I have added to the conversation.
Colin.
Colin,
Despite the fact that many people, including vendors and consultants, use the terms “data warehousing” and “business intelligence” synonymously, personally I define the terms differently. When I stated in my original blog post that “the BI industry has done a wonderful job of providing technologies that enable us to collect, cleanse, and store huge warehouses of data,” I was making two points:
(1) Despite the introduction of the term “business intelligence” in the 1990s, those vendors that until then called themselves data warehousing vendors simply started using the new term without really changing what they did. This is what I meant when I said that the term “business intelligence” has been used primarily for marketing purposes, to revive the ailing data warehousing industry.
(2) When I look at the significant accomplishments of vendors that consider themselves part of the business intelligence industry, these accomplishments primarily involve technologies for collecting, cleansing, integrating, transforming, storing, and production reporting of data. Little of use has emerged from those who call themselves business intelligence vendors that actually helps people use information intelligently, that is, in the form of tools for exploring, making sense of, and communicating information.
Based on your explanation, my hope that the term “decision intelligence,” which you and Claudia have coined, was intended to place greater emphasis on data sense-making and communication appears to have been mistaken. Despite the importance of what you’re trying to do, I believe that the business intelligence industry’s lack of support for the actual use of data for sense-making and communication is a much greater and more direct barrier to intelligent decisions than confusion between data warehousing and business intelligence.
Regarding your understanding of my field of expertise, actually I specialize in all forms of quantitative data visualization, not just visual analysis. As you know, neither of my two existing books has anything to do with data analysis, visual or otherwise, but focus exclusively on the data presentation aspects of data visualization. I believe that I actually do understand where you’re coming from; we simply don’t agree on a few points. If I don’t understand, however, I doubt that it’s because I now specialize in data visualization after many years of working in all aspects of data warehousing and business intelligence.
Regarding process-centric products, I fully embrace those that work well and believe that they can contribute greatly to data analysis. I blend process-centric and data-centric approaches quite fluidly in my work and long for tools that do the same.
I’m reminded by this discussion of the old, rather gruesome and probably very un-PC saying “there’s more than one way to skin a cat”! This particular cat has arguably been tortured to death, so I’ve been hesitating to add to its pain. But as “Blogger 3” in Colin’s article (www.b-eye-network.com/view/9336), the author of a white paper sponsored by Lyzasoft (see my own blog for details and reference http://www.b-eye-network.co.uk/blogs/devlin/) and one of the original “inventors” of data warehousing, I’d like to try to apply some healing balm to this long-suffering animal.
Harking back to Stephen’s original post in September, he used the phrase “a data analyst’s nightmare” in the title. While the debate since then has ranged far and wide into the possible meanings of data analysis, Business Intelligence, visualization, data integration and many other terms, I believe the key questions are: what do data analysts do and what do we believe they should do?
After more than 20 years of experience in what data analysts do, I suggest that the most accurate answer to the first question is: whatever they need to do in order to get results to the real business problems with which they’re faced. Their actions are driven by factors such as (1) data availability, cleanliness and consistency, or lack thereof, (2) their own personal way of seeing and processing information and (3) the tools available to them.
The first factor often determines the extent to which they see data cleansing and integration as part of data analysis. One of the original goals of data warehousing and BI was to reduce the cleansing and integration burden on data analysts. Over the years, I’ve learned that this goal can only ever be “mostly” achieved. Skilled and innovative data analysts can and do come up with very obscure and unexpected data sourcing needs.
With regard to the second factor, there are certainly any number of best practices that can be brought to bear on how to see and process information. Whether these best practices are all agreed and consistent with one another I leave to others who know that field better than I do. However, my experience is that there is much variability between data analysts in how they see and process information. Some are very visual and like to start from the overview. Others prefer to see detail and tabular information. Still others are process-oriented and -driven.
I believe that factors (1) and (2) lead directly to the conclusion that there is no single answer to the tooling needed by data analysts. Spreadsheets, although widely derided by the mainstream BI community, are used by data analysts because they do the job and they work in a way that has become familiar. Mining, OLAP, visualization and other tools are widely used by other data analysts and certainly have their strengths and weaknesses too. Furthermore, these same two factors lead businesses and their data analysts to choose the tool(s) that work for them.
Which leads to the second part of my question above: what do we (industry experts and consultants) believe data analysts should do? Of course, the question is bigger than data analysts, but, in general, our role is to try to give independent and unbiased thought leadership in the wide and ill-defined field of Business Intelligence. Certainly, if marketers go overboard in their claims or commonly understood terms get stood on their heads, we need to make clear what’s happening. But we all have our own opinions and positions, and there is always more than one valid point of view (or way to skin a cat). We cannot hope to come down to one, single, agreed and unchanging set of definitions of the terms used in the industry. But we can all agree to respect the fact that one person’s “nightmare” may be another’s “dream”.
In the spirit of the day that’s in it – I wish you all a peaceful and joyous Christmas.
Barry.
Barry,
In the same gracious spirit of your final comments, I’d like to thank you for weighing in so thoughtfully. I agree with almost everything you wrote, and certainly with the premise that shapes your case—that the answer to the question “what should analysts do?” is “whatever they need to do in order to get results to the real business problems with which they’re faced.” If BI vendors really understood what people need to do and took the time to design solutions that would actually help them do these things more easily and effectively, we would soon be living in a much different world.
I can understand why you and others became excited about Lyza. The problem the folks at LyzaSoft recognized and attempted to solve with Lyza is real and significant. It’s sad that most analysts still spend most of their time whipping data into shape before they begin to explore and analyze it, and then must step back into data fix mode again and again between brief bursts of actual analytical activity. Because this process is so time-consuming and difficult, analysts rarely get around to any actual data exploration, open to the possibility of unexpected discoveries, but spend their time only looking for answers to specific questions and then eagerly accept the first answer that presents itself.
Let me remind everyone of LyzaSoft’s claim, that Lyza is a “powerful desktop analytics solution that enables analysts to synthesize, explore and visualize data.” Had LyzaSoft actually delivered what they claimed — an end-to-end analytical solution — I would have shared your enthusiasm. But they didn’t. Lyza does not provide the means to explore, analyze, and then present information effectively. What it perhaps does is provide a means to more easily integrate data from various sources and improve it through various data transformations, and do so in a self-documenting manner. Even assuming that they’ve done this effectively, by failing to give analysts useful means to explore, analyze, and present information, they have provided middleware that sits between data sources on one end and data analysis and presentation tools on the other. In other words, they have provided an ETL (data extract, transform, and load) tool that can be used by analysts. Along with this, Lyza documents the process, which could warm the hearts of IT departments as a possible means of monitoring and perhaps better controlling what people do with data. If these capabilities were actually integrated into an end-to-end analytical solution, Lyza might be useful; as it is, however, Lyza supports the fragmentation of data analysis into processes that must be done on different platforms. Once data has been assembled, another tool must be used for analysis.
With what were probably the best of intentions, LyzaSoft approached a real problem as any engineering organization would. Don’t misunderstand me; I have great respect for software engineers. I worked as one for many years. My point is that, unless BI vendors approach problems from a design perspective, they build products that only an engineer could love. The code might be elegant, but the problem isn’t solved in a way that actually works for people in the real world.
As long as data analysts must collect information from various sources, then integrate, cleanse, and transform it before it’s useful, solutions like the one LyzaSoft attempted will be needed. However, I think the vendors in the best position to deliver these integrated analytical solutions are those that already understand and effectively support data analysis and presentation. Some of the best analytical products out there already provide the means to integrate data from various sources and transform it in useful ways, and they do so in a manner that is tightly integrated into the analytical process. Enhancing already well-designed software to fill in some of the gaps is a more promising path to a solution.
I use strong words to pound on vendors like LyzaSoft, not because I don’t appreciate their intentions and efforts, but because those intentions and efforts are wasted if they don’t design real solutions. When this happens, neither the vendor nor its customers benefit, and the BI industry as a whole suffers. I don’t speak softly, because that approach hasn’t worked. I just watched the new film about Harvey Milk yesterday and was reminded of another cause that made no headway until voices were raised in anger and sometimes provocative protest. I’m not equating the need for better BI software with the need for gay rights. Though the cause that I support is less important in the grand scheme of things, it is important to me.
Leaving the general issue aside for a moment, I’d like to address a particular point that you made. You wrote:
While it is certainly true that data analysts have their own styles and preferences, what your statement seems to overlook is the fact that many of the best methods, approaches, means of representing data, ways of interacting with data, analysis techniques, and so on that effectively support the data analysis process do so, not because of analysts’ preferences, but because they are aligned with the way the human perceptual system and brain work. An analyst might be inclined to work with tables of data, but no matter how talented she is, she will never spot most of the trends, patterns, and exceptions that reside in the information until she examines it in visual form. An analyst might feel comfortable looking at one graph at a time, but until he can see the data represented from several perspectives simultaneously, perhaps in several charts, he will never recognize some of the relationships or be able to make all of the useful comparisons. An analyst might like to dive right into the details, but unless she has an overview of the data, she will waste a great deal of time traveling pointless paths. Most of what makes software effective is not an emphasis on serving people’s preferences, but an emphasis on doing what actually works.
We know what works in many cases because of a great and expanding body of research into the abilities, limitations, and basic workings of human perception and cognition. Unless this knowledge informs the work of business intelligence vendors, this industry will never effectively support analytics. I spent many years working in the fields of data warehousing and business intelligence from a technology-engineering perspective, and my achievements were always limited, especially my efforts to build systems that people would interact with directly to explore and make sense of data. Between then and now, I’ve struggled to find better ways to solve the problems people face when using data to improve decision making. I didn’t get involved in data visualization because I’m a visually-oriented person (I’m actually much more of a words person), but because, over time, I began to recognize its potential for solving many of the real problems that people struggle with today.
I’m eager for better business intelligence solutions. In my eagerness, I sometimes lash out when vendors make exaggerated claims and sell ineffective solutions. I do so because people deserve better and I’m tired of seeing their hopes enlivened and then dashed by shallow promises.
At the risk of continuing a debate that has probably gone on too long already, I’d like to cautiously offer my own opinion. As director of two overlapping yet distinct departments (BI and Analytics), I oversee staffs who are responsible for the end-to-end process of getting from raw data to analytical insight.
There are many steps involved, and we do them in this order:
1) Understanding business needs (BI).
2) Translating those needs into table and schema layouts in the various disparate systems, so we can instruct IT and the DBAs on how to set up the ETL process: the timings and frequencies, what latency to use when pulling the data, and any formulas or filters to apply when moving the data to the data warehouse (BI).
3) Authoring reports to enable business users to get the data on demand from a single BI tool (BI).
4) Then the Analytics staff takes over. Their skill set overlaps the BI team’s somewhat, but they are more adept at working with large data sets (not the raw data, but the data from the warehouse).
5) They also understand visual best practices, and how to explore and iteratively analyze the datasets for outliers, trends, drivers, etc.
6) Finally, they convey results and insights to more senior business users in brief analyses and narrative “white papers” with supporting graphs.
This, to me, represents the full lifecycle, from raw data to insight. The BI folks take the process up to the point where the average business user can get tactical and operational info via reports on their own. The Analytics staff proactively support strategy and leadership with data easily obtained from BI’s efforts. This is the same data the average business user utilizes, but viewed in aggregate and over a longer time frame to identify issues in process, strategy, etc. To use an analogy, typical users worry about getting their flight from NY to LA on time, with no lost luggage, etc. Executive staff want to know whether the NY to LA flight is profitable, whether to add flights, etc.
LyzaSoft would make life somewhat easier for the BI process described above in the short run, IF you had no formal BI team or BI toolset. But it would actually complicate matters over the long run. The last thing you want is a proliferation of spreadmarts, and the product seems to lend false theoretical rigor to those data manipulations by “documenting” the transformation and combining of data.
Ultimately, I would want data transformations to be applied only once, by a centralized team to ensure consistency, either in the ETL process or embedded in a calculation in my centralized BI tool, rather than in some new semantic layer of logic. Further, the root problems with doing things in Excel spreadmarts are not solved with LyzaSoft… you can still have too many incompetent hands in the soup… each sorting incorrectly, applying incorrect formulas, saving in an older version of Excel and losing records past row 16,000, etc.
If you’re going to leverage your data properly, you need staff whose core competencies align with those I described above. Having some accountant or accounts payable clerk use a tool like LyzaSoft shouldn’t help anyone who relies on the output sleep any better at night. Sure, it will document their errors, but how readily apparent will those errors be?
In conclusion, I’d like to use a famous American phrase: “thanks… but no thanks, on that bridge to nowhere”.
BI journalist Ted Cuzzillo recently joined in this discussion on his own blog with a post entitled, “Some of us like to name things in BI.” You can read the blog entry at http://datadoodle.com/2009/01/06/some-of-us-like-to-name-things/.