Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Big Data, Big Dupe: A Progress Report

February 23rd, 2018

My new book, Big Data, Big Dupe, was published early this month. Since its publication, several readers have expressed their gratitude in emails. As you can imagine, this is both heartwarming and affirming. Big Data, Big Dupe confirms what these seasoned data professionals recognized long ago on their own, and in some cases have been arguing for years. Here are a few excerpts from emails that I’ve received:

I hope your book is wildly successful in a hurry, does its job, and then sinks into obscurity along with its topic.  We can only hope! 

I hope this short book makes it into the hands of decision-makers everywhere just in time for their budget meetings… I can’t imagine the waste of time and money that this buzz word has cost over the past decade.

Like yourself I have been doing business intelligence, data science, data warehousing, etc., for 21 years this year and have never seen such a wool over the eyes sham as Big Data…The more we can do to destroy the ruse, the better!

I’m reading Big Data, Big Dupe and nodding my head through most of it. There is no lack of snake oil in the IT industry.

Having been in the BI world for the past 20 years…I lead a small (6 to 10) cross-functional/cross-team collaboration group with like-minded folks from across the organization. We often gather to pontificate, share, and collaborate on what we are actively working on with data in our various business units, among other topics.  Lately we’ve been discussing the Big Data, Big Dupe ideas and how within [our organization] it has become so true. At times we are like ‘been saying this for years!’…

I believe deeply in the arguments you put forward in support of the scientific method, data sensemaking, and the right things to do despite their lack of sexiness.

As the title suggests, I argue in the book that Big Data is a marketing ruse. It is a term in search of meaning. Big Data is not a specific type of data. It is not a specific volume of data. (If you believe otherwise, please identify the agreed-upon threshold in volume that must be surpassed for data to become Big Data.) It is not a specific method or technique for processing data. It is not a specific technology for making sense of data. If it is none of these, what is it?

The answer, I believe, is that Big Data is an unredeemably ill-defined and therefore meaningless term that has been used to fuel a marketing campaign that began about ten years ago to sell data technologies and services. Existing data products and services at the time were losing their luster in public consciousness, so a new campaign emerged to rejuvenate sales without making substantive changes to those products and services. This campaign has promoted a great deal of nonsense and downright bad practices.

Big Data cannot be redeemed by pointing to an example of something useful that someone has done with data and exclaiming “Three cheers for Big Data,” for that useful thing would have still been done had the term Big Data never been coined. Much of the disinformation that’s associated with Big Data is propogated by good people with good intentions who prolong its nonsense by erroneously attributing beneficial but unrelated uses of data to it. When they equate Big Data with something useful, they make a semantic connection that lacks a connection to anything real. That semantic connection is no more credible than attributing a beneficial use of data to astrology. People do useful things with data all the time. How we interact with and make use of data has been gradually evolving for many years. Nothing that is qualitatively different about data or its use emerged roughly ten years ago to correspond with the emergence of the term Big Data.

Although no there is no consensus about the meaning of Big Data, one thing is certain: the term is responsible for a great deal of confusion and waste.

I read an article yesterday titled “Big Data – Useful Tool or Fetish?” that exposes some failures of Big Data. For example, it cites the failed $200,000,000 Big Data initiative of the Obama administration. You might think that I would applaud this article, but I don’t. I certainly appreciate the fact that it recognizes failures associated with Big Data, but its argument is logically flawed. Big Data is a meaningless term. As such, Big Data can neither fail nor succeed. By pointing out the failures of Big Data, this article endorses its existence, and in so doing perpetuates the ruse.

The article correctly assigns blame to the “fetishization of data” that is promoted by the Big Data marketing campaign. While Big Data now languishes with an “increasingly negative perception,” the gradual growth of skilled professionals and useful technologies continue to make good uses of data, as they always have.

Different Tools for Different Tasks

February 19th, 2018

I am often asked a version of the following question: “What data visualization product do you recommend?” My response is always the same: “That depends on what you do with data.” Tools differ significantly in their intentions, strengths, and weaknesses. No one tool does everything well. Truth be told, most tools do relatively little well.

I’m always taken by surprise when the folks who ask me for a recommendation fail to understand that I can’t recommend a tool without first understanding what they do with data. A fellow emailed this week to request a tool recommendation, and when I asked him to describe what he does with data, he responded by describing the general nature of the data that he works with (medical device quality data) and the amount of data that he typically accesses (“around 10k entries…across multiple product lines”). He didn’t actually answer my question, did he? I think this was, in part, because he and many others like him don’t think of what they do with data as consisting of different types of tasks. This is a fundamental oversight.

The nature of your data (marketing, sales, healthcare, education, etc.) has little bearing on the tool that’s needed. Even the quantity of data has relatively little effect on my tool recommendations unless you’re dealing with excessively large data sets. What you do with the data—the tasks that you perform and the purposes for which you perform them—is what matters most.

Your work might involve tasks that are somewhat unique to you, which should be taken into account when selecting a tool, but you also perform general categories of tasks that should be considered. Here are a few of those general categories:

  • Exploratory data analysis (Exploring data in a free-form manner, getting to know it in general, from multiple perspectives, and asking many questions to understand it)
  • Rapid performance monitoring (Maintaining awareness of what’s currently going on as reflected in a specific set of data to fulfill a particular role)
  • A routine set of specific analytical tasks (Analyzing the data in the same specific ways again and again)
  • Production report development (Preparing reports that will be used by others to lookup data that’s needed to do their jobs)
  • Dashboard development (Developing displays that others can use to rapidly monitor performance)
  • Presentation preparation (Preparing displays of data that will be presented in meetings or in custom reports)
  • Customized analytical application development (Developing applications that others will use to analyze data in the same specific ways again and again)

Tools that do a good job of supporting exploratory data analysis usually do a poor job of supporting the development of production reports and dashboards, which require fine control over the positioning and sizing of objects. Tools that provide the most flexibility and control often do so by using a programming interface, which cannot support the fluid interaction with data that is required for exploratory data analysis. Every tool specializes in what it can do well, assuming it can do anything well.

In addition to the types of tasks that we perform, we must also consider the level of sophistication to which we peform them. For example, of you engage in exploratory data analysis, the tool that I recommend would vary significantly depending on the depth of your data analysis skills. For instance, I wouldn’t recommend a complex statistical analysis product such as SAS JMP if you’re untrained in statistics, just as I wouldn’t recommend a general purpose tool such as Tableau Software if you’re well trained in statistics, except for performing statistically lightweight tasks.

Apart from the tasks that we perform and the level of skill with which we perform them, we must also consider the size of our wallet. Some products require a significant investment to get started, while others can be purchased for an individual user at little cost or even downloaded for free.

So, what tool do I recommend? It depends. Finding the right tool begins with a clear understanting of what you need to do with data and with your ability to do it.

Take care,

Introducing www.Stephen-Few.com

December 27th, 2017

I’ve ended my public Visual Business Intelligence Workshops and quarterly Visual Business Intelligence Newsletter, in part, to make time for other ventures. You have perhaps noticed that here, in my Perceptual Edge blog articles, I sometimes veer from data visualization to reflect my broader interests. In this blog, I’ve usually tried to my keep topics at least tangentially related to data sensemaking, but I now find this too confining. Going forward, I’d like to release the reins and write about any topic that might benefit from my perspective. Rather than expanding the scope of Perceptual Edge for this purpose, however, I’ve created a new website—www.Stephen-Few.com—as a venue for all of my other interests.

If you’ve found my work useful in the past, you might find the blog on my new website useful as well. I promise, I won’t waste your time with self-indulgent articles. Most of these articles will address the following topics:

  • Ethics, especially ethical approaches to the development and use of information technologies
  • Critical thinking
  • Effective communication
  • Brain science
  • Scientific thinking
  • Skepticism
  • Deep learning

I will feel free, however, to venture beyond these when so inspired.

When I write about data visualization or other aspects of data sensemaking, I’ll continue to post those articles here in my www.PerceptualEdge.com blog as well. Other articles, however, will only be posted in my www.Stephen-Few.com blog.

To launch the new website, I posted my first blog article there today titled Beware Incredible Technology-Enabled Futures. In it, I expose the frightening nonsense of a new TED talk titled “Three Steps to Surviving the Robot Revolution,” by “data philosopher” and “serial entrepreneur” Charles Radclyffe.

Take care,

There’s Nothing Mere About Semantics

December 13th, 2017

Disagreements and confusion are often characterized as mere matters of semantics. There is nothing “mere” about semantics, however. Differences that are based in semantics can be insidious, for we can differ semantically without even realizing it. It is our shared understanding of word meanings that enables us to communicate. Unfortunately, our failure to define our terms clearly lies at the root of countless misunderstandings and a world of confusion.

Language requires definitions. Definitions and how they vary depending on context are central to semantics. We cannot communicate effectively unless those to whom we speak understand how we define our terms. Even in particular fields of study and practice, such as my field of data visualization, practitioners often fail to define even its core terms in ways that are shared. This leads to failed discussions, a great deal of confusion, and harm to the field.

The term “dashboard” has been one of the most confusing in data visualization since it came into common use about 15 years ago. If you’re familiar with my work, you know that I’ve lamented this problem and worked diligently to resolve it. In 2004, I wrote an article titled “Dashboard Confusion” that offered a working definition of the term. Here’s the definition that appeared in that article:

A dashboard is a visual display of the most important information needed to achieve one or more objectives that has been consolidated on a single computer screen so it can be monitored at a glance.

Over the years, I refined my original definition in various ways to create greater clarity and specificity. In my Dashboard Design course, in addition to the definition above, eventually I began to share the following revised definition as well:

A dashboard is a predominantly visual information display that people use to rapidly monitor current conditions that require a timely response to fulfill a specific role.

Primarily, I revised my original definition to emphasize that the information most in need of a dashboard—a rapid-monitoring display—is that which requires a timely response. Knowing what to display on a dashboard, rather than in other forms of information display, such as monthly reports, is one of the fundamental challenges of dashboard design.

Despite my steadfast efforts to promote clear guidelines for dashboard design, confusion persists because of the diverse and conflicting ways in which people define the term, some of which are downright nonsensical.

When Tableau Software first added the ability to combine multiple charts on a single screen in their product, I encouraged them to call it something other than a dashboard, knowing that doing so would contribute to the confusion. The folks at Tableau couldn’t resist, however, because the term “dashboard” was popular and therefore useful for marketing and sales. Unfortunately, if you call any display that combines multiple charts for whatever reason a dashboard, you can say relatively little about effective design practices. This is because designs, to be effective, must vary significantly based on how and for what purpose the information is used. For example, how we should design a display that’s used for rapidly monitoring—what I call a dashboard—is different in many ways from how we should design a display that’s used for exploratory data analysis.

To illustrate the ongoing prevalence of this problem, we don’t need to look any further than the most recent book of significance that’s been written about dashboards: The Big Book of Dashboards, by Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave. The fact that all three authors are avid users and advocates of Tableau Software is reflected in their definition of a dashboard and in the examples of so-called dashboards that appear in the book. These examples share nothing in common other than the fact that they include multiple charts.

When one of the authors told me about his plans for the book as he and his co-authors were just beginning to collect examples, I strongly advised that they define the term dashboard clearly and only include examples that fit that definition. They did include a definition in the book, but what they came up with did not address my concern. They apparently wanted their definition to describe something in particular—monitoring—but the free-ranging scope of their examples prevented them from doing so exclusively. Given this challenge, they wrote the following definition:

A dashboard is a visual display of data used to monitor conditions and/or facilitate understanding.

Do you see the problem? Stating that a dashboard is used for monitoring conditions is specific. So far, so good. Had they completed the sentence with “and facilitate understanding,” the definition would have remained specific, but they didn’t. The problem is their inclusion of the hybrid conjunction: “and/or.” Because of the “and/or,” according to their definition a dashboard is any visual display whatsoever, so long as it supports monitoring or facilitates understanding. In other words, any display that 1) supports monitoring but doesn’t facilitate understanding, 2) facilitates understanding but doesn’t support monitoring, or 3) both supports monitoring and facilitates understanding, is a dashboard. Monitoring displays, analytical displays, simple lookup reports, even infographics, are all dashboards, as long as they either support monitoring or facilitate understanding. As such, the definition is all-inclusive to the point of uselessness.

Only 2 of the 28 examples of displays that appear in the book qualify as rapid-monitoring displays. The other 26 might be useful for facilitating understanding, but by including displays that share nothing in common except that they are all visual and include multiple charts, the authors undermined their own ability to teach anything that is specific to dashboard design. They provided useful bits of advice in the book, but they also added to the confusion that exists about dashboards and dashboard design.

In all disciplines and all aspects of life, as well, we need clarity in communication. As such, we need clearly defined terms. Using terms loosely creates confusion. It’s not just a matter of semantics. Semantics matter.

Take care,

New Book: Big Data, Big Dupe

December 6th, 2017

I’ve written a new book, titled Big Data, Big Dupe, which will be published on February 1, 2018.

As the title suggests, it is an exposé on Big Data—one that is long overdue. To give you an idea of the content, here’s the text that will appear on the book’s back cover:

Big Data, Big Dupe is a little book about a big bunch of nonsense. The story of David and Goliath inspires us to hope that something little, when armed with truth, can topple something big that is a lie. This is the author’s hope. While others have written about the dangers of Big Data, Stephen Few reveals the deceit that belies its illusory nature. If “data is the new oil,” Big Data is the new snake oil. It isn’t real. It’s a marketing campaign that has distracted us for years from the real and important work of deriving value from data.

Here’s the table of contents:

As you can see, unlike my four other books, this is not about data visualization, but it is definitely relevant to all of us who are involved in data sensemaking. If the nonsense of Big Data is making your work difficult and hurting your organization, this is a book that you might want to leave on the desks of your CEO and CIO. It’s short enough that they might actually read it.

Big Data, Big Dupe is now available for pre-order.

Take care,