Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.

 

Different Tools for Different Tasks

February 19th, 2018

I am often asked a version of the following question: “What data visualization product do you recommend?” My response is always the same: “That depends on what you do with data.” Tools differ significantly in their intentions, strengths, and weaknesses. No one tool does everything well. Truth be told, most tools do relatively little well.

I’m always taken by surprise when the folks who ask me for a recommendation fail to understand that I can’t recommend a tool without first understanding what they do with data. A fellow emailed this week to request a tool recommendation, and when I asked him to describe what he does with data, he responded by describing the general nature of the data that he works with (medical device quality data) and the amount of data that he typically accesses (“around 10k entries…across multiple product lines”). He didn’t actually answer my question, did he? I think this was, in part, because he and many others like him don’t think of what they do with data as consisting of different types of tasks. This is a fundamental oversight.

The nature of your data (marketing, sales, healthcare, education, etc.) has little bearing on the tool that’s needed. Even the quantity of data has relatively little effect on my tool recommendations unless you’re dealing with excessively large data sets. What you do with the data—the tasks that you perform and the purposes for which you perform them—is what matters most.

Your work might involve tasks that are somewhat unique to you, which should be taken into account when selecting a tool, but you also perform general categories of tasks that should be considered. Here are a few of those general categories:

  • Exploratory data analysis (Exploring data in a free-form manner, getting to know it in general, from multiple perspectives, and asking many questions to understand it)
  • Rapid performance monitoring (Maintaining awareness of what’s currently going on as reflected in a specific set of data to fulfill a particular role)
  • A routine set of specific analytical tasks (Analyzing the data in the same specific ways again and again)
  • Production report development (Preparing reports that will be used by others to look up data that’s needed to do their jobs)
  • Dashboard development (Developing displays that others can use to rapidly monitor performance)
  • Presentation preparation (Preparing displays of data that will be presented in meetings or in custom reports)
  • Customized analytical application development (Developing applications that others will use to analyze data in the same specific ways again and again)

Tools that do a good job of supporting exploratory data analysis usually do a poor job of supporting the development of production reports and dashboards, which require fine control over the positioning and sizing of objects. Tools that provide the most flexibility and control often do so by using a programming interface, which cannot support the fluid interaction with data that is required for exploratory data analysis. Every tool specializes in what it can do well, assuming it can do anything well.

In addition to the types of tasks that we perform, we must also consider the level of sophistication at which we perform them. For example, if you engage in exploratory data analysis, the tool that I recommend would vary significantly depending on the depth of your data analysis skills. I wouldn’t recommend a complex statistical analysis product such as SAS JMP if you’re untrained in statistics, just as I wouldn’t recommend a general purpose tool such as Tableau Software if you’re well trained in statistics, except for performing statistically lightweight tasks.

Apart from the tasks that we perform and the level of skill with which we perform them, we must also consider the size of our wallet. Some products require a significant investment to get started, while others can be purchased for an individual user at little cost or even downloaded for free.

So, what tool do I recommend? It depends. Finding the right tool begins with a clear understanding of what you need to do with data and with your ability to do it.

Take care,

Introducing www.Stephen-Few.com

December 27th, 2017

I’ve ended my public Visual Business Intelligence Workshops and quarterly Visual Business Intelligence Newsletter, in part, to make time for other ventures. You have perhaps noticed that here, in my Perceptual Edge blog articles, I sometimes veer from data visualization to reflect my broader interests. In this blog, I’ve usually tried to keep my topics at least tangentially related to data sensemaking, but I now find this too confining. Going forward, I’d like to release the reins and write about any topic that might benefit from my perspective. Rather than expanding the scope of Perceptual Edge for this purpose, however, I’ve created a new website, www.Stephen-Few.com, as a venue for all of my other interests.

If you’ve found my work useful in the past, you might find the blog on my new website useful as well. I promise, I won’t waste your time with self-indulgent articles. Most of these articles will address the following topics:

  • Ethics, especially ethical approaches to the development and use of information technologies
  • Critical thinking
  • Effective communication
  • Brain science
  • Scientific thinking
  • Skepticism
  • Deep learning

I will feel free, however, to venture beyond these when so inspired.

When I write about data visualization or other aspects of data sensemaking, I’ll continue to post those articles here in my www.PerceptualEdge.com blog as well. Other articles, however, will only be posted in my www.Stephen-Few.com blog.

To launch the new website, I posted my first blog article there today titled Beware Incredible Technology-Enabled Futures. In it, I expose the frightening nonsense of a new TED talk titled “Three Steps to Surviving the Robot Revolution,” by “data philosopher” and “serial entrepreneur” Charles Radclyffe.

Take care,

There’s Nothing Mere About Semantics

December 13th, 2017

Disagreements and confusion are often characterized as mere matters of semantics. There is nothing “mere” about semantics, however. Differences that are based in semantics can be insidious, for we can differ semantically without even realizing it. It is our shared understanding of word meanings that enables us to communicate. Unfortunately, our failure to define our terms clearly lies at the root of countless misunderstandings and a world of confusion.

Language requires definitions. Definitions and how they vary depending on context are central to semantics. We cannot communicate effectively unless those to whom we speak understand how we define our terms. Even in particular fields of study and practice, such as my field of data visualization, practitioners often fail to define even its core terms in ways that are shared. This leads to failed discussions, a great deal of confusion, and harm to the field.

The term “dashboard” has been one of the most confusing in data visualization since it came into common use about 15 years ago. If you’re familiar with my work, you know that I’ve lamented this problem and worked diligently to resolve it. In 2004, I wrote an article titled “Dashboard Confusion” that offered a working definition of the term. Here’s the definition that appeared in that article:

A dashboard is a visual display of the most important information needed to achieve one or more objectives that has been consolidated on a single computer screen so it can be monitored at a glance.

Over the years, I refined my original definition in various ways to create greater clarity and specificity. In my Dashboard Design course, in addition to the definition above, eventually I began to share the following revised definition as well:

A dashboard is a predominantly visual information display that people use to rapidly monitor current conditions that require a timely response to fulfill a specific role.

Primarily, I revised my original definition to emphasize that the information most in need of a dashboard—a rapid-monitoring display—is that which requires a timely response. Knowing what to display on a dashboard, rather than in other forms of information display, such as monthly reports, is one of the fundamental challenges of dashboard design.

Despite my steadfast efforts to promote clear guidelines for dashboard design, confusion persists because of the diverse and conflicting ways in which people define the term, some of which are downright nonsensical.

When Tableau Software first added the ability to combine multiple charts on a single screen in their product, I encouraged them to call it something other than a dashboard, knowing that doing so would contribute to the confusion. The folks at Tableau couldn’t resist, however, because the term “dashboard” was popular and therefore useful for marketing and sales. Unfortunately, if you call any display that combines multiple charts for whatever reason a dashboard, you can say relatively little about effective design practices. This is because designs, to be effective, must vary significantly based on how and for what purpose the information is used. For example, how we should design a display that’s used for rapidly monitoring—what I call a dashboard—is different in many ways from how we should design a display that’s used for exploratory data analysis.

To illustrate the ongoing prevalence of this problem, we don’t need to look any further than the most recent book of significance that’s been written about dashboards: The Big Book of Dashboards, by Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave. The fact that all three authors are avid users and advocates of Tableau Software is reflected in their definition of a dashboard and in the examples of so-called dashboards that appear in the book. These examples share nothing in common other than the fact that they include multiple charts.

When one of the authors told me about his plans for the book as he and his co-authors were just beginning to collect examples, I strongly advised that they define the term dashboard clearly and only include examples that fit that definition. They did include a definition in the book, but what they came up with did not address my concern. They apparently wanted their definition to describe something in particular—monitoring—but the free-ranging scope of their examples prevented them from doing so exclusively. Given this challenge, they wrote the following definition:

A dashboard is a visual display of data used to monitor conditions and/or facilitate understanding.

Do you see the problem? Stating that a dashboard is used for monitoring conditions is specific. So far, so good. Had they completed the sentence with “and facilitate understanding,” the definition would have remained specific, but they didn’t. The problem is their inclusion of the hybrid conjunction: “and/or.” Because of the “and/or,” according to their definition a dashboard is any visual display whatsoever, so long as it supports monitoring or facilitates understanding. In other words, any display that 1) supports monitoring but doesn’t facilitate understanding, 2) facilitates understanding but doesn’t support monitoring, or 3) both supports monitoring and facilitates understanding, is a dashboard. Monitoring displays, analytical displays, simple lookup reports, even infographics, are all dashboards, as long as they either support monitoring or facilitate understanding. As such, the definition is all-inclusive to the point of uselessness.
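For readers who prefer to see the logic spelled out, here is a minimal sketch of the three cases above, written in Python. The function names and the two yes/no properties are my own illustrative assumptions, not anything that appears in the book. The sketch simply treats “and/or” as an inclusive “or” and compares it with a plain “and.”

```python
from itertools import product


def qualifies_with_and_or(supports_monitoring: bool, facilitates_understanding: bool) -> bool:
    """Hypothetical reading of "monitor conditions and/or facilitate understanding".

    "and/or" is treated as an inclusive "or": either property alone is enough.
    """
    return supports_monitoring or facilitates_understanding


def qualifies_with_and(supports_monitoring: bool, facilitates_understanding: bool) -> bool:
    """Hypothetical reading of "monitor conditions and facilitate understanding".

    Both properties are required.
    """
    return supports_monitoring and facilitates_understanding


# Enumerate every combination of the two properties.
rows = [
    (m, u, qualifies_with_and_or(m, u), qualifies_with_and(m, u))
    for m, u in product([True, False], repeat=2)
]

print(f"{'monitoring':<12}{'understanding':<15}{'and/or':<8}{'and':<5}")
for m, u, with_and_or, with_and in rows:
    print(f"{str(m):<12}{str(u):<15}{str(with_and_or):<8}{str(with_and):<5}")
```

Under the “and/or” reading, three of the four combinations qualify as a dashboard, and only a display that does neither is excluded. Under the “and” reading, only one combination qualifies.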

Only 2 of the 28 examples of displays that appear in the book qualify as rapid-monitoring displays. The other 26 might be useful for facilitating understanding, but by including displays that share nothing in common except that they are all visual and include multiple charts, the authors undermined their own ability to teach anything that is specific to dashboard design. They provided useful bits of advice in the book, but they also added to the confusion that exists about dashboards and dashboard design.

In all disciplines and all aspects of life, as well, we need clarity in communication. As such, we need clearly defined terms. Using terms loosely creates confusion. It’s not just a matter of semantics. Semantics matter.

Take care,

New Book: Big Data, Big Dupe

December 6th, 2017

I’ve written a new book, titled Big Data, Big Dupe, which will be published on February 1, 2018.

As the title suggests, it is an exposé on Big Data—one that is long overdue. To give you an idea of the content, here’s the text that will appear on the book’s back cover:

Big Data, Big Dupe is a little book about a big bunch of nonsense. The story of David and Goliath inspires us to hope that something little, when armed with truth, can topple something big that is a lie. This is the author’s hope. While others have written about the dangers of Big Data, Stephen Few reveals the deceit that belies its illusory nature. If “data is the new oil,” Big Data is the new snake oil. It isn’t real. It’s a marketing campaign that has distracted us for years from the real and important work of deriving value from data.

Here’s the table of contents:

As you can see, unlike my four other books, this is not about data visualization, but it is definitely relevant to all of us who are involved in data sensemaking. If the nonsense of Big Data is making your work difficult and hurting your organization, this is a book that you might want to leave on the desks of your CEO and CIO. It’s short enough that they might actually read it.

Big Data, Big Dupe is now available for pre-order.

Take care,

Researchers — Share Your Data!

November 13th, 2017

One of the most popular shows in the early years of television, hosted by Art Linkletter, included a segment called “Kids say the darndest things.” Linkletter would have conversations with young children who could be counted on to say things that adults found entertaining. In recent years, I’ve experienced my own version of this, which could be described as “Researchers say the darndest things.” My conversations with the authors of data visualization research studies have often featured shocking statements that would be amusing if they weren’t so potentially harmful.

The most recent example occurred in email correspondence with the lead author of a study titled “Evaluating the Impact of Binning 2D Scalar Fields.” I’m currently working on a newsletter article about binned versus continuous color scales in data visualization, so this paper interested me. After reading the paper, however, I had a few questions, so I contacted the author. One of my requests was, “I would like to see the full data set that you collected during the experiment.” Here’s the response that I received from the paper’s author: “In psychology, we do not share data sets but the full analyses are available in the supplementary materials.” You can imagine my shock and dismay. Researchers say the darndest things!

Withholding the data that was collected in a research study—the data on which the published findings and claims were based—subverts the essential nature and goals of science. Published research studies should be accompanied by the data sets on which their findings were based—always. The data should be made readily available to anyone who is interested, just as “supplemental materials” are often made available.

Only good can result from sharing our research data. If we share our data, our results can be confirmed. If we share our data, errors in our work can be identified and corrected. If we share our data, science can progress.

Empirical research is based on data. We make observations, usually in the form of measurements, which serve as the data sets on which our findings are based. Only by reviewing our data can the validity of empirical research be confirmed or denied by the research community. Only by sharing our data can questions about our findings be pursued by those who are interested. Refusing to share our data is the antithesis of science.

The author’s claim that, “In psychology, we do not share our data” is false. Psychology researchers do not have a “Do not share your data” policy. I’m astounded that the author thought that I’d buy this absurd claim. What is true, however, is that, even though there is no policy that research data should not be shared, it usually isn’t. On many occasions this is not an overt act of omission, but a mere act of laziness. The data files that researchers use are often messy and they don’t want the bother of structuring and labeling those files in a manner that would make them useful if shared. On more than one occasion I have requested data files only to be told that it would take too much time to put them into a form that could be shared. This response always makes me wonder if the messiness of those files might have caused the researchers themselves to make errors during their analysis of the data. When I told a respected psychology researcher friend of mine about the “In psychology, we don’t share our data” response that I received from the study’s author, he told me, “In my experience, extreme protectiveness about data tends to correlate with work that is not stellar in quality.” I suspect that this is true.

If you can’t make your research data available, either on some public medium (e.g., accessible as a download from a web page) or upon request, you’d better have a really good excuse. You could try the old standby “My dog ate it,” but it probably won’t work any better than it did when you were in elementary school. If your excuse is, “After doing my analysis and writing my paper, I somehow misplaced the data,” the powers that be (e.g., your university or the publication that made your study public) should respond by saying, “Do it over.”

If I could set the standards for research, I would require that the data be examined during the peer review process. It isn’t necessary that every reviewer examine the data, but at least one who is qualified to detect errors should. Among other potential problems, calculations performed on the data should be checked and it should be determined if statistics have been properly used. Checking the data should be fundamental to the peer review process. If this were done, some of the poor research that wastes our time each year with shoddy work and false claims would remain unpublished. I realize that this would complicate the process. Well, guess what, good research takes time and effort. Doing it well is hard work.

If you want to keep your data private, then do the world a favor and keep your research private as well. It isn’t valid research unless your findings are subject to review, and your findings cannot be fully reviewed without the data.

Take care,