Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.


The Data Loom Is Now Available!

May 16th, 2019

After a few months of waiting, my new book The Data Loom: Weaving Understanding by Thinking Critically and Scientifically with Data is now available. By clicking on the image below, you can order it for immediate delivery from Amazon.

Data, in and of itself, is not valuable. It only becomes valuable when we make sense of it. Unfortunately, most of us who are responsible for making sense of data have never been trained in two of the job’s most essential thinking skillsets: critical thinking and scientific thinking. The Data Loom does something that no other book does: it covers the basic concepts and practices of both critical thinking and scientific thinking and does so in a way that is tailored to the needs of data sensemakers. If you’ve never been trained in these essential thinking skills, you owe it to yourself and your organization to read this book. This simple book will bring clarity and direction to your thinking.

Turn Up the Signal; Turn Off the Noise

April 21st, 2019

To thoroughly, accurately, and clearly inform, we must identify the intended signal and then boost it while eliminating as much noise as possible. This certainly applies to data visualization, which unfortunately lends itself to a great deal of noise if we’re not careful and skilled. The signal in a stream of content is the intended message, the information we want people to understand. Noise is everything that isn’t signal, with one exception: non-signal content that somehow manages to boost the signal without compromising it in any way is not noise. For example, if we add nonessential elements or attributes to a data visualization to draw the reader’s attention to the message, thus boosting it, without reducing or altering the message in any way, we haven’t introduced noise. No accurate item of data, in and of itself, always qualifies either as a signal or noise. It always depends on the circumstances.

In physics, where the concept originated, the signal-to-noise ratio is an expression of odds: the ratio of one possible outcome to another. When comparing signal to noise, we want the odds to dramatically favor the signal. Which odds qualify as favorable varies, depending on the situation. When communicating information to someone, a signal-to-noise ratio of 99 to 1 would usually be considered favorable. When hoping to get into a particular college, however, 3-to-1 odds might be considered favorable, but those odds would be dreadful in communication, for it would mean that 25% of the content was noise. Another ratio that is common in data communication, a probability ratio, is related to an odds ratio. Rather than comparing one outcome to another as we do with odds, however, a probability ratio compares a particular outcome to the total of all outcomes. For example, a probability ratio of 85 out of 100 (i.e., the outcome of interest will occur 85% of the time on average) is the mathematical equivalent of 85-to-15 odds. When Edward Tufte introduced the concept of the data-ink ratio back in the 1980s, he proposed a probability ratio rather than an odds ratio. He argued that the percentage of ink in a chart that displays data, when compared to the total ink, should be as close to 100% as possible.
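For readers who like to see the arithmetic, the relationship between odds and probability ratios described above can be sketched in a few lines of Python. This is only an illustration; the function names are hypothetical ones that I made up for the purpose, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def odds_to_probability(favorable, unfavorable):
    """Convert favorable-to-unfavorable odds into the share of the favorable outcome."""
    return Fraction(favorable, favorable + unfavorable)


def probability_to_odds(probability):
    """Convert a probability into (favorable, unfavorable) odds in lowest terms."""
    p = Fraction(probability)
    return p.numerator, p.denominator - p.numerator


# A 3-to-1 signal-to-noise ratio means 3 parts signal out of 4 parts total.
signal_share = odds_to_probability(3, 1)
noise_share = 1 - signal_share
print(float(noise_share))  # 0.25, i.e., 25% of the content is noise

# A probability ratio of 85 out of 100 corresponds to 85-to-15 odds,
# which reduce to 17-to-3.
print(probability_to_odds(Fraction(85, 100)))  # (17, 3)
```

The same conversion shows why 99-to-1 odds are so much better for communication: only 1 part in 100 is noise.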

Every choice that we make when creating a data visualization seeks to optimize the signal-to-noise ratio. We could argue that the signal-to-noise ratio is the most essential consideration in data visualization—the fundamental guide for all design decisions while creating a data visualization and the fundamental measure of success once it’s out there in the world.

It’s worth noting that particular content doesn’t qualify as noise simply because it’s inconvenient. Earlier, I said that a signal is the intended message, but let me qualify this further by pointing out that this assumes the message is truthful. In fact, the message itself is noise to the degree that it communicates misinformation, even if that misinformation is intentional. I’ve seen many examples of data visualizations that left out or misrepresented vital information because a clear understanding of the truth wasn’t the designer’s objective. I’ve also witnessed occasions when highly manipulated data replaced the actual data because it told a more convenient story—one that better supported an agenda. For example, a research paper that claims a strong relationship between two variables might refrain from revealing the actual data on which those claims were supposedly based in favor of a statistical model that replaced a great deal of volatility and uncertainty in the relationship, which could be seen in the actual data, with a perfectly smooth and seemingly certain portrayal of that relationship. On occasions when I’ve questioned researchers about this, I’ve been told that the volatility in the actual data was “just noise,” so they removed it. While they might argue that their smooth model illustrates the relationship in a simpler manner, I would argue that it over-simplifies the relationship if they only report the model without also revealing the actual data on which it was based. Seeing the actual data as well helps us keep in mind that statistical models are estimates, built on assumptions, which are never entirely true.

So, to recap, noise in communication, including data visualization, is content that isn’t part of and doesn’t support the intended message or content that isn’t truthful. Turn up the signal; turn off the noise.

Worthy of Your Attention

April 9th, 2019

I spend a great deal of time reading books. Many of them cover topics that are relevant to my work in data sensemaking and data visualization, and most of them are quite good, but only a few are extraordinary. The new book, How Attention Works: Finding Your Way in a World Full of Distraction, by Stefan van der Stigchel, definitely qualifies as extraordinary.

Stigchel is a professor in the Department of Experimental Psychology at Utrecht University in the Netherlands. Until recently, I taught annual data visualization workshops in Utrecht for several years. Had I known about Stigchel at the time, I would have definitely invited him out for a beer during one of my visits. His work is fascinating. This book focuses on a specific aspect of visual perception: visual attention—what it is, how it works, how it is limited, and how it has allowed the human species to progress beyond other species. It does so in a practical manner by explaining how an understanding of visual attention can improve all forms of information design.

I only know of one other author who has written practical works about visual perception with such clarity and insight: Colin Ware, Director of the Data Visualization Research Lab at the University of New Hampshire. It was from Ware’s two books, Visual Thinking for Design and Information Visualization: Perception for Design, that I learned much of what I know about visual perception and its application to data visualization. Although Stigchel doesn’t address data visualization in particular, what he reveals about visual attention complements and, in some respects, extends what Ware covers in his books. Here’s an excerpt from the preface that will give you an idea of the book’s contents and intentions:

If you dig deeper into the subject of visual perception, you will quickly discover that we actually register very little of the visual world around us. We think that we see a detailed and stable world, but this is just an illusion created by the way in which our brains process visual information. This has important consequences for how we present information to others—especially attention architects.

Everyone whose job involves guiding people’s attention, like website designers, teachers, traffic engineers, and, of course, advertising agents, could be given the title of “attention architect.” Such individuals know that simply presenting a visual message is never enough. Attention architects need to be able to guide our attention to get the message across…Whoever can influence our attention has the power to allow information to reach us or, conversely, to ensure that we do not receive that information at all.

Everyone who visualizes data and presents the results to others is an attention architect…or should be. To visualize data effectively, you must learn how to draw people’s attention to those parts of the display that matter and to prevent the inclusion of anything that potentially distracts attention from the message. You can only do this to the degree that you understand how our brains manage visual attention, both consciously and unconsciously. Reading this book is a good start.

Minimally Viable Data Visualization

May 22nd, 2018

I received an email a few days ago from the founder and CEO of a new analytics software company that led to an interesting revelation. In his email, this fellow thanked me for sharing my insights regarding data visualization and shared that he has acquired several of my books, which are “nearing the top” of his queue. He went on to provide a link to his website where I could see his attempts to incorporate visual analytics into his product. After taking a quick look at his website and noting its poor data visualization practices, I wrote him back and suggested that he make time to read my books soon. It was in his subsequent response that he revealed what I found most interesting. In response to my concern about the poor data visualization practices that I observed on his website, he wrote, “The site content has been delivered with a minimally viable product mindset.” My jaw hit the floor.

This fellow apparently misunderstands the concept of a minimum viable product (MVP). According to Wikipedia, “a minimum viable product is a product with just enough features to satisfy early customers, and to provide feedback for future product development.” When you initially introduce a new product, it doesn’t make sense to address every possible feature. Instead, it usually makes sense to provide enough features to make the product useful and put it on a trajectory, through feedback from customers, to become in time a product that is fully viable.

This misunderstanding reminds me of the way that product companies have sometimes misapplied the Pareto Principle (a.k.a., the 80/20 rule). Years ago when I worked for a business intelligence software company, it was common practice for managers in that company to encourage designers and developers to create products that only satisfied 80% of the customers’ needs, which they justified as the 80/20 rule. This has nothing to do with Vilfredo Pareto’s observation that 80% of the property in Italy was owned by 20% of the people in Italy, a ratio that he went on to observe in the relative distribution of several other things as well. Pareto never promoted this ratio as a goal. It’s amazing how concepts and principles can be perverted in silly and harmful ways.

The concern that I expressed to this fellow about his fledgling product was not a lack in the number of features but a lack in the quality of the features that he included. Shooting for minimally viable quality is not a rational, ethical, or productive goal.

My exchange with this fellow continued. I pointed out that “the analytics space is filled with minimally viable products.” This was not a compliment. To this, however, he enthusiastically responded:

Certainly, agreed – which is one reason we believe we can be successful. I’m using MVP in the context of product development; the quicker we deliver functional capabilities the more quickly we receive feedback and iterate through enhancements. In terms of mature client solutions we stand for, and strive to deliver, an exceptional standard of quality – rare in the analytics space.

The notion that quick iterations can make up for sloppy and inexpert development is nonsense, but this philosophy has nevertheless become enshrined in many software companies. Is it any wonder that most analytics products function so poorly?

There is absolutely no justification for producing an analytics application that at any stage during the development process chooses inappropriate data visualizations and designs them poorly. Best practices can be incorporated into each stage of the development process without undue or wasted effort. Not only are ineffective data visualization practices at any stage in the process inexcusable, they do harm, for they expose and thereby promote those bad practices.

This fellow used the “minimally viable product mindset” as a justification for the fact that his team doesn’t understand data visualization. This is all too familiar. To complete the story, here is my final response to this fellow’s mindset:

You are not exhibiting the “exceptional standard of quality” that you claim as your goal. Every single player in the analytics space claims to strive for “exceptional quality,” but none exhibit a genuine commitment to this goal. To seriously strive for this goal, you must develop the required expertise before beginning to develop solutions. Slow down and take time to get it right. The world doesn’t need any more “minimally viable” products.

What are the chances that he will accept and follow my advice? My experience suggests that the odds aren’t good, but I’d be happy for this fellow to become an outlier. We don’t need more bad analytics products. A few that are well designed are all that we need.

So Far, VR-Enabled Data Visualization is Nonsense

May 14th, 2018

Few data technologies are subject to more hype these days than VR-enabled data visualization. I have never seen a single example that adds value and therefore makes sense. Those who promote it don’t base their claims on actual evidence that it works. Instead, they tend to spout a lot of misinformation about visual perception and cognition. Those who have actually taken the time to study visual perception and cognition could take each of these claims apart with ease. VR has the cool factor going for it and vendors are capitalizing on this fact.

VR certainly has its applications. Data visualization just doesn’t seem to be one of them and it’s unlikely that this will change. If it does at some point in the future, I’ll gladly embrace it. Navigating physical reality in a virtual, computer-generated manner can indeed be useful. I recently visited the beautiful medieval town of Cesky Krumlov in the Czech Republic near the Austrian border. I could have relied solely on photographs and descriptions in a guide book, but walking in the midst of that old city, experiencing it directly with my own senses, enhanced the experience. Had I not been able to visit it personally, a VR tour of Cesky Krumlov could have provided a richer experience than photographs and words alone. Data visualizations, however, display abstract data, not physical reality, such as a city. There is no advantage that we have discovered so far, either perceptual or cognitive, to flying around inside a VR version of the kind of abstract data that we display in data visualizations. We can see and make sense of the data more effectively using 2-D or, on rare occasions, 3-D displays projected onto a flat plane (e.g., a screen) without donning a VR headset.

I was prompted to write this blog post by a recent article titled “Data visualization in mixed reality can unlock big data’s potential,” by Amir Bozorgzadeh. This fellow is the cofounder and CEO of Virtuleap and host of the Global WebXR Hackathon, which puts his interest in perspective. The article quotes several software executives who have VR products to sell, and the claims that they make are misleading. They take advantage of the gullibility of people who are already susceptible to the allure of technological hyperbole that goes by such names as VR, Big Data, AI, and self-service analytics. They market their VR-enabled data visualization tools as techno-magical, capable of turning anyone into a skilled data analyst without an ounce of training, except in the use of their VR tools.

Let’s examine a few of the claims made in the article, beginning with the following:

The tech enables not only enterprises and organizations, but anyone, to use their spatial intelligence to spot patterns and make connections that breakthrough the tangled clutter of big data in a way that has been out of reach even with traditional 2D analytics.

“Anyone” can use their “spatial intelligence to spot patterns and make connections.” Wow, this is truly magical and downright absurd. While it is true that spatial perception is built into our brains, it is not true that we can use this ability to make sense of abstract data without having developed an array of data sensemaking skills.

The self-service claims of VR data visualization can get even more outlandish. Consider the following excerpt from the article, which describes WebVR’s “forthcoming seismic-upgrade:”

In fact, their platform wasn’t designed to cater to just highly-trained data scientists, but for anyone with a stake in the game. In the not so distant future, I picture the average Joe or Jane regularly making use of their spatial intelligence to slice and dice big data of any kind, because everyone has the basic skill-sets required to play Sherlock Holmes in mixed reality. All they need to get started is access to big data sets, which I also foresee as being more prevalent not too long from now.

Amazing! I suppose it’s true that everyone can “play” Sherlock Holmes, but playing at it is quite different from sleuthing with skill.

Here’s an example of a VR data visualization that was included in the article:

First of all, you don’t need VR to view data in this manner. At this moment you’re viewing this example on a screen or printed page. You do need VR hardware and software, however, to virtually place yourself in the middle of a 3-D scatter plot and fly around in it, but this wouldn’t make the data more accessible, perceptible, or understandable. Viewing the data laid out in front of us makes it easier to find and make sense of the meaningful patterns that exist within.

The spatial perception that is built into the human brain can indeed be leveraged, using data visualization, to make sense of data. It is not true, however, that it can do so independent of a host of other hard-won skills. Here’s another similar excerpt from the article:

Pattern recognition is an inherent talent that we all possess; the evolutionary edge that sets us apart from the animal kingdom. So, it’s not so much that immersive data visualization unlocks big data but, rather, that it allows us to interact with big data in a way that is natural for us.

This is quite misleading. Other animals also have tremendously good pattern recognition abilities built into their brains, in many cases much better than ours. What sets humans apart with regard to pattern recognition is our ability to reason about patterns in abstract ways, sometimes called patternicity. This is both a blessing and a curse, however, for we can and often do see patterns that are entirely meaningless. We are prolific meaning generators, but separating valid from illusory meanings requires a rich set of data sensemaking skills. No tool, including and perhaps especially VR, will replace the need for these skills.

Here’s another visualization that’s featured in the article:

The caption describes this as “a volatile blockchain market.” What is the claim?

The Bitcoin blockchain in particular pushes the limits of traditional data visualization technology, as its support for transactions involving multiple payers and multiple payees and high transactional volume would create an incomprehensible jumble of overlapping points on any two-dimensional viewer.

Let’s think about this for a moment. If we view a forest from the outside, it appears as a “jumble” of trees. Due to occlusion, we can’t see each of the trees. If we walk into that forest, we can examine individual trees, but we lose sight of the forest. This is a fundamental problem that we often face when trying to visualize a large and complex data set. We typically attempt to resolve this challenge by finding ways to visualize subsets of data while simultaneously viewing how those subsets fit into the larger context of the whole. A traditional data visualization approach to this problem involves the use of concurrent “focus+context” displays to keep from getting lost in the forest while focusing on the trees. Nothing about VR helps us resolve this challenge. In fact, compared to a screen-based display, VR just makes it easier to get lost in the forest.

Here’s the ultimate expression of nonsense that I encountered at the end of Bozorgzadeh’s article:

We have reached a point in time where much of the vast digital landscape of data can be now rendered into visual expressions that, paired up with artificial intelligence, can be readily deciphered and understood by anyone with simply the interest to mine big data. And all this because the underlying tech has become advanced enough to finally align with how we visually process the world.

Notice the abundant sprinkling of buzzwords in this final bit of marketing. When you combine data visualization with VR, AI, and Big Data you have a magic trick as impressive as anything that David Copperfield could pull off on a Las Vegas stage, but one that is just as much an illusion.

I will continue saying what I have said before too many times to count: data sensemaking requires skills that must be learned. No tool will replace the need for these skills. It’s time that we accept the unpopular truth that data sensemaking requires a great deal of training and effort. There are no magic bullets, including VR.