
The Incompatible Marriage of Data Visualization and VR

August 19th, 2019

Every once in a while, someone claims that data visualization can be enhanced when viewed in virtual reality (e.g., by wearing a VR headset). Not once, however, has anyone demonstrated any real benefits. If you think about it for a moment, this isn’t surprising. How could viewing a chart in VR possibly work any better than viewing it on a flat screen? The chart would be the same and VR doesn’t alter visual perception; it merely gives us the ability to navigate through a virtual world. Whether viewing the real world (including a flat screen) or a virtual world, our eyes work the same. VR is useful for some applications, but apparently not for data visualization.

VR gives us a different way of changing the perspective from which we view a chart only if the chart is three dimensional. If the chart is two dimensional, we would view it straight on, whether on a flat screen in the real world or in a virtual world, for viewing it from any other perspective (e.g., from the side or from behind) would always be less useful than viewing it straight on. If the chart is three dimensional, however, VR adds a navigational option: in addition to moving the chart around to view it from different perspectives as we would on a flat screen (e.g., by rotating it), we could also virtually move ourselves around the chart, such as by walking around it to view it from behind. Does this offer an advantage? It does not. What we see once we shift perspectives is the same, no matter how we get there.

VR does offer one other navigational possibility: while wearing a VR headset, we could virtually position ourselves within the chart, among the data. Imagine yourself in the midst of a 3-D scatter plot, with data points all around you. Would this offer a better view? Quite the opposite, actually. It would leave us metaphorically lost in the forest, among the trees. How much of the data could we see from within it? Very little at any one time. To see it all, we would need to turn back and forth, taking in only bits of it at a time. Much of a chart’s power derives from seeing all of the data at once. What might seem cool about VR navigation on the surface would be dysfunctional in actual practice when viewing data visualizations.

Before moving on, I should mention that, in all but rare cases, 3-D charts that encode a variable along the Z axis are useless. The Z axis simulates depth perception, but unlike 2-D positions along the X axis (horizontal) and the Y axis (vertical), which human visual perception handles exceptionally well, our perception of depth is relatively poor. It is for this reason that vision scientists sometimes describe human visual perception as 2.5 dimensional rather than 3 dimensional. 3-D charts suffer from many problems, which is why the best data visualization products avoid them altogether. Viewing 3-D charts in VR solves none of these problems.
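
You can test the weakness of Z-position encoding for yourself. Here’s a minimal matplotlib sketch, using fabricated data of my own (not an example from any product discussed here), that plots the same kind of random values in two dimensions and in three. Try estimating any point’s Z value in the 3-D panel; the simulated depth is far harder to judge than the X and Y positions.

```python
# A minimal sketch with fabricated data: 2-D positions are easy to read,
# while values encoded along a simulated Z axis are not.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x, y, z = rng.random(30), rng.random(30), rng.random(30)

fig = plt.figure(figsize=(10, 4))

# Left panel: an ordinary 2-D scatter plot (X and Y positions only).
ax2d = fig.add_subplot(1, 2, 1)
ax2d.scatter(x, y)
ax2d.set_title("2-D positions: easy to judge")

# Right panel: the same X and Y, with a third variable on the Z axis.
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
ax3d.scatter(x, y, z)
ax3d.set_title("Z position: simulated depth, poorly perceived")

plt.tight_layout()
plt.show()
```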

I was prompted to write about this by a recent press release for a new product named Immersion Analytics from the company Virtual Cove. The company claims to provide several patent-pending features that enhance data visualization through VR. When I read the press release, being the suspicious guy that I am, I suspected yet another false claim about the benefits of VR, but I was more than willing to take a look. What I found, as expected, was pure nonsense. I’ve examined every example of an Immersion Analytics data visualization that I could find and observed nothing that would work any better when viewed in VR rather than on a flat screen. During a promotional presentation that’s available on YouTube, the company’s founder, Bob Levy, who “invented” the product, listed three visual attributes, as examples, that we can supposedly view more effectively in VR: Z position, glow, and translucency. I’ve already explained how Z position is just as useless in VR as it is on a flat screen, but what about the other two attributes? By glow, Levy is referring to a halo effect around an object (e.g., around a bubble in a 3-D bubble plot) that varies in brightness. You can see this effect in the example below.

Notice how poorly this effect works as a means of encoding quantitative values. Can you determine any of the values? I certainly can’t. Nor can we compare halo intensities to a useful degree. How could this possibly work any better in VR? VR doesn’t enhance visual perception. Our eyes work the same whether we view a chart on a flat screen or in VR. The remaining attribute, translucency, is no different. What Levy means by translucency (a.k.a. transparency) is the ability to see through an object, like looking through glass. Varying degrees of translucency are also illustrated in the example above: the bubbles are translucent to varying degrees. Can we decode the values represented by their translucency? We cannot. Can we compare the varying degrees to which the bubbles are translucent? Not well enough for data sensemaking. During the presentation, Levy claimed that translucency would work much better if we viewed this chart while wearing a VR headset rather than on a flat screen. That is a false claim. Our perception of translucency would not be changed by VR, and it certainly wouldn’t be enhanced.
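
You can also test the translucency claim yourself on a flat screen. The following minimal matplotlib sketch uses fabricated data of my own (glow is harder to reproduce simply, so only translucency is shown) to encode each bubble’s value as its opacity. Try reading the values back; a VR headset would render these same opacities.

```python
# A minimal sketch with fabricated data: a value between 0 and 1 is
# encoded as each bubble's opacity (alpha). Decoding these values by
# eye is effectively impossible, on a flat screen or otherwise.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.random(20), rng.random(20)
values = rng.random(20)  # the variable supposedly encoded by translucency

fig, ax = plt.subplots()
for xi, yi, vi in zip(x, y, values):
    # Map each value to an alpha level; fully transparent bubbles vanish,
    # so a small floor keeps every bubble faintly visible.
    ax.scatter(xi, yi, s=600, color="steelblue", alpha=0.1 + 0.9 * vi)
ax.set_title("Values encoded as translucency: try to read them back")
plt.show()
```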

Based on the examples that I reviewed, my suspicions about the product’s claims seemed justified. When I contacted Andrew Shepherd, Virtual Cove’s VP of Strategic Growth, who functions as their media contact, to ask several questions about the product, I openly admitted my skepticism about their claims. In response, he wrote, “I know it could be a long shot, but it would be a thrill to convert you from a skeptic into a true believer.” Definitely a long shot, but I nevertheless offered to examine the product in VR if they would loan me the necessary equipment. Perhaps you won’t be surprised to hear that they don’t have any VR headsets available to loan to skeptics. Faced with no way to evaluate their VR claims directly, I asked Shepherd a simple question: “What can be seen in VR that cannot be seen just as well on a flat screen of equal resolution?” I’m still waiting for an answer. I’ve found that Shepherd quickly responds to questions that he likes but persistently ignores those that are inconvenient. I remain quite willing to be surprised by VR-enhanced capabilities that contradict everything I know about visual perception, but I’m not holding my breath.

If a vendor tries to sell you VR data visualization software, I suggest that you either ignore them altogether or do what I’ve done: ask them to justify their claims with clear explanations and actual evidence, and then be prepared to wait a very long time.

Ethical Data Sensemaking

July 22nd, 2019

Simply stated, data sensemaking is what we do to make sense of data. We do this in an attempt to understand the world, based on empirical evidence. Those who work to make sense of data and communicate their findings are data sensemakers. Data sensemaking, as a profession, is currently associated with several job titles, including data analyst, business intelligence professional, statistician, and data scientist. Helping people understand the world based on data is important work. Without understanding, we often make bad decisions. When done well, data sensemaking requires a broad and deep set of skills and a commitment to ethical conduct. When data sensemaking professionals fail to do their jobs well, whether through a lack of skills or other forms of ethical misconduct, confusion and misinformation result, which encourages bad decisions that do harm. Making sense of data is not ethically or morally neutral; it can be done for good or ill. “I did what I was told” is not a valid excuse for unethical behavior.

In recent years, misuses of data have led to a great deal of discussion about ethics related to invasions of privacy and discriminatory uses of data. Most of these discussions focus on the creation and use of analytical algorithms. I’d like to extend the list of ethical considerations to address the full range of data sensemaking activities. The list of ethical practices that I’m proposing below is neither complete nor sufficiently organized nor fully described. I offer it only as an initial effort that we can discuss, expand, and clarify. Once we’ve done that, we can circle back and refine the work.

The ethical practices that can serve as a code of conduct for data sensemaking professionals are, in my opinion, built upon a single fundamental principle, the same one that medical doctors traditionally swear to uphold: do no harm.

Here’s the list:

  1. You should work, not just to provide information, but to enable understanding that can be used in beneficial ways.
  2. You should develop the full range of skills that are needed to do the work of data sensemaking effectively. Training in a data analysis tool is not sufficient. This suggests the need for an agreed-upon set of skills for data sensemaking.
  3. You should understand the relevant domain. For instance, if you’re doing sales analysis, you should understand the sales process as well as the sales objectives of your organization. When you don’t understand the domain well enough, you must involve those who do.
  4. You should know your audience (i.e., your clients; those who are asking you to do the work)—their interests, beliefs, values, assumptions, biases, and objectives—in part to identify potentially unethical inclinations.
  5. You should understand the purpose for which your work will be used. In other words, you should ask “Why?”
  6. You should strive to anticipate the ways in which your findings could be used for harm.
  7. When asked to do something harmful, you should say “No.” Furthermore, you should discourage others from doing harm.
  8. When you discover harmful uses of data, you should challenge them, and if they persist, you should expose them to those who can potentially end them.
  9. You should primarily serve the needs of those who will be affected by your work, which is not necessarily those who have asked you to do the work.
  10. You should not examine data that you or your client have no right to examine, including private data that you have not received explicit permission to examine. To honor this, you must acquaint yourself with data privacy laws, but you should not limit your concern to data that has been legally deemed private; if it seems reasonable that data should be treated as private, treat it as private even when the law does not require it.
  11. You should not do work that will result in the unfair and discriminatory treatment of particular groups of people based on race, ethnicity, gender, religion, age, etc.
  12. If you cannot enable the understanding that’s needed with the data that’s available, you should point this out, identify what’s needed, and do what you can to acquire it.
  13. If the quality of the data that’s available is insufficient for the data sensemaking task, you should point this out, describe what’s lacking, and insist that the data’s quality be improved to the level that’s required before proceeding.
  14. You should always examine data within context.
  15. You should always examine data from all potentially relevant perspectives.
  16. You should present your findings clearly.
  17. You should present your findings as comprehensively as necessary to enable the level of understanding that’s needed.
  18. You should present your findings truthfully.
  19. You should describe the uncertainty of your findings.
  20. You should report any limitations that might have had an effect on the validity of your findings.
  21. You should confirm that your audience understands your findings.
  22. You should solicit feedback during the data sensemaking process and invite others to critique your findings.
  23. You should document the steps that you took, including the statistics that you used, and maintain the data that you produced during the course of your work. This will make it possible for others to review your work and for you to reexamine your findings at a later date.
  24. When you’re asked to do work that doesn’t make sense or to do it in a way that doesn’t make sense (i.e., in ways that are ineffective), you should propose an alternative that does make sense and insist on it.
  25. When people telegraph what they expect you to find in the data, you should do your best to ignore those expectations or to subject them to scrutiny.

As data sensemakers, we stand at the gates of understanding. Ethically, it is our job to serve as gatekeepers. In many cases, we will be the only defense against harm.

I invite you to propose additions to this list and to discuss the merits of the practices that I’ve proposed. If you are part of an organization that employs other data sensemakers, I also invite you to discuss the ethical dimensions of your work with one another.

The Inflated Role of Storytelling

July 14th, 2019

People increasingly claim that the best, and perhaps the only, way to convince someone of something is to tell them a story. In his new book Ruined By Design, a book that I largely agree with and fully appreciate, designer Mike Monteiro says that “If you’re not persuading people, you’re not telling a good enough story.” Furthermore, “…while you should absolutely include the data in your approach, recognize that when you get to the point where you’re trying to persuade someone…, you need a story.” Really? Where’s the evidence for this claim? On what empirical research is it based? And what the hell is a story, anyway? Can you only persuade people by constructing a narrative: a presentation with a beginning, middle, and end, with characters and plot, tension and resolution? In truth, stories are only one of several ways that we can persuade. In some cases, a simple photograph might do the trick. A gesture, such as a look of anger or a raised fist, sometimes works. A single sentence or a chart might do the job. Even a straightforward, unembellished presentation of the facts will sometimes work. The notion that stories are needed to convince people is itself a story, a myth, nothing more.

It reminds me of the silly notion that people only use 10% of their brains, which someone fabricated long ago out of thin air and others have since repeated without ever checking the facts. This notion is absurd: if we used only 10% of our brains, the other 90% would wither and die. Stories are not the exclusive path to persuasion. Not everyone can be convinced in the same way, and most people can be convinced in various ways, depending on the circumstances. While potentially powerful and useful, the role of stories is overblown.

A common error among those who promote the power of stories is the notion that stories work because they appeal to emotions. For example, Monteiro wrote that “…people don’t make decisions based on data; they make them based on feelings.” This is the foundation of his rationale that stories are the only path to persuasion. Stories can certainly appeal to emotions, but they can also present facts without any emotional content whatsoever. All of us, no matter how rational, are subject to emotion, but not exclusively so. Stories structure information in narrative form, and those narratives can appeal to emotions, to the rational mind, or to both. In other words, saying that stories are powerful is not the same as saying that appeals to people’s feelings are powerful.

Don’t get me wrong: stories are great; they’re just not the panacea that many people now claim. The current emphasis on storytelling is a fad, and in time it will fade. Some of the people who promote stories to the exclusion of other forms of communication will one day look back with embarrassment. No matter what they claim, no one actually believes that only stories can convince people. No one exclusively uses stories to persuade. We all use multiple means, and that’s as it should be. The sooner we get over the nonsense that only stories can persuade, the sooner we can get on with the real task of presenting truths that matter in all the ways that work.

The Data Loom Is Now Available!

May 16th, 2019

After a few months of waiting, my new book The Data Loom: Weaving Understanding by Thinking Critically and Scientifically with Data is now available and can be ordered for immediate delivery from Amazon.

Data, in and of itself, is not valuable. It only becomes valuable when we make sense of it. Unfortunately, most of us who are responsible for making sense of data have never been trained in two of the job’s most essential thinking skillsets: critical thinking and scientific thinking. The Data Loom does something that no other book does: it covers the basic concepts and practices of both critical thinking and scientific thinking, and it does so in a way that is tailored to the needs of data sensemakers. If you’ve never been trained in these essential thinking skills, you owe it to yourself and your organization to read this book. This simple book will bring clarity and direction to your thinking.

Turn Up the Signal; Turn Off the Noise

April 21st, 2019

To thoroughly, accurately, and clearly inform, we must identify the intended signal and then boost it while eliminating as much noise as possible. This certainly applies to data visualization, which unfortunately lends itself to a great deal of noise if we’re not careful and skilled. The signal in a stream of content is the intended message, the information we want people to understand. Noise is everything that isn’t signal, with one exception: non-signal content that manages to boost the signal without compromising it in any way is not noise. For example, if we add nonessential elements or attributes to a data visualization to draw the reader’s attention to the message, thus boosting it without reducing or altering the message in any way, we haven’t introduced noise. No accurate item of data, in and of itself, always qualifies as either signal or noise; it always depends on the circumstances.

In physics, where the concept originated, the signal-to-noise ratio is an expression of odds: the ratio of one possible outcome to another. When comparing signal to noise, we want the odds to dramatically favor the signal. Which odds qualify as favorable varies, depending on the situation. When communicating information to someone, a signal-to-noise ratio of 99 to 1 would usually be considered favorable. When hoping to get into a particular college, however, 3-to-1 odds might be considered favorable, but those odds would be dreadful in communication, for they would mean that 25% of the content was noise. Another ratio that is common in data communication, a probability ratio, is related to an odds ratio. Rather than comparing one outcome to another as we do with odds, a probability ratio compares a particular outcome to the total of all outcomes. For example, a probability ratio of 85 out of 100 (i.e., the outcome of interest will occur 85% of the time on average) is the mathematical equivalent of 85-to-15 odds. When Edward Tufte introduced the concept of the data-ink ratio back in the 1980s, he proposed a probability ratio rather than an odds ratio: he argued that the percentage of a chart’s ink that displays data, compared to the chart’s total ink, should be as close to 100% as possible.
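
To make the arithmetic concrete, here’s a small, purely illustrative Python sketch of the conversion between the two kinds of ratios:

```python
# Converting between an odds ratio (signal to noise) and the
# equivalent probability ratio (signal out of total content).

def odds_to_probability(signal, noise):
    """3-to-1 odds -> 0.75, i.e., 75% signal and 25% noise."""
    return signal / (signal + noise)

def probability_to_odds(p):
    """0.85 -> (85.0, 15.0), i.e., 85-to-15 odds."""
    return 100 * p, 100 * (1 - p)

print(odds_to_probability(99, 1))  # 0.99 -- favorable for communication
print(odds_to_probability(3, 1))   # 0.75 -- 25% of the content is noise
print(probability_to_odds(0.85))   # (85.0, 15.0)
```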

Every choice that we make when creating a data visualization should seek to optimize the signal-to-noise ratio. We could argue that the signal-to-noise ratio is the most essential consideration in data visualization: the fundamental guide for all design decisions while creating the visualization and the fundamental measure of its success once it’s out there in the world.

It’s worth noting that particular content doesn’t qualify as noise simply because it’s inconvenient. Earlier, I said that a signal is the intended message, but let me qualify this further by pointing out that this assumes the message is truthful. In fact, the message itself is noise to the degree that it communicates misinformation, even when that misinformation is intentional. I’ve seen many examples of data visualizations that left out or misrepresented vital information because a clear understanding of the truth wasn’t the designer’s objective. I’ve also witnessed occasions when highly manipulated data replaced the actual data because it told a more convenient story, one that better supported an agenda. For example, a research paper that claims a strong relationship between two variables might refrain from revealing the actual data on which those claims were supposedly based. Instead, it presents a statistical model that replaces the volatility and uncertainty that could be seen in the actual data with a perfectly smooth and seemingly certain portrayal of the relationship. On occasions when I’ve questioned researchers about this, I’ve been told that the volatility in the actual data was “just noise,” so they removed it. While they might argue that their smooth model illustrates the relationship in a simpler manner, I would argue that it over-simplifies the relationship if they report only the model without also revealing the actual data on which it was based. Seeing the actual data as well reminds us that statistical models are estimates, built on assumptions, and never entirely true.
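
As a minimal illustration of this difference, the following matplotlib sketch uses fabricated data of my own (not from any actual paper) to fit a simple linear model to noisy data. Reporting only the fitted line would portray a perfectly smooth, seemingly certain relationship; plotting the actual points alongside it reveals the volatility that the line conceals.

```python
# A minimal sketch with fabricated data: a smooth fitted model shown
# alone hides the volatility that plotting the actual data reveals.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = 2.0 * x + rng.normal(scale=6.0, size=x.size)  # noisy "actual" data

slope, intercept = np.polyfit(x, y, 1)  # a simple linear model of the relationship

fig, ax = plt.subplots()
ax.plot(x, slope * x + intercept, color="black", label="fitted model")
ax.scatter(x, y, s=12, color="gray", label="actual data")
ax.legend()
ax.set_title("Report the model and the data, not the model alone")
plt.show()
```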

So, to recap: noise in communication, including data visualization, is content that isn’t part of the intended message and doesn’t support it, or content that isn’t truthful. Turn up the signal; turn off the noise.