Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.


Is There a Role for Natural Language Processing in Data Visualization?

December 6th, 2016

Several software vendors are integrating natural language processing (NLP) into data visualization tools these days, which should cause us to question the merits of this feature. In most cases, NLP is being used as an input interface—a way to specify what you would like to see—but some vendors are now proposing a reverse application of NLP as an output interface to express in words what already appears in a data visualization. In my opinion, NLP has limited usefulness in the context of data visualization. It is one of those features that vendors love to tout for the cool factor alone.

We express ourselves and interact with the world through multiple modes of communication, primarily through verbal language (i.e., spoken and written words), visual images, and physical gestures (i.e., movements of the body). These modes are not interchangeable. Each exists because different types of information are best expressed using specific modes. Even if you consider yourself a “word” person, you can only communicate some information effectively using images, and vice versa. Similarly, sometimes a subtle lift of the brow can say what we wish in a way that neither words nor pictures could ever equal. We don’t communicate effectively if we stick with the mode that we prefer when a different mode is better suited for the task.

NLP is computer-processed verbal language. Are words an appropriate means to specify what you want to see in data (i.e., input) or to explain what has already been expressed in images (i.e., output)? Let’s consider this.

First, consider the usefulness of NLP as a means of input. Let’s sneak up on this topic by recognizing that words are not always the most effective or efficient means of input. Just because you can get a computer to process words as a means of input doesn’t mean that it’s useful to do so. Would you use words to drive your car? (Please note that I’m not talking about the brief input that you would provide a self-driving car.) The commands that we issue to our cars to tell them where and how fast to go are best handled through a manual interface—one that today involves movements of our hands and feet. We could never equal with words what we can communicate to our cars immediately and precisely with simple movements. This is but one of many examples of situations that are best suited to physical gestures as the means of input. So, are words ever an appropriate means to specify what you’d like to see in data? Rarely, at best. NLP would only be useful as a means of input either in situations when the data visualization tool that you’re using has a horribly designed interface but a well-designed NLP interface (this tool doesn’t exist) or when you need to use a tool but have not yet learned its interface.

The second situation above corresponds to the “self-service” business intelligence or data analysis model that software vendors love to promote but can never actually provide. You cannot effectively make sense of data without first developing a basic set of data analysis skills. If you’ve already developed this basic set of skills, you would never choose NLP as your means of input, for a well-designed interface that you manipulate using manual gestures will almost always be more efficient and precise. Consequently, the only time that NLP is useful as a data visualization input interface is when people with no analytical skills want to view data. For example, a CEO could type or say “Show me sales revenues in U.S. dollars for the last year by product and by month” and the tool could potentially produce a line graph that the CEO could successfully read. Simple input such as this could certainly be handled by NLP. Chances are, however, that the simple requests that this CEO makes of data are already handled by predesigned reports that are readily available. Most likely, what the CEO would like to request using words would be something more complex, which NLP would not handle very well, and even if it could, the CEO might misunderstand the results once they were displayed, owing to a lack of statistical knowledge. It isn’t useful to enable people to request visualizations that they cannot understand.
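To make the input scenario concrete, here is a minimal sketch of what such a natural-language-to-chart pipeline might do. Everything in it is hypothetical (the toy grammar, the field names, and the ChartSpec structure are my own inventions for illustration, not any vendor’s actual design), and real products must cope with vastly more linguistic variation than this:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Time grains the toy grammar recognizes (a hypothetical list).
TIME_GRAINS = ("day", "week", "month", "quarter", "year")

@dataclass
class ChartSpec:
    """A hypothetical chart specification produced from a parsed query."""
    measure: str                # quantitative field, e.g., "sales_revenue"
    time_grain: Optional[str]   # temporal grouping, e.g., "month"
    category: Optional[str]     # categorical grouping, e.g., "product"
    chart_type: str             # chart type chosen from the detected fields

def parse_query(query: str) -> ChartSpec:
    """Map a rigidly phrased request, such as the CEO's example, onto a
    chart spec. Anything outside this toy grammar is simply not handled."""
    q = query.lower()
    measure = "sales_revenue" if "revenue" in q or "sales" in q else "unknown"
    # Each "by <field>" clause names a grouping dimension.
    groupings = re.findall(r"by (\w+)", q)
    time_grain = next((g for g in groupings if g in TIME_GRAINS), None)
    category = next((g for g in groupings if g not in TIME_GRAINS), None)
    # A time dimension suggests a line graph; otherwise fall back to bars.
    chart_type = "line" if time_grain else "bar"
    return ChartSpec(measure, time_grain, category, chart_type)

spec = parse_query("Show me sales revenues in U.S. dollars "
                   "for the last year by product and by month")
print(spec)
# ChartSpec(measure='sales_revenue', time_grain='month',
#           category='product', chart_type='line')
```

Even this toy example hints at the problem: the moment a request strays from the phrasings the grammar anticipates, the pipeline either fails or, worse, silently produces the wrong chart, which is precisely why anything beyond simple requests is better served by a well-designed manual interface.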

Now let’s consider the use of NLP as a means of expressing in words what appears in a data visualization. When done properly, data visualization presents information that cannot be expressed at all, or as well, using words or numbers. For example, we visualize data to reveal patterns or to make rapid comparisons, which could never be done based solely on words or statistics. If the information can only be properly understood when expressed visually, using NLP to decipher the visualization and attempt to put it into words makes no sense. The only situation that I can imagine in which this would provide any value at all would be for people who are visually impaired and therefore unable to see the visualization. The caveat, however, is that words can never provide for someone who is visually impaired what an image would provide if the person could see. So, however cool it might seem when a vendor claims to apply NLP for this purpose, it’s a silly feature without substance.

You might argue, however, that NLP algorithms could be used to supplement a visualization by providing a narrative explanation, much as a presenter might explain the primary message of a chart and point out specific features of interest. Do you really believe that software developers can write computer algorithms that successfully supplement data visualizations in this manner, without human intervention? I suspect that only simple charts could be successfully interpreted using algorithms today.
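A minimal sketch of what such algorithmic narration might look like appears below. It is my own toy illustration, not any vendor’s actual method, and it shows how quickly template-based description reaches its limits:

```python
from statistics import mean

def describe_series(label: str, values: list[float]) -> str:
    """Emit a template-based sentence about a single numeric series.
    This is roughly the ceiling of simple rule-based captioning:
    it can report direction and extremes, but not why they matter."""
    direction = ("rose" if values[-1] > values[0]
                 else "fell" if values[-1] < values[0]
                 else "stayed flat")
    return (f"{label} {direction} from {values[0]:,.0f} to {values[-1]:,.0f}, "
            f"averaging {mean(values):,.0f}, with a peak of {max(values):,.0f} "
            f"and a low of {min(values):,.0f}.")

print(describe_series("Monthly revenue", [120, 135, 150, 110, 180, 210]))
# Monthly revenue rose from 120 to 210, averaging 151,
# with a peak of 210 and a low of 110.
```

Notice what the template cannot do: it cannot distinguish an interesting pattern from a trivial one, relate the series to its context, or explain what caused the dip in the middle of the series. Those are exactly the judgments that a human presenter supplies.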

This is not a topic that I’ve explored extensively, so it is certainly possible that I’m missing potential uses of NLP. If you believe that there is more to this than I’m seeing, please let me know. I will gladly revise my position based on good evidence.

Take care,


When More is Less: Quantitative Numbing

November 23rd, 2016

There is an effect, which I will call “quantitative numbing,” that results in greater quantities of things that concern us producing less rather than more concern. This is an expression of “psychic numbing,” which is well documented. This phenomenon is chillingly described in a quote that is probably misattributed to Joseph Stalin, “A single death is a tragedy; a million deaths is a statistic.” According to Paul Slovic and Daniel Västfjäll, in a chapter titled “The More Who Die, the Less We Care” that appears in the book Numbers and Nerves:

There is considerable evidence that our affective responses and the resulting value we place on saving human lives follow the same sort of psychophysical function that characterizes our diminished sensitivity to changes in a wide range of perceptual and cognitive entities—brightness, loudness, heaviness, and wealth—as their underlying magnitudes increase.

As psychological research indicates, constant increases in the magnitude of a stimulus typically evoke smaller and smaller changes in response. Applying this principle to the valuing of human life suggests that a form of psychophysical numbing may result from our inability to appreciate losses of life as they become larger.

(Numbers and Nerves: Information, Emotion, and Meaning in a World of Data, Scott Slovic and Paul Slovic, editors, 2015, page 31)

This effect exhibits a logarithmic pattern, with larger numbers producing a progressively diminished response. Slovic and Västfjäll go on to point out, however, that feelings of compassion suffer an even greater loss as quantities increase. In fact, compassion does not merely grow more slowly as quantities increase; beyond a point it actually decreases, and does so quite dramatically.
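To give the first part of this a concrete shape, psychophysics commonly models the felt response to a stimulus with a logarithmic (Weber-Fechner style) function. The formula below is a standard illustration of that family, not one taken from Slovic and Västfjäll’s chapter:

R = k ln(S / S₀)

Here, S is the magnitude of the stimulus (in this context, the number of lives at stake), S₀ is the threshold at which we begin to respond at all, k is a scaling constant, and R is the felt response. Because the function is logarithmic, going from 1,000 lives to 2,000 adds no more to R than going from one life to two: each additional life moves us less than the one before. As Slovic and Västfjäll argue, compassion behaves even worse than this formula suggests, eventually declining outright.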

For example, studies have shown that charitable contributions in response to appeals are greatest when intended to help a single individual, are reduced somewhat when helping two, and decrease at a greater and greater rate as the numbers grow. Beyond a certain threshold, we stop giving altogether, even to save lives.

Our moral intuitions often seduce us into calmly turning away from massive losses of human lives, when we should be driven by outrage to act. This is no small weakness in our moral compass. (ibid, page 35)

This is a tragic response that we must learn to overcome, but overcoming it is difficult, for it is built into our brains.

This phenomenon of more producing less is not limited to compassion. It seems to broadly apply to our greatest concerns. Current events involving president-elect Donald Trump have made me particularly sensitive to quantitative numbing. Because he exhibits so many examples of bad behavior, those behaviors are having relatively little impact on us. The sheer number of incidents creates a numbing effect. Any one of Trump’s greedy, racist, sexist, vulgar, discriminatory, anti-intellectual, and dishonest acts, if considered alone, would concern us more than the huge number of examples that now confront us. The larger the number, the lesser the impact, because increases in quantity inoculate us against the effects. This tendency is built into our brains; it is automatic, immediate, and unconscious.

The cause of this psychic numbing effect is not entirely understood, but it is surely related to the fact that our brains developed during eons of living much simpler—although harsh—lives as hunter-gatherers. Our needs were relatively few, as were the types of risks and opportunities that we faced. We didn’t need to manage large numbers of things. Subitization—our preattentive ability to recognize the quantities one, two, and three immediately, without conscious thought—was perhaps built into our brains for this same reason.

To overcome the numbing effect of large quantities, we must engage our slow, rational, and reflective System 2 thinking processes. For example, in relation to Trump’s onslaught of bad behaviors, we should consider each infraction individually, allowing its full weight to affect us. Then, when thinking of his bad behaviors cumulatively, we should consciously remind ourselves that more is more, not less. We must rationally reframe the information. Abel Herzberg reframed the Holocaust to effect this shift when he said, “There were not six million Jews murdered: there was one murder, six million times.”

We can and must fight quantitative numbing. If we don’t, we will remain its victims.

Take care,


Packed Bubbles Finally Make Sense

November 8th, 2016

I couldn’t understand it when Tableau Software introduced its packed bubble chart in version 8. It’s a useless form of display, assuming you care about the data. Today, however, I discovered what Tableau must have had in mind when they added this chart. The following example appears in an article that was published today about Tableau’s annual customer event, which is currently taking place in Austin:

[Image: “Packed Bubble Elvis”]

Clearly, my inability to recognize the value of packed bubbles was a failure of imagination.

Take care,


Bad Science and the Fear of “Methodological Terrorism”

October 25th, 2016

A few days ago a data visualization developer friend of mine, Robert Monfera, sent me a link to a blog post titled “On methodological terrorism” by a thoughtful statistician named Robert Grant. Grant lays out an intelligent and entertaining case for speaking out against methodological flaws in scientific research—a practice that some on the receiving end characterize as “methodological terrorism.” He, I, and a growing number of others are speaking out to expose bad methodological practices in scientific research, not because we enjoy conflict, and certainly not because we’re assholes, but because bad science is always a waste of time and resources and it sometimes causes harm.

Grant wrote his blog post, in part, as a response to an article in a magazine of the Association for Psychological Science written by Susan Fiske, who decried the venomous nature of critiques and coined the term “methodological terrorism,” along with a few other bombastic terms, including “destructo-critics,” “data police,” and “vigilante critique.” Fiske seems to be describing people such as Andrew Gelman, Ben Goldacre, Gerd Gigerenzer, and John Ioannidis, whose work and integrity I greatly admire.

Here are a few of my favorite excerpts from Grant’s blog post:

If we view it as our civic duty to promote good research, it is also our civic duty not to tolerate bad research.

There is a corrupt system which you are obliged to end, and you will have to act outside the system to do so. Not by blowing up their offices…but by confronting their work when it is wrong, in the best scientific tradition, and refusing to go away until it is fixed.

Fiske [the scientist who coined the term “methodological terrorism”] said “it’s careers that are getting broken”; yes, that is precisely the objective. Acting out of ignorance, then seeing the light and fixing the problem is one thing, fighting not to change is another, and someone who refuses to learn and improve is not a scientist…

We should scare them all right, but in a thoroughly scientific way. It needs to be clear that nobody’s blunders are safe from being called out. We need to go after anyone and everyone, not just the big names.

The following excerpt from Grant’s blog post describes the circle-the-wagons resistance that the information visualization research community has exhibited in response to my critiques:

The current system of a small number of the same people approving funding for studies, doing them, and editing the journals where they are published is arguably corrupt. The subject experts who run it benefit so much from it that they certainly don’t allow dissenting voices on their patch, and, unable to control self-publication on blogs and social media, react forcefully. Journals and conferences are used as an organ of repression, and we should focus on influencing them and not allowing them to be a refuge for irresponsible conduct.

Grant points out that researchers who cry foul in response to critiques of their work or the work of their communities tend to characterize those critiques as crossing the line into meanness. It is ironic that they often oppose these critiques through truly mean attempts at character assassination (“kill the messenger”) rather than rational discourse, which demonstrates the weakness of their position. Borrowing an analogy that was used during a keynote presentation last year by the designer Mike Monteiro, Grant likens the work of critics to that of dentists:

Now, consider the dentist. You pay them to tell it like it is. If your molar is rotten and has to come out, you want to hear it and have some straight-talking advice on what to do about it. You don’t enjoy hearing the news, but better now than later in agony. That is the service they provide — to tell you the facts, not to be your friend. We need to stop being friends of subject experts and start being their dentists instead.

Take a few minutes to read Grant’s blog post in full. If you care about science, you’ll find it worthwhile.

Take care,


“Should We?”: The Question That Is Rarely Asked

October 17th, 2016

The unique ability of the human brain to create technologies has taken us far. The benefits of technology, however, are not guaranteed, yet we celebrate and pursue them with abandon. When we imagine new technological abilities, we tend to ask one question only: “Can we?” “Can we create such a thing?” However, we’re good at creating what we can but shouldn’t. “Should we?”, though rarely asked, is the more important question by far.

I recently read a book by Samuel Arbesman, entitled Overcomplicated. I found it intriguing, yet also utterly frightening. Arbesman is Scientist in Residence at Lux Capital, a science and technology venture capital firm. He is a fine spokesperson for his employer’s interests, for he gives the technologies that make venture capitalists rich free license to do what they will by calling it all inevitable.

Many modern technologies are now complicated in ways and to degrees that place them beyond our understanding. Arbesman accepts these over-complications as a given. In light of this, he proposes ways to study them that might yield a bit more understanding, even though, in his opinion, they will forever remain beyond our full grasp. He argues that modern technologies are like biological systems—the result of evolution rather than design—sometimes a mishmash of kluges embedded in millions of lines of programming code and sometimes the results of computers generating their own code with little or no human involvement. At no point in the book does Arbesman ask the question that was constantly screaming in my head as I read it: “Should we?” Should we create technologies that exceed our understanding and can therefore never be fully controlled? The only rational and moral answer to this question is “No, we shouldn’t.”

Arbesman assumes that we often cannot design and develop modern technologies in ways that remain within the reach of human understanding. Even though he acknowledges several examples of technologies that have created havoc because they were not understood, such as financial trading systems and power grids, he accepts these over-complications as inevitable.

As a technology professional of many years, I see things differently. These technological monsters that we create today as the products of kluges are over-complicated not because they cannot be kept within the realm of understanding and our control but because of poor, sloppy, undisciplined, and shortsighted design. Arbesman and others who pull the strings of modern information technologies want us to believe that these technologies are inherently and necessarily beyond human understanding, but this is a lie. Those who create these technologies are simply not willing to do the work that’s required to build them well.

We have a choice. We could demand better design. We could and should set the limits of human understanding as the unyielding boundary of our technologies. We can choose to only build what we can understand. This is harder than quickly and carelessly throwing together kluges or trusting algorithms to manage themselves, but it is a path that we must take to avoid destruction.

Arbesman advocates humility in the face of technologies that we cannot understand, but this is an odd humility, for it’s wrapped in hubris—a belief that we have the right to unleash on the world that which we can neither understand nor control. We may have this ability, but we do not have this right, for it is an almost certain path to destruction. Along with most of the technologists that he admiringly quotes in the book, Arbesman seems to embrace all information technologies that can be created as both inevitable and good—a reverence for Technology with a capital “T” that is both irrational and dangerous.

I’m certainly not the only technology professional who is concerned about this. Many share my perspective and express it, but our concerns are not backed by the deep pockets of technology companies, which currently set the agendas and shape the values of cultures throughout the developed world. The fear that our technologies could do great harm if left uncontrolled has been around for ages. This is a reasonable fear. In his film Jurassic Park, Steven Spielberg poignantly expressed this fear regarding biological technologies. There’s a great scene in the movie when a scientist played by the actor Jeff Goldblum asks the questions that we should always ask about potential technologies before we create and unleash them on the world. The scene accurately frames the problem as one that results from the selfishness of those who care only about their own immediate gains, never raising their eyes to look further into the future and never doubting the essential goodness of their creations, despite the monsters we are capable of creating.

Although this concern about unbridled technological development is occasionally expressed, it has had little effect on modern culture so far. Each of us who cares about the future of humanity and understands that the arc of technological development can be brought into line with the interests of humanity without sacrificing anything of real value should do what we can to voice our concerns. In your own organization, when an opportunity to create, modify, or uniquely apply a technology arises, you can ask, “Should we?” This might not be the path to popularity—those who choose to do good are often unappreciated for a time—but it is the only path that doesn’t lead to destruction. Be courageous, because you should.

Take care,
