Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
January 5th, 2017
Last June I celebrated my 62nd birthday. As I look back on my life, my early years seem like distant memories of a different age, yet the years also seem to have flown by in an instant. Our lives are brief when superimposed on history, but they can be rich if we find a way to contribute to history. I feel that my work in the field of data visualization has provided that opportunity, and I’m incredibly grateful.
I have worked as an information technologist for 33 years. Similar to many other thoughtful IT professionals, I have a love-hate relationship with technologies. My feelings about them range from ecstasy to depression and disgust. I love technologies that are useful and work well, but I dislike all else, which includes most of the IT products on the market.
We humans are distinguished from other species in part by our creation and use of tools (a.k.a., technologies). Our relationship with these technologies has changed considerably since the hunter-gatherer days, especially since the advent of the computer. The human condition is increasingly affected for both good and ill by our technologies. We need to evaluate them with increasing awareness and moral judgment. We need to invite them into our lives and the lives of our children with greater care.
In the early days of my IT career, I spent a decade working in the world of finance. I was employed by one of the financial institutions that later contributed to the meltdown of 2007 and 2008. In fact, if I’m not mistaken, my employer invented the infamous reverse-interest mortgage loan. I was a manager in the loan service department at a time when a large group of employees had the job of explaining to customers why their loan balances were increasing. Fortunately, I never had to answer those questions myself, which I would have found intolerable.
During those years, I remember learning about the famous 80/20 rule (a.k.a., the Pareto Principle), but what I learned at the time was a perversion of the principle that says a lot about the culture in which I worked. I was told that the 80/20 rule meant that we should only work to satisfy 80% of the need, for the remaining 20% wasn’t worth the effort. When we built IT systems, we attempted to address only 80% of what was needed with tools that worked only 80% of the time. Excellence was never the goal; we sought “good enough.” But good enough for what? For most technology companies, the answer is “good enough to maximize revenues for the next few quarters.” A product that is only 80% good or less can be camouflaged for a while by deceitful marketing. By the time customers discover the truth, it will be too late: their investment will have already been made and those who made it will never admit their error, lest they be held responsible.
Traditional theories of economics assume rational behavior. A relatively recent newcomer, Behavioral Economics, has shown, however, that human economic behavior is often far from rational. The same can be said of the human production of and use of technologies. When our progenitors became tool users and eventually tool creators, for eons those tools always arose from real need and they rarely caught on unless they worked. This is no longer true, especially of information technologies. Much that we do with computers today did not emerge in response to real needs, is often misapplied in ways that produce little or no benefit, and far too often works poorly, if at all. This suggests that a new scientific discipline may be needed to study these technologies to improve their usefulness and to diminish their waste and harmful effects. I propose that we call this new field of study Itology (i.e., IT-ology, pronounced eye-tology). Its focus would be on the responsible creation and use of information technologies. Whether the name “Itology” is adopted doesn’t matter, but making this area of study integral to IT certainly does.
January 2nd, 2017
For data sensemakers and others who are concerned with the integrity of data sensemaking and its outcomes, the most important book published in 2016 was Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. This book is much more than a clever title. It is a clarion call of imminent necessity.
Data can be used in harmful ways. This fact has become magnified to an extreme in the so-called realm of Big Data, fueled by an indiscriminate trust in information technologies, a reliance on fallacious correlations, and an effort to gain efficiencies no matter the cost in human suffering. In Weapons of Math Destruction, O’Neil reveals the dangers of data-sensemaking algorithms that employ statistics to score people and institutions for various purposes in ways that are unsound, unfair, and yes, destructive. Her argument is cogent and articulate, the product of deep expertise in data sensemaking directed by a clear sense of morality. Possessing a Ph.D. in mathematics from Harvard and having worked for many years herself as a data scientist developing algorithms, she is well qualified to understand the potential dangers of algorithms.
O’Neil defines WMDs as algorithms that exhibit three characteristics:
- They are opaque (i.e., inscrutable black boxes). What they do and how they do it remains invisible to us.
- They are destructive. They are designed to work against the subjects’ best interests in favor of the interests of those who use them.
- They scale. They grow exponentially. They scale not only in the sense of affecting many lives but also by affecting many aspects of people’s lives. For example, an algorithm that rejects you as a potential employee can start a series of dominoes in your life tumbling toward disaster.
O’Neil identifies several striking examples of WMDs in various realms, including evaluating teachers, identifying potential criminals, screening job applicants and college admissions candidates, targeting people for expensive payday loans, and pricing loans and insurance variably to take advantage of those who are most vulnerable.
During the Occupy Wall Street movement, following the financial meltdown that was caused in part by WMDs, O’Neil became increasingly concerned that the so-called Big Data movement could lead to harm. She writes:
More and more, I worried about the separation between technical models and real people, and about the moral repercussions of that separation. In fact, I saw the same pattern emerging that I’d witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.
I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, raking in outrageous fortunes and convincing themselves all the while that they deserved it.
WMDs are misuses of computers and data. She writes:
WMDs…tend to favor efficiency. By their very nature they feed on data that can be measured and counted. But fairness is squishy and hard to quantify. It is a concept. And computers, for all of their advances in language and logic, still struggle mightily with concepts…And the concept of fairness utterly escapes them…So fairness isn’t calculated into WMDs. And the result is massive, industrial production of unfairness.
WMDs are sometimes the result of good intentions, and they are passionately defended by their creators, but that doesn’t excuse them.
Injustice, whether based on greed or prejudice, has been with us forever. And you could argue that WMDs are no worse than the human nastiness of the recent past. In many cases, after all, a loan officer or hiring manager would routinely exclude entire races, not to mention an entire gender, from being considered for a mortgage or a job offer. Even the worst mathematical models, many would argue, aren’t nearly that bad.
But human decision making, while often flawed, has one chief virtue. It can evolve. As human beings learn and adapt, we change, and so do our processes. Automated systems, by contrast, stay stuck in time until engineers dive in to change them…Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide.
This book is more than an exposé. O’Neil goes on to suggest what we can do to prevent WMDs. It is incredibly important that we take these steps and do so starting now. Many of the industrial revolution’s abuses were eventually curtailed through a heightened moral awareness and thoughtful regulation. Now is the time to clean up the abuses of algorithms much as we once cleaned up the abuses of slavery and sweatshops. I heartily recommend this book.
December 6th, 2016
Several software vendors are integrating natural language processing (NLP) into data visualization tools these days, which should cause us to question the merits of this feature. In most cases, NLP is being used as an input interface—a way to specify what you would like to see—but some vendors are now proposing a reverse application of NLP as an output interface to express in words what already appears in a data visualization. In my opinion, NLP has limited usefulness in the context of data visualization. It is one of those features that vendors love to tout for the cool factor alone.
We express ourselves and interact with the world through multiple modes of communication, primarily through verbal language (i.e., spoken and written words), visual images, and physical gestures (i.e., movements of the body). These modes are not interchangeable. Each exists because different types of information are best expressed using specific modes. Even if you consider yourself a “word” person, you can only communicate some information effectively using images, and vice versa. Similarly, sometimes a subtle lift of the brow can say what we wish in a way that neither words nor pictures could ever equal. We don’t communicate effectively if we stick with the mode that we prefer when a different mode is better suited for the task.
NLP is computer-processed verbal language. Are words an appropriate means to specify what you want to see in data (i.e., input) or to explain what has already been expressed in images (i.e., output)? Let’s consider this.
First, we’ll begin with the usefulness of NLP as a means of input. Let’s sneak up on this topic by first recognizing that words are not always the most effective or efficient means of input. Just because you can get a computer to process words as a means of input doesn’t mean that it’s useful to do so. Would you use words to drive your car? (Please note that I’m not talking about the brief input that you would provide a self-driving car.) The commands that we issue to our cars to tell them where and how fast to go are best handled through a manual interface—one that today involves movements of our hands and feet. We could never equal with words what we can communicate to our cars immediately and precisely with simple movements. This is but one of many examples of situations that are best suited to physical gestures as the means of input. So, are words ever an appropriate means to specify what you’d like to see in data? Rarely, at best. NLP would only be useful as a means of input either in situations when the data visualization tool that you’re using has a horribly designed interface but a well-designed NLP interface (this tool doesn’t exist) or when you need to use a tool but have not yet learned its interface.
The second situation above corresponds to the “self-service” business intelligence or data analysis model that software vendors love to promote but can never actually provide. You cannot effectively make sense of data without first developing a basic set of data analysis skills. If you’ve already developed this basic set of skills, you would never choose NLP as your means of input, for a well-designed interface that you manipulate using manual gestures will almost always be more efficient and precise. Consequently, the only time that NLP is useful as a data visualization input interface is when people with no analytical skills want to view data. For example, a CEO could type or say “Show me sales revenues in U.S. dollars for the last year by product and by month” and the tool could potentially produce a line graph that the CEO could successfully read. Simple input such as this could certainly be handled by NLP. Chances are, however, that the simple requests that this CEO makes of data are already handled by predesigned reports that are readily available. Most likely, what the CEO would like to request using words would be something more complex, which NLP would not handle very well, and even if it could, the CEO might misunderstand once the results are displayed due to a lack of statistical knowledge. It isn’t useful to enable people to request visualizations that they cannot understand.
Now let’s consider the use of NLP as a means of expressing in words what appears in a data visualization. When properly done, we visualize data to present information that cannot be expressed at all or as well using words or numbers. For example, we visualize data to reveal patterns or to make rapid comparisons, which could never be done based solely on words or statistics. If the information can only be properly understood when expressed visually, using NLP to decipher the visualization and attempt to put it into words makes no sense. The only possible situation that I can imagine when this would provide any value at all would be for people who are visually impaired, rendering them unable to see the visualization. The caveat in this case, however, is the fact that words would never provide for someone who is visually impaired what an image could provide if the person could see. So, however cool it might seem when a vendor claims to apply NLP for this purpose, it’s a silly feature without substance.
You might argue, however, that NLP algorithms could be used to supplement a visualization by providing a narrative explanation, much as a presenter might explain the primary message of a chart and point out specific features of interest. Do you really believe that software developers can write computer algorithms that successfully supplement data visualizations in this manner, without human intervention? I suspect that only simple charts could be successfully interpreted using algorithms today.
This is not a topic that I’ve explored extensively, so it is certainly possible that I’m missing potential uses of NLP. If you believe that there is more to this than I’m seeing, please let me know. I will gladly revise my position based on good evidence.
November 23rd, 2016
There is an effect, which I will call “quantitative numbing,” in which greater quantities of the things that concern us produce less concern rather than more. This is an expression of “psychic numbing,” which is well documented. This phenomenon is chillingly described in a quote that is probably misattributed to Joseph Stalin, “A single death is a tragedy; a million deaths is a statistic.” According to Paul Slovic and Daniel Västfjäll, in a chapter titled “The More Who Die, the Less We Care” that appears in the book Numbers and Nerves:
There is considerable evidence that our affective responses and the resulting value we place on saving human lives follow the same sort of psychophysical function that characterizes our diminished sensitivity to changes in a wide range of perceptual and cognitive entities—brightness, loudness, heaviness, and wealth—as their underlying magnitudes increase.
As psychological research indicates, constant increases in the magnitude of a stimulus typically evoke smaller and smaller changes in response. Applying this principle to the valuing of human life suggests that a form of psychophysical numbing may result from our inability to appreciate losses of life as they become larger.
(Numbers and Nerves: Information, Emotion, and Meaning in a World of Data, Scott Slovic and Paul Slovic, editors, 2015, page 31)
This effect exhibits a logarithmic pattern: each further increase in quantity evokes a smaller increase in response. Slovic and Västfjäll go on to point out, however, that feelings of compassion suffer an even greater loss as quantities increase. Compassion does not merely grow more slowly as quantities increase; it actually decreases and does so quite dramatically.
For example, studies have shown that charitable contributions in response to appeals are greatest when intended to help a single individual, are reduced somewhat when helping two, and decrease at a greater and greater rate as the numbers grow. Beyond a certain threshold, we stop giving altogether, even to save lives.
Our moral intuitions often seduce us into calmly turning away from massive losses of human lives, when we should be driven by outrage to act. This is no small weakness in our moral compass. (ibid, page 35)
This is a tragic response that we must learn to overcome, but doing so is difficult, for it is built into our brains.
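The logarithmic pattern mentioned above can be made concrete with a small numeric sketch. The function below is only an illustration in the spirit of a Weber-Fechner-style curve; the specific form `k * log(1 + n)`, the constant `k`, and the function names are my assumptions, not the model that Slovic and Västfjäll fit to data. It also captures only the diminishing sensitivity, not the outright collapse of compassion that they describe.

```python
import math


def felt_response(lives: int, k: float = 1.0) -> float:
    """Hypothetical felt response to a loss of `lives` lives.

    Assumed form: k * log(1 + lives), so the response keeps rising
    but by smaller and smaller amounts as the number grows.
    """
    return k * math.log(1 + lives)


def marginal_response(lives: int) -> float:
    """Extra response evoked by one more life at a given magnitude."""
    return felt_response(lives + 1) - felt_response(lives)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 1_000_000):
        print(
            f"{n:>9,} lives: total = {felt_response(n):6.2f}, "
            f"one more adds {marginal_response(n):.6f}"
        )
```

Under these assumptions, the first life lost moves the curve far more than the millionth does, which is the numbing that the quoted passages describe.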
This phenomenon of more producing less is not limited to compassion. It seems to broadly apply to our greatest concerns. Current events involving president-elect Donald Trump have made me particularly sensitive to quantitative numbing. Because he exhibits so many examples of bad behavior, those behaviors are having relatively little impact on us. The sheer number of incidents creates a numbing effect. Any one of Trump’s greedy, racist, sexist, vulgar, discriminatory, anti-intellectual, and dishonest acts, if considered alone, would concern us more than the huge number of examples that now confront us. The larger the number, the lesser the impact, because increases in quantity inoculate us against the effects. This tendency is built into our brains; it is automatic, immediate, and unconscious.
The cause of this psychic numbing effect is not entirely understood, but it is surely related to the fact that our brains developed during eons of living much simpler—although harsh—lives as hunter-gatherers. Our needs were relatively few, as were the types of risks and opportunities that we faced. We didn’t need to manage large numbers of things. Subitization—our preattentive ability to recognize the quantities one, two, and three immediately, without conscious thought—was perhaps built into our brains for this same reason.
To overcome the numbing effect of large quantities, we must engage our slow, rational, and reflective System 2 thinking processes. For example, in relation to Trump’s onslaught of bad behaviors, we should consider each infraction individually, allowing its full weight to affect us. Then, when thinking of his bad behaviors cumulatively, we should consciously remind ourselves that more is more, not less. We must rationally reframe the information. Abel Herzberg reframed the Holocaust to effect this shift when he said, “There were not six million Jews murdered: there was one murder, six million times.”
We can and must fight quantitative numbing. If we don’t, we will remain its victims.
November 8th, 2016
I couldn’t understand it when Tableau Software introduced its packed bubble chart in version 8. It’s a useless form of display, assuming you care about the data. Today, however, I discovered what Tableau must have had in mind when they added this chart. The following example appears in an article that was published today about Tableau’s annual customer event, which is currently taking place in Austin:
Clearly, my inability to recognize the value of packed bubbles was a failure of imagination.