Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
January 2nd, 2017
For data sensemakers and others who are concerned with the integrity of data sensemaking and its outcomes, the most important book published in 2016 was Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. This book is much more than a clever title. It is a clarion call of imminent necessity.
Data can be used in harmful ways. This fact has become magnified to an extreme in the so-called realm of Big Data, fueled by an indiscriminate trust in information technologies, a reliance on fallacious correlations, and an effort to gain efficiencies no matter the cost in human suffering. In Weapons of Math Destruction, O’Neil reveals the dangers of data-sensemaking algorithms that employ statistics to score people and institutions for various purposes in ways that are unsound, unfair, and yes, destructive. Her argument is cogent and articulate, the product of deep expertise in data sensemaking directed by a clear sense of morality. Possessing a Ph.D. in mathematics from Harvard and having worked for many years herself as a data scientist developing algorithms, she is well qualified to understand the potential dangers of algorithms.
O’Neil defines WMDs as algorithms that exhibit three characteristics:
- They are opaque (i.e., inscrutable black boxes). What they do and how they do it remains invisible to us.
- They are destructive. They are designed to work against the subjects’ best interests in favor of the interests of those who use them.
- They scale. They grow exponentially. They scale not only in the sense of affecting many lives but also by affecting many aspects of people’s lives. For example, an algorithm that rejects you as a potential employee can start a series of dominoes in your life tumbling toward disaster.
O’Neil identifies several striking examples of WMDs in various realms, including evaluating teachers, identifying potential criminals, screening job applicants and college admissions candidates, targeting people for expensive payday loans, and pricing loans and insurance variably to take advantage of those who are most vulnerable.
During the Occupy Wall Street movement, following the financial meltdown that was caused in part by WMDs, O’Neil became increasingly concerned that the so-called Big Data movement could lead to harm. She writes:
More and more, I worried about the separation between technical models and real people, and about the moral repercussions of that separation. In fact, I saw the same pattern emerging that I’d witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.
I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, raking in outrageous fortunes and convincing themselves all the while that they deserved it.
WMDs are misuses of computers and data. She writes:
WMDs…tend to favor efficiency. By their very nature they feed on data that can be measured and counted. But fairness is squishy and hard to quantify. It is a concept. And computers, for all of their advances in language and logic, still struggle mightily with concepts…And the concept of fairness utterly escapes them…So fairness isn’t calculated into WMDs. And the result is massive, industrial production of unfairness.
WMDs are sometimes the result of good intentions, and they are passionately defended by their creators, but that doesn’t excuse them. As O’Neil writes:
Injustice, whether based on greed or prejudice, has been with us forever. And you could argue that WMDs are no worse than the human nastiness of the recent past. In many cases, after all, a loan officer or hiring manager would routinely exclude entire races, not to mention an entire gender, from being considered for a mortgage or a job offer. Even the worst mathematical models, many would argue, aren’t nearly that bad.
But human decision making, while often flawed, has one chief virtue. It can evolve. As human beings learn and adapt, we change, and so do our processes. Automated systems, by contrast, stay stuck in time until engineers dive in to change them…Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide.
This book is more than an exposé. O’Neil goes on to suggest what we can do to prevent WMDs. It is incredibly important that we take these steps and do so starting now. Many of the industrial revolution’s abuses were eventually curtailed through a heightened moral awareness and thoughtful regulation. Now is the time to clean up the abuses of algorithms much as we once cleaned up the abuses of slavery and sweatshops. I heartily recommend this book.
December 6th, 2016
Several software vendors are integrating natural language processing (NLP) into data visualization tools these days, which should cause us to question the merits of this feature. In most cases, NLP is being used as an input interface—a way to specify what you would like to see—but some vendors are now proposing a reverse application of NLP as an output interface to express in words what already appears in a data visualization. In my opinion, NLP has limited usefulness in the context of data visualization. It is one of those features that vendors love to tout for the cool factor alone.
We express ourselves and interact with the world through multiple modes of communication, primarily through verbal language (i.e., spoken and written words), visual images, and physical gestures (i.e., movements of the body). These modes are not interchangeable. Each exists because different types of information are best expressed using specific modes. Even if you consider yourself a “word” person, you can only communicate some information effectively using images, and vice versa. Similarly, sometimes a subtle lift of the brow can say what we wish in a way that neither words nor pictures could ever equal. We don’t communicate effectively if we stick with the mode that we prefer when a different mode is better suited for the task.
NLP is computer-processed verbal language. Are words an appropriate means to specify what you want to see in data (i.e., input) or to explain what has already been expressed in images (i.e., output)? Let’s consider this.
Let’s begin with the usefulness of NLP as a means of input. We can sneak up on this topic by first recognizing that words are not always the most effective or efficient means of input. Just because you can get a computer to process words as input doesn’t mean that it’s useful to do so. Would you use words to drive your car? (Please note that I’m not talking about the brief input that you would provide a self-driving car.) The commands that we issue to our cars to tell them where and how fast to go are best handled through a manual interface, one that today involves movements of our hands and feet. We could never equal with words what we can communicate to our cars immediately and precisely with simple movements. This is but one of many situations that are best suited to physical gestures as the means of input. So, are words ever an appropriate means to specify what you’d like to see in data? Rarely, at best. NLP would be useful as a means of input in only two situations: when the data visualization tool that you’re using has a horribly designed interface but a well-designed NLP interface (no such tool exists), or when you need to use a tool but have not yet learned its interface.
The second situation above corresponds to the “self-service” business intelligence or data analysis model that software vendors love to promote but can never actually provide. You cannot effectively make sense of data without first developing a basic set of data analysis skills. If you’ve already developed these skills, you would never choose NLP as your means of input, for a well-designed interface that you manipulate using manual gestures will almost always be more efficient and precise. Consequently, the only time that NLP is useful as a data visualization input interface is when people with no analytical skills want to view data. For example, a CEO could type or say “Show me sales revenues in U.S. dollars for the last year by product and by month,” and the tool could potentially produce a line graph that the CEO could successfully read. NLP could certainly handle simple input such as this. Chances are, however, that the simple requests that this CEO makes of data are already handled by predesigned reports that are readily available. Most likely, what the CEO would like to request using words would be something more complex, which NLP would not handle very well, and even if it could, the CEO might misunderstand the results once they are displayed due to a lack of statistical knowledge. It isn’t useful to enable people to request visualizations that they cannot understand.
Now let’s consider the use of NLP as a means of expressing in words what appears in a data visualization. When properly done, we visualize data to present information that cannot be expressed at all, or as well, using words or numbers. For example, we visualize data to reveal patterns or to make rapid comparisons, which could never be done based solely on words or statistics. If the information can only be properly understood when expressed visually, using NLP to decipher the visualization and attempt to put it into words makes no sense. The only situation that I can imagine in which this would provide any value at all would be for people who are visually impaired and therefore unable to see the visualization. The caveat in this case, however, is that words could never provide someone who is visually impaired with what an image could provide if that person could see. So, however cool it might seem when a vendor claims to apply NLP for this purpose, it’s a silly feature without substance.
You might argue, however, that NLP algorithms could be used to supplement a visualization by providing a narrative explanation, much as a presenter might explain the primary message of a chart and point out specific features of interest. Do you really believe that software developers can write computer algorithms that successfully supplement data visualizations in this manner, without human intervention? I suspect that only simple charts could be successfully interpreted using algorithms today.
This is not a topic that I’ve explored extensively, so it is certainly possible that I’m missing potential uses of NLP. If you believe that there is more to this than I’m seeing, please let me know. I will gladly revise my position based on good evidence.
November 23rd, 2016
There is an effect, which I will call “quantitative numbing,” in which greater quantities of the things that concern us produce less concern rather than more. This is an expression of “psychic numbing,” which is well documented. This phenomenon is chillingly described in a quote that is probably misattributed to Joseph Stalin: “A single death is a tragedy; a million deaths is a statistic.” According to Paul Slovic and Daniel Västfjäll, in a chapter titled “The More Who Die, the Less We Care” that appears in the book Numbers and Nerves:
There is considerable evidence that our affective responses and the resulting value we place on saving human lives follow the same sort of psychophysical function that characterizes our diminished sensitivity to changes in a wide range of perceptual and cognitive entities—brightness, loudness, heaviness, and wealth—as their underlying magnitudes increase.
As psychological research indicates, constant increases in the magnitude of a stimulus typically evoke smaller and smaller changes in response. Applying this principle to the valuing of human life suggests that a form of psychophysical numbing may result from our inability to appreciate losses of life as they become larger.
(Numbers and Nerves: Information, Emotion, and Meaning in a World of Data, Scott Slovic and Paul Slovic, editors, 2015, page 31)
This effect exhibits a logarithmic pattern, with larger numbers resulting in a diminishing response. Slovic and Västfjäll go on to point out, however, that feelings of compassion suffer an even greater loss as quantities increase. In fact, compassion does not merely grow more slowly as quantities increase; it actually decreases, and does so quite dramatically.
For example, studies have shown that charitable contributions in response to appeals are greatest when intended to help a single individual, are reduced somewhat when helping two, and decrease at a greater and greater rate as the numbers grow. Beyond a certain threshold, we stop giving altogether, even to save lives.
Our moral intuitions often seduce us into calmly turning away from massive losses of human lives, when we should be driven by outrage to act. This is no small weakness in our moral compass. (ibid., page 35)
This is a tragic response that we must learn to overcome, but doing so is difficult, for it is built into our brains.
This phenomenon of more producing less is not limited to compassion. It seems to apply broadly to our greatest concerns. Current events involving President-elect Donald Trump have made me particularly sensitive to quantitative numbing. Because he exhibits so many examples of bad behavior, those behaviors are having relatively little impact on us. The sheer number of incidents creates a numbing effect. Any one of Trump’s greedy, racist, sexist, vulgar, discriminatory, anti-intellectual, and dishonest acts, if considered alone, would concern us more than the huge number of examples that now confront us. The larger the number, the smaller the impact, because increases in quantity inoculate us against the effects. This tendency is built into our brains; it is automatic, immediate, and unconscious.
The cause of this psychic numbing effect is not entirely understood, but it is surely related to the fact that our brains developed during eons of living much simpler—although harsh—lives as hunter-gatherers. Our needs were relatively few, as were the types of risks and opportunities that we faced. We didn’t need to manage large numbers of things. Subitization—our preattentive ability to recognize the quantities one, two, and three immediately, without conscious thought—was perhaps built into our brains for this same reason.
To overcome the numbing effect of large quantities, we must engage our slow, rational, and reflective System 2 thinking processes. For example, in relation to Trump’s onslaught of bad behaviors, we should consider each infraction individually, allowing its full weight to affect us. Then, when thinking of his bad behaviors cumulatively, we should consciously remind ourselves that more is more, not less. We must rationally reframe the information. Abel Herzberg reframed the Holocaust to effect this shift when he said, “There were not six million Jews murdered: there was one murder, six million times.”
We can and must fight quantitative numbing. If we don’t, we will remain its victims.
November 8th, 2016
I couldn’t understand it when Tableau Software introduced its packed bubble chart in version 8. It’s a useless form of display, assuming you care about the data. Today, however, I discovered what Tableau must have had in mind when they added this chart. The following example appears in an article that was published today about Tableau’s annual customer event, which is currently taking place in Austin:
Clearly, my inability to recognize the value of packed bubbles was a failure of imagination.
October 25th, 2016
A few days ago a data visualization developer friend of mine, Robert Monfera, sent me a link to a blog post titled “On methodological terrorism” by a thoughtful statistician named Robert Grant. Grant lays out an intelligent and entertaining case for speaking out against methodological flaws in scientific research—a practice that some on the receiving end characterize as “methodological terrorism.” He, I, and a growing number of others are speaking out to expose bad methodological practices in scientific research, not because we enjoy conflict, and certainly not because we’re assholes, but because bad science is always a waste of time and resources and it sometimes causes harm.
Grant wrote his blog post, in part, as a response to an article in a magazine of the Association for Psychological Science written by Susan Fiske, who decried the venomous nature of critiques and coined the term “methodological terrorism,” along with a few other bombastic terms, including “destructo-critics,” “data police,” and “vigilante critique.” Fiske seems to be describing people such as Andrew Gelman, Ben Goldacre, Gerd Gigerenzer, and John Ioannidis, whose work and integrity I greatly admire.
Here are a few of my favorite excerpts from Grant’s blog post:
If we view it as our civic duty to promote good research, it is also our civic duty not to tolerate bad research.
There is a corrupt system which you are obliged to end, and you will have to act outside the system to do so. Not by blowing up their offices…but by confronting their work when it is wrong, in the best scientific tradition, and refusing to go away until it is fixed.
Fiske [the scientist who coined the term “methodological terrorism”] said “it’s careers that are getting broken”; yes, that is precisely the objective. Acting out of ignorance, then seeing the light and fixing the problem is one thing, fighting not to change is another, and someone who refuses to learn and improve is not a scientist…
We should scare them all right, but in a thoroughly scientific way. It needs to be clear that nobody’s blunders are safe from being called out. We need to go after anyone and everyone, not just the big names.
The following excerpt from Grant’s blog post describes the circle-the-wagons resistance that the information visualization research community has exhibited in response to my critiques:
The current system of a small number of the same people approving funding for studies, doing them, and editing the journals where they are published is arguably corrupt. The subject experts who run it benefit so much from it that they certainly don’t allow dissenting voices on their patch, and, unable to control self-publication on blogs and social media, react forcefully. Journals and conferences are used as an organ of repression, and we should focus on influencing them and not allowing them to be a refuge for irresponsible conduct.
Grant points out that researchers who cry foul in response to critiques of their work or the work of their communities tend to characterize those critiques as crossing the line into meanness. It is ironic that they often oppose these critiques through truly mean attempts at character assassination (“kill the messenger”) rather than rational discourse, which demonstrates the weakness of their position. Borrowing an analogy that was used during a keynote presentation last year by the designer Mike Monteiro, Grant likens the work of critics to that of dentists:
Now, consider the dentist. You pay them to tell it like it is. If your molar is rotten and has to come out, you want to hear it and have some straight-talking advice on what to do about it. You don’t enjoy hearing the news, but better now than later in agony. That is the service they provide — to tell you the facts, not to be your friend. We need to stop being friends of subject experts and start being their dentists instead.
Take a few minutes to read Grant’s blog post in full. If you care about science, you’ll find it worthwhile.