Several software vendors are now integrating natural language processing (NLP) into data visualization tools, which should cause us to question the merits of this feature. In most cases, NLP serves as an input interface—a way to specify what you would like to see—but some vendors now propose the reverse: NLP as an output interface that expresses in words what already appears in a data visualization. In my opinion, NLP has limited usefulness in the context of data visualization. It is one of those features that vendors love to tout for the cool factor alone.
We express ourselves and interact with the world through multiple modes of communication, primarily through verbal language (i.e., spoken and written words), visual images, and physical gestures (i.e., movements of the body). These modes are not interchangeable. Each exists because different types of information are best expressed using specific modes. Even if you consider yourself a “word” person, you can only communicate some information effectively using images, and vice versa. Similarly, sometimes a subtle lift of the brow can say what we wish in a way that neither words nor pictures could ever equal. We don’t communicate effectively if we stick with the mode that we prefer when a different mode is better suited for the task.
NLP is computer-processed verbal language. Are words an appropriate means to specify what you want to see in data (i.e., input) or to explain what has already been expressed in images (i.e., output)? Let’s consider this.
Let’s begin with the usefulness of NLP as a means of input. We can sneak up on this topic by first recognizing that words are not always the most effective or efficient means of input. Just because you can get a computer to process words as a means of input doesn’t mean that it’s useful to do so. Would you use words to drive your car? (Please note that I’m not talking about the brief input that you would provide a self-driving car.) The commands that we issue to our cars to tell them where and how fast to go are best handled through a manual interface—one that today involves movements of our hands and feet. We could never equal with words what we can communicate to our cars immediately and precisely with simple movements. This is but one of many examples of situations that are best suited to physical gestures as the means of input. So, are words ever an appropriate means to specify what you’d like to see in data? Rarely, at best. NLP would only be useful as a means of input in one of two situations: when the data visualization tool that you’re using has a horribly designed interface but a well-designed NLP interface (no such tool exists), or when you need to use a tool but have not yet learned its interface.
The second situation above corresponds to the “self-service” business intelligence or data analysis model that software vendors love to promote but can never actually provide. You cannot effectively make sense of data without first developing a basic set of data analysis skills. If you’ve already developed this basic set of skills, you would never choose NLP as your means of input, for a well-designed interface that you manipulate using manual gestures will almost always be more efficient and precise. Consequently, the only time that NLP is useful as a data visualization input interface is when people with no analytical skills want to view data. For example, a CEO could type or say “Show me sales revenues in U.S. dollars for the last year by product and by month” and the tool could potentially produce a line graph that the CEO could successfully read. Simple input such as this could certainly be handled by NLP. Chances are, however, that the simple requests that this CEO makes of data are already handled by predesigned reports that are readily available. Most likely, what the CEO would like to request using words would be something more complex, which NLP would not handle very well, and even if it could, the CEO, lacking statistical knowledge, might misunderstand the results once they are displayed. It isn’t useful to enable people to request visualizations that they cannot understand.
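To make the limitation concrete, here is a toy, purely illustrative sketch of how a request like the CEO’s might be turned into a chart specification. Everything in it—the pattern, the function name, the spec fields—is my own hypothetical construction, not any vendor’s implementation. The rigid pattern is the point: the narrow class of requests that parses cleanly is exactly the class that a predesigned report already covers, while a genuinely analytical question falls outside the grammar.

```python
import re

# Hypothetical sketch: one rigid pattern mapping a narrow class of
# natural-language requests to a chart specification.
REQUEST = re.compile(
    r"show me (?P<measure>.+?) for the last (?P<period>\w+) by (?P<rest>.+)"
)

def parse_request(text):
    """Return a chart-spec dict for a recognized request, else None."""
    m = REQUEST.fullmatch(text.strip().rstrip(".").lower())
    if m is None:
        return None  # the request falls outside the toy grammar
    group_by = [
        g.strip()
        for g in re.split(r"\s+and\s+by\s+|\s+and\s+|\s*,\s*", m.group("rest"))
        if g.strip()
    ]
    return {
        "measure": m.group("measure"),
        "period": m.group("period"),
        "group_by": group_by,
        # A series over months suggests a line graph; otherwise a bar graph.
        "chart": "line" if "month" in group_by else "bar",
    }
```

The CEO’s request parses into a line-graph spec, but a question such as “Why did margins dip in Q3 relative to forecast?” returns None—the kind of complex, open-ended request that NLP would not handle well.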
Now let’s consider the use of NLP as a means of expressing in words what appears in a data visualization. When properly done, we visualize data to present information that cannot be expressed at all, or as well, using words or numbers. For example, we visualize data to reveal patterns or to make rapid comparisons, which could never be done based solely on words or statistics. If the information can only be properly understood when expressed visually, using NLP to decipher the visualization and attempt to put it into words makes no sense. The only situation I can imagine in which this would provide any value is for people who are visually impaired and therefore unable to see the visualization. Even in that case, however, words could never provide for someone who is visually impaired what the image could provide if the person could see it. So, however cool it might seem when a vendor claims to apply NLP for this purpose, it’s a silly feature without substance.
You might argue, however, that NLP algorithms could be used to supplement a visualization by providing a narrative explanation, much as a presenter might explain the primary message of a chart and point out specific features of interest. Do you really believe that software developers can write computer algorithms that successfully supplement data visualizations in this manner, without human intervention? I suspect that only simple charts could be successfully interpreted using algorithms today.
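To illustrate why I suspect only simple charts can be narrated algorithmically today, here is a hypothetical sketch of rule-based chart narration: compute a few summary facts about a single series and pour them into a sentence template. The function name and the template are my own invention, not any vendor’s algorithm.

```python
# Hypothetical sketch of rule-based chart narration: summarize one data
# series with a few computed facts and a fixed sentence template.

def narrate_series(labels, values):
    """Return a one-sentence description of a single data series."""
    if len(values) < 2:
        return "Not enough data to describe."
    change = values[-1] - values[0]
    direction = "rose" if change > 0 else "fell" if change < 0 else "held steady"
    hi = max(range(len(values)), key=values.__getitem__)  # index of the peak
    lo = min(range(len(values)), key=values.__getitem__)  # index of the trough
    return (
        f"The series {direction} from {values[0]} to {values[-1]}, "
        f"peaking at {values[hi]} in {labels[hi]} and bottoming out "
        f"at {values[lo]} in {labels[lo]}."
    )
```

A template like this can state a trend and its extremes, but it says nothing about the patterns, outliers, and comparisons that a sighted viewer grasps at a glance—which is precisely why I doubt that algorithms can supplement anything beyond simple charts without human intervention.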
This is not a topic that I’ve explored extensively, so it is certainly possible that I’m missing potential uses of NLP. If you believe that there is more to this than I’m seeing, please let me know. I will gladly revise my position based on good evidence.