Is there a Role for Natural Language Processing in Data Visualization?
Several software vendors are integrating natural language processing (NLP) into data visualization tools these days, which should cause us to question the merits of this feature. In most cases, NLP is being used as an input interface—a way to specify what you would like to see—but some vendors are now proposing a reverse application of NLP as an output interface to express in words what already appears in a data visualization. In my opinion, NLP has limited usefulness in the context of data visualization. It is one of those features that vendors love to tout for the cool factor alone.
We express ourselves and interact with the world through multiple modes of communication, primarily through verbal language (i.e., spoken and written words), visual images, and physical gestures (i.e., movements of the body). These modes are not interchangeable. Each exists because different types of information are best expressed using specific modes. Even if you consider yourself a “word” person, you can only communicate some information effectively using images, and vice versa. Similarly, sometimes a subtle lift of the brow can say what we wish in a way that neither words nor pictures could ever equal. We don’t communicate effectively if we stick with the mode that we prefer when a different mode is better suited for the task.
NLP is computer-processed verbal language. Are words an appropriate means to specify what you want to see in data (i.e., input) or to explain what has already been expressed in images (i.e., output)? Let’s consider this.
We’ll begin with the usefulness of NLP as a means of input. Let’s sneak up on this topic by recognizing that words are not always the most effective or efficient means of input. Just because you can get a computer to process words as a means of input doesn’t mean that it’s useful to do so. Would you use words to drive your car? (Please note that I’m not talking about the brief input that you would provide a self-driving car.) The commands that we issue to our cars to tell them where and how fast to go are best handled through a manual interface—one that today involves movements of our hands and feet. We could never equal with words what we can communicate to our cars immediately and precisely with simple movements. This is but one of many examples of situations that are best suited to physical gestures as the means of input. So, are words ever an appropriate means to specify what you’d like to see in data? Rarely, at best. NLP would only be useful as a means of input either in situations when the data visualization tool that you’re using has a horribly designed interface but a well-designed NLP interface (this tool doesn’t exist) or when you need to use a tool but have not yet learned its interface.
The second situation above corresponds to the “self-service” business intelligence or data analysis model that software vendors love to promote but can never actually provide. You cannot effectively make sense of data without first developing a basic set of data analysis skills. If you’ve already developed this basic set of skills, you would never choose NLP as your means of input, for a well-designed interface that you manipulate using manual gestures will almost always be more efficient and precise. Consequently, the only time that NLP is useful as a data visualization input interface is when people with no analytical skills want to view data. For example, a CEO could type or say “Show me sales revenues in U.S. dollars for the last year by product and by month” and the tool could potentially produce a line graph that the CEO could successfully read. Simple input such as this could certainly be handled by NLP. Chances are, however, that the simple requests that this CEO makes of data are already handled by predesigned reports that are readily available. Most likely, what the CEO would like to request using words would be something more complex, which NLP would not handle very well, and even if it could, the CEO might misunderstand once the results are displayed due to a lack of statistical knowledge. It isn’t useful to enable people to request visualizations that they cannot understand.
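To make the input scenario concrete, here is a minimal sketch of what any NLP input layer must ultimately do: reduce a sentence to a structured chart specification that the tool can execute. The parsing rules and field names below are invented for illustration and handle only one narrow pattern; they are not any vendor’s actual implementation.

```python
import re

def parse_request(text):
    # Naive translation of one narrow class of English requests into a
    # chart specification. Real NLP interfaces are far more elaborate,
    # but the end product is the same: a structured query, not magic.
    text = text.lower()
    measure = re.search(r"show me ([\w ]+?) (?:for|by)", text)
    dimensions = re.findall(r"by (\w+)", text)
    if not measure or not dimensions:
        raise ValueError("Request not understood; please rephrase.")
    return {
        "measure": measure.group(1).strip(),
        "dimensions": dimensions,
        # A time dimension suggests a line graph; otherwise a bar graph.
        "chart": "line" if "month" in dimensions else "bar",
    }

print(parse_request("Show me sales revenues for the last year by product and by month"))
# {'measure': 'sales revenues', 'dimensions': ['product', 'month'], 'chart': 'line'}
```

Notice how quickly the CEO’s richer questions—comparisons, exclusions, statistical qualifications—would fall outside what rules like these can parse.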
Now let’s consider the use of NLP as a means of expressing in words what appears in a data visualization. When properly done, data visualization presents information that cannot be expressed at all, or as well, using words or numbers. For example, we visualize data to reveal patterns or to make rapid comparisons, which could never be done based solely on words or statistics. If the information can only be properly understood when expressed visually, using NLP to decipher the visualization and attempt to put it into words makes no sense. The only situation that I can imagine in which this would provide any value at all would be for people who are visually impaired and therefore unable to see the visualization. The caveat in this case, however, is that words could never provide for someone who is visually impaired what an image could provide if the person could see it. So, however cool it might seem when a vendor claims to apply NLP for this purpose, it’s a silly feature without substance.
You might argue, however, that NLP algorithms could be used to supplement a visualization by providing a narrative explanation, much as a presenter might explain the primary message of a chart and point out specific features of interest. Do you really believe that software developers can write computer algorithms that successfully supplement data visualizations in this manner, without human intervention? I suspect that only simple charts could be successfully interpreted using algorithms today.
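As a rough illustration of where that ceiling sits, here is a minimal sketch (with invented numbers) of the kind of rule-based narrative an algorithm can reliably generate for a single, simple series. Anything subtler than a trend and an extreme value quickly outruns canned rules of this sort.

```python
def describe_series(label, values):
    # Generate a one-sentence, rule-based summary of a single series.
    # This is roughly the level of narrative that automated chart
    # "explainers" can be trusted to produce today.
    change = values[-1] - values[0]
    direction = "rose" if change > 0 else "fell" if change < 0 else "stayed flat"
    peak = max(values)
    return (f"{label} {direction} from {values[0]} to {values[-1]}, "
            f"peaking at {peak} in period {values.index(peak) + 1}.")

print(describe_series("Monthly revenue", [120, 135, 150, 148, 161, 158]))
# Monthly revenue rose from 120 to 158, peaking at 161 in period 5.
```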
This is not a topic that I’ve explored extensively, so it is certainly possible that I’m missing potential uses of NLP. If you believe that there is more to this than I’m seeing, please let me know. I will gladly revise my position based on good evidence.
Take care,
13 Comments on “Is there a Role for Natural Language Processing in Data Visualization?”
G’day Stephen, a couple of things sprang to mind when I read your opening paragraph, and each of them demonstrates how silly NLP is for data visualisation.
First, I thought about the various forms of learning – visual, auditory and kinaesthetic. If an auditory person would benefit from NLP, then the kinaesthetic person should plug their data visualisations into a 3D printer so that they can feel the data!!
My second thought was that this NLP idea probably comes from sci-fi shows such as Star Trek, where the crew of the Enterprise are able to ask the computer for a complex analysis with just a few words and the computer seems to know and understand how to deal with a huge number of unstated variables. I remember one episode where the computer responded to the question with “Please state parameters”. As a viewer, I was wondering which particular parameters the computer was referring to, but of course the character in the show knew what was needed.
I think people believe it should be possible to build an NLP interface that works as well as the one in Star Trek, but of course in reality, the conversation would take a long time to communicate all the information required to adequately specify all the parameters of a request.
Cheers,
Barnaby.
I’m not sure.
For sure, NLP isn’t there yet, nor is it what the marketers say it is.
But in the spectrum of interfaces we currently have to produce visualisations, I do see a potential place for it.
If we consider the current interfaces, each is suited to a level of granularity, precision, and control: graphical editors, coding, command lines, GUIs, wizards…
Some of these can be misused by people who lack an understanding of what they’re asking. But some allow people with less knowledge of the dataset to still produce meaningful charts, whether or not they can correctly interpret the results.
NLP would probably sit close to wizards on that spectrum.
Dear Stephen, in regard to the last use case, using NLP to narratively supplement visual representations: I understand that algorithms would never (well, at least before true general AI is available) be able to do this alone. But is it possible for algorithms to supplement the human’s work and make the job a lot easier?
For example, the algorithm may give quite a few interpretations, with most of them irrelevant but maybe one or two good ones. The human can then delete the rest and touch up the good interpretations. It may also serve as a checklist to prevent the human from missing something.
Finally, NLP and text-to-speech can significantly reduce the effort of audio production if a human analyst intends to use audio to narrate his/her diagrams.
Tom,
Can you describe the type of person who can specify what they would like to see in a data visualization and can understand the results, but who wouldn’t find a well-designed GUI more efficient and effective? I’m finding it hard to imagine who this might be.
Horace,
It is potentially useful for a smart statistical application to analyze a data set and then describe in words and charts its findings. This doesn’t replace human analysts, but it can assist them. This is not what NLP does to produce output. The NLP application that some vendors are proposing allegedly examines a data visualization (i.e., one or more graphs) and then attempts to describe in narrative fashion what’s contained in that data visualization.
Regarding the usefulness of NLP text-to-speech conversion in this context, I’m having a hard time imagining it. Would a human analyst who is recording an audio-visual presentation really choose to incorporate speech generated by NLP along with his or her own speech?
This sounds a lot like Wolfram Alpha (http://www.wolframalpha.com/). Without NLP, it would be painful to figure out what statistics Wolfram has (Crime rate in Boston? Amount of magnesium in an orange?) But after that point, Wolfram gets confused easily. (Crime rate vs. temperature in Boston? Beyonce most popular songs?)
Among graphing software, Wolfram Alpha is probably unique in its diversity of data types. With just about any other data set, the capabilities you’re exploring could fit in a GUI.
BI vendors marginalize so much already:
– visualization (duh)
– analytical thinking
– scientific research
– most of the IT department
Is it any surprise they’d do the same with basic human-machine intercommunication?
Interesting topic, but I do have to question the desire to take a visualization and then convert that into a natural language description. I mean, isn’t the point of a visualization to help clarify what might be confusing? To that end, you might as well tell me directly what is going into the visualization rather than taking that data, visualizing it, and then trying to describe it with words.
That said, I think yeah, there’s a place for NLP here – but it seems to me that would be on the input side, not so much on the output.
At my company, we are working on a data platform that (shock) has a natural language processing interface. But we also have a machine learning component to make sense of disparate data. This, we feel, is going to be great for those working with IoT data, but really, any connected data source can work with it.
NLP has its uses on the interface side, but we think it is the machine learning to make sense of the data that is going to be the real game changer.
Thoughts?
I was forced to use natural language querying in Power BI recently – when creating real-time reports on Azure Stream Analytics data, it’s unfortunately the only way to create certain types of queries. Writing a proper query using “natural language” proved to take about 10 times longer than it would take to write SQL or use a graphical interface, and the results were very unpredictable. Until there’s a true AI available, it seems to me NLP querying will only bring confusion. On the other hand, I find analyzing data enriched using NLP (sentiment analysis, etc.) quite interesting.
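For comparison, here is a self-contained sketch (the table and schema are invented) of the SQL side of that trade-off: a few unambiguous lines, versus guessing at phrasings the natural language box might accept.

```python
import sqlite3

# In-memory stand-in for the kind of table a streaming job might feed;
# the schema and data are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, month TEXT, revenue REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("Widget", "2016-01", 100.0), ("Widget", "2016-02", 140.0),
                 ("Gadget", "2016-01", 90.0)])

# The entire "query" a skilled analyst has to type: stated once,
# unambiguously, with a predictable result.
rows = con.execute("""
    SELECT product, SUM(revenue) AS total
    FROM sales
    GROUP BY product
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Widget', 240.0), ('Gadget', 90.0)]
```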
Stephen,
I viewed a basic demo from a major software provider and I was also skeptical. The vendor typed in ‘find X within 30 miles of city Y’. This same query could be handled with the vendor’s current parameter functionality, but that parameter would have to be created first. To me, the whole NLP system does not look useful because, like @Jakub above, I could probably type the SQL or use the GUI faster than typing the question. There was one aspect of NLP input that could be potentially useful, though: if we view the NLP query as a very flexible parameter that doesn’t need to be pre-built.
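To illustrate, here is a sketch of what that query reduces to once parsed: three parameters feeding an ordinary distance test. The names and coordinates are made up, and a real tool would filter a geocoded table rather than a hard-coded dictionary.

```python
from math import radians, sin, cos, asin, sqrt

def within_miles(origin, point, miles):
    # Great-circle (haversine) distance test; this is all that
    # 'find X within 30 miles of city Y' amounts to once parsed.
    lat1, lon1 = map(radians, origin)
    lat2, lon2 = map(radians, point)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * asin(sqrt(a)) <= miles  # 3959 = Earth radius in miles

# Hypothetical locations; the NLP layer's only real job is supplying
# the three parameters (what, how far, from where).
boston = (42.36, -71.06)
stores = {"Cambridge": (42.37, -71.11), "Providence": (41.82, -71.41)}
print({name: within_miles(boston, loc, 30) for name, loc in stores.items()})
# {'Cambridge': True, 'Providence': False}
```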
In my opinion, the real roadblock to understanding is the general lack of curiosity in people. I don’t see how NLP will make people ask questions of their data if they are not already asking questions. It’s possible that NLP lowers another barrier to question-asking, but if it slows down a skilled analyst then it’s raising a barrier.
Some of the comments here remind me of the programmer who told me he’d never use a BI tool because he could create the reports so much faster using assembly language. (No, he wasn’t kidding, unfortunately.)
One of the major trends in BI has been democratization of access to analytics. So the issue isn’t so much whether an expert would use NLP to create a visualization (or, more generally, to get a better understanding of the information hiding in the data), but how general business users could use an NLP tool to have the system do the technical work of formatting queries and generating charts for them.
Of course, this is a two-edged sword: business users could (and will) create convincing analyses that are complete nonsense. But they (and professionals who think they’re data scientists because they’re fluent in a statistical language) are already doing that.
The real power will come as the background AI gets better both at understanding that “93.5% of the time when users ask me to do X, they really want Y” and at being able to respond to a bad analysis idea with “I’m sorry, Dave, I can’t do that with that data” (okay, perhaps a bad example…)
It’s also useful to remember that it’s typical that when disruptive technologies appear, they look – for a reasonably long time – stupid compared to established technologies. But as time goes on, they get a bit better, and then a bit better, and start to eat up the lower part of their food chain. Then, suddenly, the established technology vendors realize that the upstart has matured enough to be a real threat. Often, by then, it’s too late (or at least very expensive) for them to respond.
So, Stephen, thanks for posting on this topic. No, it probably isn’t ready for prime time yet. But we should be keeping an eye on it and watch for how it progresses (or doesn’t progress).
I agree with most of what Roy says. I am personally an advocate of democratizing data analysis and would rather see more people involved with it – even if they make mistakes. The real question, however, is what areas to focus on most in spreading the use of, and appreciation for, such analysis. NLP (on the output end) seems to me a low priority compared with many other steps in the analysis process. Tools that help people understand the data they have – measurement issues, potential errors, better linking of data to the questions being asked, using data to clarify meaningful questions, etc. – seem far more important than being able to hear in words what a visualization is telling us (although one specific area where this holds great promise is for the visually impaired).
I would not recommend people write their own programs for word processing – using a developed software program is fine. I’m sure writing such a program would yield greater understanding of many things, but the effort does not seem justified compared with learning better writing and reading skills. I feel the same way about NLP as described by Stephen. Sure, there will be benefits – but they appear small and do little to justify the potential costs. If the goal is democratization of data analysis and visualization (which I wholeheartedly support), then I’d rather see more meaningful avenues for research being pursued. Surely these people’s talents can be put to better uses (for example, text visualization seems much more important to me than NLP to translate visuals into words).
Roy,
Your perspective is incredibly biased if you actually think that some of the comments posted here are similar to the programmer who told you he’d never use BI tools because he could create the reports much faster using assembly language. If NLP could actually contribute to the “democratization of data,” I’d support it wholeheartedly. Contrary to your statement that NLP, as applied to data visualization, is disruptive, I can’t see that it is anything but a distraction. I explained why I find NLP of little use in relation to data visualization. If you disagree, please contribute to this discussion by backing your case with evidence and reason.
The best way to democratize data is to support people in developing the skills of data sensemaking. Anything short of this will continue the current waste of time and money that plagues our so-called information age.