Visual Business Intelligence – Information Visualization Research Projects that Would Benefit Practitioners

Information Visualization Research Projects that Would Benefit Practitioners

In a previous blog post titled “Potential Information Visualization Research Projects,” I announced that I would prepare a list of potential research projects that would address actual problems and needs that are faced by data visualization practitioners. So far I’ve prepared an initial 33-project list to seed an ongoing effort, which I’ll do my best to maintain as new ideas emerge and old ideas are actually addressed by researchers. These projects do not appear in any particular order. My intention is to help practitioners by making researchers aware of ways that they can address real needs. I will keep a regularly updated list of project ideas as a PDF document, but I’ve briefly described the initial list below. The list is currently divided into three sections: 1) Effectiveness and Efficiency Tests, 2) New Solution Designs and Tests, and 3) Taxonomies and Guidelines.

Some of the projects that appear in the Effectiveness and Efficiency Tests section have been the subject matter of past projects. For example, several projects in the past have tested the effectiveness of pie charts versus bar graphs for displaying parts of a whole. In these cases I feel that the research isn’t complete. Apparently, some people feel that the jury is still out on the matter of pie charts versus bar graphs, so it would be useful for new research to more thoroughly establish, more comprehensibly address, or perhaps challenge existing knowledge.

Please feel free to respond to this blog post or to me directly at any time with suggestions for additional research projects or with information about any projects on this list that are actually in process or already completed.

Effectiveness and Efficiency Tests

Determine the effects of non-square aspect ratios on the perception of correlation in scatterplots.
Determine the effectiveness of bar graphs compared to dot plots when the quantitative scale starts at zero.
Determine the relative speed and effectiveness of interpreting data when presented in typical dashboard gauges versus bullet graphs (one of my inventions).
Determine the effectiveness of wrapped graphs (one of my inventions) compared to treemaps when the number of values does not exceed what a wrapped graphs display can handle.
Determine the effectiveness of bricks (one of my inventions) as an alternative to bubbles in a geo-spatial display.
Determine the effectiveness of bandlines (one of my inventions) as a way of rapidly seeing magnitude differences among a series of sparklines that do not share a common quantitative scale.
Determine if donut charts are ever the most effective way to display any data for any purpose.
Determine if pie charts are ever the most effective way to display any data for any purpose.
Determine if radar charts are ever the most effective way to display any data for any purpose.
Determine if mosaic charts are ever the most effective way to display any data for any purpose.
Determine if packed bubble charts are ever the most effective way to display any data for any purpose.
Determine if dual-scaled graphs are ever the most effective way to display any data for any purpose.
Determine if graphs with 3-D effects (e.g., 3-D bars) are ever the most effective way to display any data for any purpose.
Determine which is more effective: displaying deviations in relation to zero or 100%. For example, if you wish to display the degree to which actual expenses varied in relation to the expense budget, would it work best to represent variances as positive or negative percentages above or below zero or as percentages less than or greater than 100%.
Determine the effectiveness of various designs for Sankey diagrams in an effort to recommend design guidelines.
Determine the best uses of various network diagram layouts (centralized burst, arc diagrams, radial convergence, etc.).
Determine the effectiveness of word clouds versus horizontal bar graphs (or wrapped graphs).
Determine which shapes are most perceptible and distinguishable for data points in scatterplots.
Determine the effectiveness of large data visualization walls versus smaller, individual workstations.
Determine if the effectiveness of displaying time horizontally from left to right depends on one’s written language or is more fundamentally built into the human brain.
Determine if the typical screen scanning pattern beginning at the upper left depends on one’s written language or is more fundamentally built into the human brain.
Determine the relative speed and effectiveness of interpreting particular patterns in data when displayed as numbers in tables or visually in graphs. For example, compare a table that displays 12 monthly values per row versus a line graph that displays the same values (i.e., twelve monthly values per line) to see how quickly and effectively people can interpret various patterns such as trending upwards, trending downwards, particular cyclical patterns, etc. We know that it is extremely difficult to perceive patterns in tables of numbers, but it would be useful to actually quantify this performance.
Determine the relative speed of finding outliers in tables of numbers versus graphs.
Determine the relative benefits of using a familiar form of display versus one that requires a few seconds of instruction. The argument is sometimes made that a graph must be instantly intuitive because making people learn how to read an unfamiliar form of display is too costly in time and cognitive effort. For example, population pyramids provide a familiar way for people who routinely compare the age distributions of males versus females in a group, yet a frequency polygon, although unfamiliar, might provide a way to see how the distributions differ much more quickly and easily. In cases when people can be taught to read an unfamiliar forms of display with little effort, does it make sense to do so rather than continuing to use a form of display that works less effectively.

New Solution Designs and Tests

Develop an effective way to show proportional highlighting, as it pertains in brushing and linking, for portions of the following graphical objects: bars, lines, and boxplots. Various ways to show proportional highlighting have been applied to bar graphs, but not to line graphs and box plots.
Develop a way to automatically attach data labels to the ends of lines in a line graph without overlapping.
Develop a way to temporarily overlay or replace box plots with frequency polygons.
Develop a way to automatically detect the amount of lag between two time series and then align the leading events with the lagging events in a line graph.
Develop potential uses of blindsight to direct a person’s attention to particular sections of a display as needed (e.g., to something on a dashboard that needs attention).
Develop a effective design for waterfall graphs when multiple transactions occur in the same interval of time and some are positive and some are negative.
Develop an algorithm for automatically distributing several sets of time series values uniformly across a 100% scale when they have different starting points, ending points, and durations. For example, this would make it easy to compare the person hours associated with various projects across their lifespans, even when they differ in starting dates, ending dates, and durations.
Develop a full set of interface mechanisms for making formatting changes to charts (turning grid lines on and off, changing the colors of objects, repositioning and orienting objects such as legends, changing the quantitative scale along an axis, etc.) that involves direct access to those objects rather than one that requires the user to wade through lists of formatting commands located elsewhere (e.g., in dialog boxes).

Taxonomies and Guidelines

Develop a useful taxonomy or set of guidelines to help people think about the differences in how data visualizations should be designed to support data sensemaking (i.e., data exploration and analysis) versus data communication (i.e., presentation).

Take care,

Signature

Tuesday, January 19th, 2016 at 4:51 pm

39 Comments on “Information Visualization Research Projects that Would Benefit Practitioners”

By Jeffrey Shaffer. January 20th, 2016 at 7:58 am

Steve,

This is a really excellent list. Thank you for compiling.

Under “New Solutions”, we might consider alternative ways to visualize a categorical comparison or part-to-whole relationships on a map. I see this come up over and over. My current thinking is that this is probably best solved with a visualization in the tooltip of a map point, but I would love to see other ideas that people might have, along the lines of bricks or other similar tools.

As for effectiveness and efficiency, when people study these things you’ve listed, it will be imperative that the researchers isolate the right use case. For example, using donut charts, they would need to look at the effectiveness of a bounded performance metric (100%) where the reader isn’t making a comparison vs. comparing categories within the donut.

Jeff

By Stephen Few. January 20th, 2016 at 10:58 am

Thanks Jeff. I’ll add your suggested project to the list. I absolutely agree with you about use cases. Many infovis research projects currently fail because they don’t understand real use cases and consequently test things that aren’t relevant.

By Chris Love. January 21st, 2016 at 5:58 am

Stephen

re “Develop a way to temporarily overlay or replace box plots with frequency polygons” – to illustrate the concept I have built the below in Tableau Public – I assume this is along the lines you are thinking?

https://public.tableau.com/views/BoxPlotvsFrequencyPolygon/Dashboard?:embed=y&:display_count=yes&:showTabs=y

Chris

By Stephen Few. January 21st, 2016 at 9:55 am

Chris,

What I have in mind is a way to shift between the box plot and frequency polygon in a manner that is less disorienting and can be applied to a single box in a box plot without affecting the others. The purpose is to enhance our ability to see the shape of a particular distribution in greater detail than a box can provide without changing the entire graph. For example, imagine hovering over a single box in a box plot and having the box become visually subdued while a line is superimposed on top of it.

By Naveen Michaud-Agrawal. January 21st, 2016 at 7:24 pm

Stephen,

What are your opinions on violin plots (https://en.wikipedia.org/wiki/Violin_plot)? I would think converting a single box to a violin might be less visually jarring than using an asymmetric frequency polygon (although users are likely to be less familiar with them, so their use would be well informed by project 24 in the “Effectiveness and Efficiency Tests” list). I particularly like the variants that combine both box and violins (here is an example in D3 – http://bl.ocks.org/z-m-k/5014368)

Naveen

By Daniel Zvinca. January 22nd, 2016 at 9:12 am

Naveen,

Violin plot is widely used in Datawatch (former Panopticon) as a replacement of stripplots for selection/filter controls. They do a great job for large amount of items. However, for general audience kernel density (on which violin design is based) seems to be quite difficult to understand. Otherwise, kernel density is a great alternative to frequency polygon and histograms, yet it is usually found only on specialized packages. It seems that the amount of bins as extra parameter required for histograms/frequency polygons is much easier to understand than the smoothing parameter of kernel density.

By Stephen Few. January 22nd, 2016 at 10:04 am

While it’s true that switching to a violin plot would be less perceptually jarring, it would be less effective. I’ve never found a case when a violin plot is the most effective form of display compared to the alternatives. What you’re calling the asymmetry of the frequency polygon is its advantage. Running the frequency values in one direction only makes the shape of the distribution easier to perceive and understand. Running the frequencies in two directions (either left and right or up and down) as the violin plot does makes the shape harder to perceive. Whereas a frequency polygon uses 2-D position to encode the frequency values, which we perceive quantitatively most easily and accurately, a violin plot uses width, which we perceive quantitatively less well.

Thanks for mentioning the violin plot. I’ll add another potential research project to the list to determine if violin plots are ever the most effective form of display.

By Naveen Michaud-Agrawal. January 22nd, 2016 at 11:55 am

Thanks Stephen, that’s a really good point. I have always (incorrectly) considered the violin plot to be a frequency polygon that is mirrored and then filled in, so I would ignore one half (I just liked it for the symmetry), but if it is really using the full width for comparison than it would be difficult to perceive since each value is not even aligned on a common baseline.

By Daniel Zvinca. January 22nd, 2016 at 12:24 pm

The above mentioned violin (related to kernel density) reminds me of another possible but, perhaps, a more general subject: when is appropriate to use smoothing techniques for line or area graphs. Not to be confused with regression techniques used to aproximate a set of values with analytical curves (linear, logarithmic, exponential, polynomial).

By Naveen Michaud-Agrawal. January 22nd, 2016 at 12:37 pm

Also, this is a great list! I would particularly like to see more research around the interaction side of visualization (project 8 in “New Solution Designs and Tests”) as it seems the current state of the art is linked brushing (which I believe is now over 20 years old).

By Stephen Few. January 22nd, 2016 at 1:13 pm

I suspect that various products render the violin plot in different ways. Some probably mirror the frequency values and some use widths to represent them. Even if they mirror the values, however, this creates a perceptual problem because we automatically perceive the widths, not the 2-D positions of one half of the plot.

I agree that more research that is focused on interactive techniques would be useful. Do you have any particular techniques in mind?

By Stephen Few. January 22nd, 2016 at 1:16 pm

Daniel,

By “smoothing techniques,” are you talking about connecting values along a line or area as curves rather than straight lines?

By Naveen Michaud-Agrawal. January 22nd, 2016 at 1:36 pm

I think the only appropriate smoothing technique is perceptually related. IE if you have a million point timeseries which will be drawin in a 1000 pixel wide graph, then subset the data such that each pixel covers a certain range of the data and then calculate the first, min, max, and last points within that bin and draw that set of points. The first and last values are required due to aliasing that occurs at the pixel level (see http://www.vldb.org/pvldb/vol7/p797-jugel.pdf for more details).

By Daniel Zvinca. January 22nd, 2016 at 2:00 pm

No, Stephen. Connecting points using spline curves instead of lines are not among techniques I would consider. I am thinking of techniques that mainly reduce the peaks, resulting curve going between points, not connecting them. This has nothing to do with analytic regression curves (trend curves).

By Stephen Few. January 22nd, 2016 at 2:05 pm

In that case, I don’t know what you’re describing. Does the smoothing technique that you’re describing have a name?

By Daniel Zvinca. January 22nd, 2016 at 2:22 pm

Thank you, Naveen, for the link. I have to read it, my first impression is that is not related to the matter I am talking about. Smoothing data can be applied on smaller amount of points as well, its main purpose is to “adjust” out of ordinary values, considering them possible “noises”. This can help detecting general patterns of change, instead of focusing on step by step real variation.

By Daniel Zvinca. January 22nd, 2016 at 2:32 pm

Stephen, they use smoothing algorithms for noise reduction with different names. Spotfire, for instance, has an example of LOWESS smoothing analysis, but they are many others.

By Stephen Few. January 22nd, 2016 at 3:10 pm

I understand that there are many smoothing methods. Lowess (locally weighted regression, a.k.a., Loess) is but one of many examples of fit models. I was confused by your previous statement that smoothing models should not be confused with regression techniques and fit models such as linear, logarithmic, exponential, polynomial, etc. In fact, Lowess is an example of a regression technique; just another fit model.

I’m trying to clarify what you’re proposing as a potential research project. Are you asking that the Lowess smoothing method in particular be tested for effectiveness? If so, I believe that the effectiveness of Lowess has been well established by William Cleveland.

By Andrew Craft. January 22nd, 2016 at 3:51 pm

Smoothing seems to be an ambiguous term, used across multiple areas in data analysis. Daniel, is this more like what you’re talking about?

https://en.wikipedia.org/wiki/Smoothing

By Stephen Few. January 22nd, 2016 at 4:07 pm

Andrew,

The terms “smoothing” and “fitting” are used interchangeably in statistics. Essentially, they mean the same thing. Lowess smoothing is just one of many ways of summarizing the overall pattern in the data, along with linear, logarithmic, exponential, and polynomial fit models. William Cleveland created Lowess smoothing as a correlation fit model that is more resistant to outliers than most other models.

By Andrew Craft. January 22nd, 2016 at 4:30 pm

I meant that the interchangeability of terms, as well as multiple meanings per term, may be the reason for confusion here.

Never mind Lowess – the “moving average” is a common example of the type of smoothing I suspect Daniel was asking about (Daniel, please confirm?). If so, I’d be interested in your thoughts.

My first thought would be that while moving average seems to smooth out noisy data, there are better solutions. Control charts, for example, seem to do a better job, especially the types that I’ve seen over at Stacey Barr’s site.

By Stephen Few. January 22nd, 2016 at 4:49 pm

Andrew and Daniel,

The loose way in which we use these terms definitely contributes to the confusion. A moving average is yet another example of what we interchangeable refer to a trend, smooth, or fit. It is an expression of routine (typical) pattern in the data.

By Daniel Zvinca. January 22nd, 2016 at 11:30 pm

Your Wikipedia link summarizes correct my interpretation for “smoothness techniques”: “Smoothing may be distinguished from the related and partially overlapping concept of curve fitting in the following ways…”

The way I see it, “fitting” and “smoothing” are indeed both approximations of data, yet different. Fitting curves parameters of analytical functions are calculated using a regression algorithm involving all the points of the sample, while smoothing techniques are used to remove the “noise” of signal having as result anything but an analytical function. It is also correct that smoothing techniques can use regression for segments of data, lowess being one of the methods.

A potential study I am thinking about is not related to the effectiveness of certain smoothing techniques (like Cleveland – lowess), but more related to the effectiveness of using approximations in interpreting data. Kernell density estimation being used as smoothing technique to create above mentioned violin plot, I was thinking why not to have a more general subject dedicated to smoothness of data: when and how this technique can be used to improve data interpretation. For time related line charts, for instance, smoothing data might reveal certain patterns of change not easily observed on higher density raw data. Similar benefits I can see for sparklines as well. The smoothing method itself is less relevant, but the need of using one might worth a separate study.

Is it any benefit in designing “curvy” line graphs? Or better stay with classic line charts?

By Stephen Few. January 23rd, 2016 at 11:02 am

I am still not aware of a clear distinction between fitting and smoothing. The wikipedia article, like many wikipedia definitions, is not clear. The distinctions between curve fitting and smoothing that it identifies are not accurate in that they apply to both terms. Regarding the additional distinction that you made above–“fitting curve parameters of analytical functions are calculated using a regression algorithm involving all points of the sample”–Lowess is a regression technique that considers all of the points when creating the curve, so the distinction still isn’t clear. In my experience, people use the terms interchangeably. My intention here isn’t to quibble over definitions but to clearly understand the research project that you are proposing so I can describe it. To do that, I must first understand what you mean by the term smoothing.

As I understand it, you are proposing a study that would attempt to determine when and for what purposes it is useful to represent a data set, not as the actual values, but as a curve that attempts to represent what is routine. If you’re proposing something different or more specific than this, I don’t clearly understand what that is. If you were writing a paragraph to describe the purpose of the proposed study, what exactly would you say?

By Daniel Zvinca. January 24th, 2016 at 12:25 am

Data smoothing is a technique of replacing each data point with a sort of average of surrounding points. The result is not a known analytical functions, but a collection of connected smooth curved segments.

Data fitting is a technique of summarizing data by fitting it to an analytical model (linear, polynomial, logarithmic, exponential, Gaussians). The process of finding the analytic parameters is usually based on regression methods.

The “smoothing” confusion might come from the fact that the results of both techniques have as results smooth aspects. Yet, the purpose, the usage and the algorithms used are different.

As a reference I propose the online book Numerical Recipes in C, Cambridge University Press (currently there are newer editions, but not online). Ignoring the inserted C code (newer and more efficient algorithms were designed since then), it proved a valuable reading for me.
In Chapter 14 – “Statistical description of data” one of the known related to time series smoothing technique (Savitzky-Golay, mentioned also in Wikipedia) was exhaustively described.
In Chapter 15 – “Modeling of data” you can find a more clear description of curve fitting, Least Squares as a Maximum Likelihood Estimator and Fitting data to a Straight Line or to any provided analytical function.

One extra mention about currently spread method used by chart designers of connecting raw data points using spline curves (instead of lines). This method does not model or change the original data. In my opinion it does nothing good to the chart interpretation, altering the aspect of data variation.

By Daniel Zvinca. January 24th, 2016 at 1:17 am

The study I am thinking about is related to the benefits (if any) of displaying approximation of data instead of raw data for time series, where the approximation is a result of an existing smoothing algorithm. This study has to be conducted from a visual perspective, rather than a comparison between efficiency of different smoothing algorithms.

This might be different from what you call routine, which I guess, is more related to data modeling through curve fitting.

It might be a thin line between curve fitting and data smoothing, I personally see them as different techniques used for different purposes. It was not at all my intention to start a debate regarding this subject, but I don’t think I am able to be more clear regarding this matter.

Thank you, Stephen, for your time and effort spent for clarifying this subject. I can only hope the idea worths enough to conclude into a research proposal.

By Stephen Few. January 24th, 2016 at 10:11 am

Whether there is an agreed upon distinction between curve fitting and smoothing is still not clear. For example, Loess, which is usually called a smoothing method, is used as a fit model when other when greater resistance to outliers is required. Whether a technical distinction exists in the algorithms that produce smoothing versus curve fitting probably doesn’t matter if they serve a common purpose. You stated that they have different purposes and uses, but didn’t identify them. Let’s focus on this. Please describe the purpose and use of smoothing versus curve fitting. I am aware of no distinction. Whether the method is called smoothing or curve fitting, the purpose is to summarize the essential pattern in the data. We do this to distinguish what is routine from what is not routine (e.g., the result of randomness or special causes). This helps us spot potential signals and to predict future behavior.

By Daniel Zvinca. January 25th, 2016 at 1:05 am

The reason we use curve fitting for data analysis is to see how far or close certain set of data is to a known model. Let’s consider that data we analyze is a time series.
A common approach is to model the values after known analytic functions like linear (which would reveal constant variation), exponential (accelerated variation), logarithmic (decelerated variation). Curve fitting approach does not always implies good data approximation. Yet we can come to certain conclusion even if data does not fit good enough to the proposed model. Shaping data after an existing model involves all the points of the sample, any change of one of the raw values will have as result a change in the resulting parameters of the analytical curve, so it will influence the shape of the entire curve. The usual method used for curve fitting being regression, in many cases the result of a curve fitting operation is called regression curve. Those curves are usually related to predictions and forecast. For time related charts they are most known as Trend Curves.

The reason we use smoothing techniques for raw data is to remove the noise and simplify the aspect of the data variation for further interpretation. In this case we are always talking about approximation, each replacement point being kind of an average of surrounding points. If we talk about time series, is like polishing the roughness of a line chart to a certain level. The resulting set of points are not a known analytical curve, but a group of connected smooth segments. We do not seek for similarities with existing models, we just use a possible smooth approximation for a clean interpretation of pattern of change, without being distracted by continuous or occasional noise. Any change of a raw value will have a local influence to the resulting smooth shape, the rest of the curved segments remaining unchanged.

Obviously both methods are eventually used to improve data interpretation, yet, they differ enough to belong to different chapters of data analysis. Data modeling is trying to fit existing data as good as possible to an existing analytical curve with already know properties and behavior. It may or it not may approximate enough to the model, yet conclusions can be drawn. On the other hand, data smoothing does not seek for any known shape, it approximates and simplifies the aspect of data variation removing distracting elements. For this last case I suggested that a research from a visual perspective might be useful. How a smooth aspect of an approximate curve improves data interpretation against raw data representation. This, in my opinion, should not mix with the curves obtained form data modeling, for those curves the interpretation being in most of case well known.

By Nate. January 25th, 2016 at 1:20 pm

Daniel…. Smoothing data is very dangerous if the resulting smoothed data is compared to another set of smoothed data – stronger correlations will always be found (by definition – straight lines correlate better to each other than jagged ones!). And also when the smoothed data is carried forward for further analysis – the uncertainty introduced by smoothing is almost always never also carried forward.

By Daniel Zvinca. January 25th, 2016 at 1:49 pm

Nate,

Any data “adjustment” is a potential danger. It has to be used with care. Using smoothed data with further transformations can introduce unaccepted distorsions. This is why it would be nice to find out from a research if any good can come from using such of technique to display and then visually analyse data. They are many methods used in statistics for data smoothing and they are there with a purpose. But I am not aware of any paper that addresses this matter from visual analytics perspective.

By Stephen Few. January 25th, 2016 at 7:13 pm

I appreciate your efforts to explain the ways in which fit models are different in purpose and use from smoothing techniques. Despite your diligent efforts, however, I believe that a great deal of confusion still exists. I think that many people use the terms fit model (a.k.a., curve fitting) and smoothing interchangeably. I also think that these terms are sometimes used to describe patterns that belong to fundamentally different categories. For example, in my opinion we should distinguish between fit models or smoothing methods that describe patterns of correlation from those that describe patterns of change through time, even though in both cases they attempt to distinguish the essential pattern (i.e, what’s routine) from the actual values so we can see how they differ.

Given the fact that this particular blog post was written to invite suggestions for potential data visualization research projects, it is not the best place to pursue this discussion any further, but I would like to continue this discussion elsewhere. For this purpose, I’ve opened a new topic in my discussion forum titled “Fit Models versus Smoothing Techniques.”

By Bella Gotie. January 30th, 2016 at 6:00 am

20. https://www.psychologytoday.com/blog/culture-conscious/201204/why-left-is-less-and-right-is-more-sometimes

By Stephen Few. January 30th, 2016 at 11:22 am

Bella,

Thanks for bringing this article to my attention. Based on the research studies that it mentions, it appears that our tendency to view quantitative increase from left to right is only true for people whose language is written from left to right. I’ll cross the study that I proposed about this off of the list.

By Brad Daniels. February 2nd, 2016 at 2:49 pm

Stephen –
I am particularly interested in the last subject here – namely, the difference between presenting data for analysis (sense-making) vs. pure presentation (telling a story). I have been in the Business Intelligence space for quite a while now, and much of the talk is a around what is termed ‘dashboards’ in most places. I have often replied that these are not true dashboards, but rather, analytical applications – as they are designed for analysis. This analysis is often very ‘guided’, as we spend a great deal of time with our users/businesses to understand what types of analysis are most common and useful. We build quite a lot of flexibility into our tools to that users may explore the data (somewhat), without getting lost in so many ‘bells and whistles’ (aka dropdowns, buttons, widgets) that our tools seem overwhelming and difficult to use.
What we are often challenged with, however, is “why can;t you give me a tool that can produce some thing ‘like THIS’. The THIS is often a presentation – the END RESULT of analysis that is attempting to tells a particular story or highlight a particular outcome of that analysis.
I feel it is extremely difficult for a single tool to provide both the analytical capability as well as the ability to be used to succinctly tell a story (or be used for a high level presentation).
As I see it, these are two very distinct purposes and goals, and trying to achieve both thru one tool is challenging, if not impossible.
I also think these two concepts or use cases have very different audiences. Typically, leaders and executives want nothing to do with the actual analysis of the data. They just want the end story. Thus, they want the finished dashboard/infographic. The challenge is, as a ‘developer’ I cannot predict what story needs to be told – that is based on the data. What I can provide are tools to do that analysis, and determine what story needs to be told. There may be other tools (some of which we can provide, some of which are already in the users’ hands) which are best for the actual story-telling.
Drawing this distinction is often difficult for many to understand, and people search for the silver – bullet. One tool/application that can do all things for all people. I don;t think it exists, nor do I think it ever will. Any vendor who attempts to serve both parties, I fear, will end up muddying their tool to the point where is does neither easily or well.

Your thoughts?

By Stephen Few. February 2nd, 2016 at 3:35 pm

Hi Brad,

I agree that a distinction between data sensemaking and data presentation must be made because they involve different tasks and, to some extent, differently designed visualizations. It would be difficult–perhaps impossible–for a single tool to support both of these purposes well. You can see this in some of the awkward and complicated interfaces that have resulted from attempts by software vendors to do this. Other distinctions must also be made within the realm of data presentation alone. For example, a tool that supports the creation of standard lookup reports must work differently in some ways from a tool that supports the creation of infographics, or one that supports the creation of dashboards (i.e., rapid monitoring displays). It would be great if a single tool could support the full range of data sensemaking and presentation capabilities that are needed, but I fear that such a tool would inevitably be difficult to use. It is more realistic to create a suite of tools that share a great deal in common, but also provide the unique sets of features and functions that are needed for each particular purpose.

By Malcolm Rees. February 15th, 2016 at 6:34 pm

I would be interesting in seeing “ideal examples” of different data visualisations specifically for survey data. Especially those data visualisations that tell a specific story.

I would also like to see some research into data visualisation of qualitative data. This might align with item 17 word clouds versus horizontal bar charts perhaps.

By Stephen Few. February 16th, 2016 at 5:21 pm

Malcolm,

Regarding examples of survey data, I suggest that you post a specific question in my discussion forum where people can respond by posting examples. I also suggest that you ask your question in a more specific way.

Regarding visualizations of word clouds versus bar graphs for displaying the number of times specific words or phrases occur, do you really think that research is needed? It is clear that word clouds do a poor job of displaying quantitative information. They are eye-catching, but useless for doing anything beyond making a few words or phrases stand out because they are much larger than the others. To understand how much more frequently they occurred than other words or phrases or to compare all of the words and phrases with ease, a horizontal bar graph performs much more effectively.

By Matt. April 9th, 2017 at 1:08 pm

Related to points 15 and 16 of the “Effectiveness and Efficiency Tests”, I would be interested in knowing what the most effective option(s) are for conveying any kind of process which involves multiple, conditional steps; in other words, what is/are the most effective ways of flow-charting for any kind of workflow type of process. There is the traditional Visio-style type of diagram, but…is this the best way to convey such processes? if yes, what are best practices within this sphere? …are there other, more effective ways than a flowchart? etc.

By Stephen Few. April 9th, 2017 at 3:00 pm

Hi Matt,

I suspect that it would certainly be worthwhile to test the effectiveness of various flowcharting methods and designs, but this falls outside the scope of data visualization research as it’s usually defined. Data visualization (also, information visualization), as I use the term, refers to the visual display of information that includes a quantitative component. For example, both Sankey Diagrams and Network Diagrams, which you refered to (i.e., items 15 and 16 in my list of potential projects), typically include quantitative information (e.g., amount of things that moved from one category to another in a Sankey Diagram or number of connections between entities in a Network Diagram) in addition to the relationships that are displayed.

Although it is certainly true that flowcharts visualize information, they fall somewhat outside of the scope of visualizations that are typically defined as data visualizations. Unfortunately, I’m not at all familiar with research that has been done regarding flowchart diagrams.

Information Visualization Research Projects that Would Benefit Practitioners

39 Comments on “Information Visualization Research Projects that Would Benefit Practitioners”

Leave a Reply

Archives