Lollipop Charts: “Who Loves You, Baby?”
If you were around in the ’70s, you probably remember the hard-edged, bald-headed TV police detective named Kojak. He had a signature phrase, “Who loves you, baby?”, and a signature behavior: sucking a lollipop. The juxtaposition of a tough police detective engaged in the childish act of sucking a lollipop was entertaining. The same could be said of lollipop charts, but data visualization isn’t a joke (or shouldn’t be). “Lollipop chart” is just a cute name for a malformed bar graph.
Bar graphs encode quantitative values in two ways: the length of the bar and the position of its end. So-called lollipop charts encode values in the same two ways: the length of the line, which functions as a thin bar, and the position of its bulbous end.

A lollipop chart is malformed in that its length has been rendered harder to see by making it thin, and its end has been rendered imprecise and inaccurate by making it large and round. The center of the circle at the end of the lollipop marks the value, but the location of the center is difficult to judge, making it imprecise compared to the straight edge of a bar, and half of the circle extends beyond the value that it represents, making it inaccurate.
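To put a rough number on that inaccuracy, here is a minimal sketch of the arithmetic. The axis length, axis range, and dot diameter are hypothetical values chosen for illustration, not measurements taken from the chart above, and `dot_overshoot` is just an illustrative helper name.

```python
def dot_overshoot(marker_diameter_px, axis_length_px, axis_min, axis_max):
    """Distance, in data units, that the dot's outer edge extends past the value.

    The dot is centered on the value, so half of its diameter lies beyond it.
    """
    units_per_px = (axis_max - axis_min) / axis_length_px
    return (marker_diameter_px / 2) * units_per_px


# Assumed layout: a 0-100 scale drawn 400 px long, with a 20 px dot.
overshoot = dot_overshoot(marker_diameter_px=20, axis_length_px=400,
                          axis_min=0, axis_max=100)
print(overshoot)        # 2.5 data units beyond every value

# The same absolute overshoot is a larger relative exaggeration for small values.
for value in (5, 40):
    print(value, overshoot / value)   # 5 -> 0.5, 40 -> 0.0625
```

Under these assumptions every lollipop appears 2.5 units longer than its value, which is a 50% exaggeration for a value of 5 but only about 6% for a value of 40. A smaller dot shrinks the overshoot proportionally.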
What inspired this less effective version of a bar graph? I suspect that it’s the same thing that has inspired so many silly graphs: a desire for cuteness and novelty. Both of these qualities wear off quickly, however, and you’re just left with a poorly designed graph.
You might feel that this is “much ado about nothing.” After all, you might argue, lollipop charts are not nearly as bad as other dessert or candy charts, such as pies and donuts. This is true, but when did it become our objective to create new charts that aren’t all that bad, rather than those that do the best job possible? Have we run out of potentially new ways to visualize data effectively? Not at all. Data visualization is still a fledgling collection of visual representations, methods, practices, and technologies. Let’s focus our creativity and passion on developing new approaches that work as effectively as possible and stop wasting our time striving for good enough.
Take care,
56 Comments on “Lollipop Charts: ‘Who Loves You, Baby?’”
I believe that lollipops have their uses. Bar graphs with more than, say, 8 or 9 bars, often look busy and clunky. Lollipop graphs may solve that problem by increasing the amount of white space in between bars significantly.
I agree that marking with the center of the circle makes lollipop graphs a bit more imprecise than a regular bar graph, but there may be an easy solution: Reduce the size of the circles (they are a bit too big for my taste in the example you show) or even mark the value with the upper point of the circle instead.
You may argue that an alternative in a case like this would be just a bar graph with thinner bars and more white space. I’d be fine with that, too, but I think that the little circle on top makes it easier to spot the point value.
Ooops, I meant to write “I agree that marking THE VALUE with the center of…”
Alberto,
I don’t agree that bar graphs with more than eight or nine bars look busy or clunky. They certainly can exhibit these problems if they’re poorly designed (overly bright colors, overly wide bars, insufficient white space between the bars, etc.), but that’s a failure of design, not of bar graphs. A properly designed bar does not need a bulbous end to be clearly seen. I have created bar graphs with up to 100 bars, and wrapped bar graphs with up to 500 bars, that look good and work well.
Alberto, I wonder if you can provide an example of a bar graph that was improved by redesigning it as a lollipop? I am open to the idea that they could be useful, but I don’t see it. Certainly, Steve’s example above wouldn’t look clunky with the addition of another couple of bars, but maybe a larger number of data points would be sufficient to demonstrate the utility of the lollipop?
Alberto,
If you’d like to provide an example, as David suggested, we can continue this discussion in my Discussion Forum. Unfortunately, my blog software does not allow images to be included with comments.
I’ve never seen a lollipop chart used, and I have to say my brain just parsed the one in this example as a dot plot – a perfectly valid visualization for ranking, nominal comparison, and partial time-series visualization (according to Show Me the Numbers). To me, the lines serve no purpose except to help align the dot with the categorical value – they could have extended the length of the chart, but that would actually make it busier and more inky (the Graph Selection Matrix derived from Show Me the Numbers actually uses a full-length line for this).
I certainly agree that a lollipop graph does a poor job of replacing a bar chart, but as long as you use it as you would a dot plot and treat the lines as grid lines rather than value-encoding lines, it seems harmless enough and perhaps even slightly easier to read than a dot plot in cases where the chart is fairly large and it is hard to align a dot with its category axis because of distance.
I am somewhat curious – if the lines encoded a secondary value instead of repeating the value encoded by the location of the dot – e.g. movement between last year and this year – would this potentially be helpful? Obviously, it would cease to be a lollipop at that point, but I am somewhat curious if the result would have merit as a chart, as this sort of requirement (usually fulfilled by a grouped bar chart, which isn’t ideal) could certainly use a good solution.
I find lollipops to be a weaker method of encoding for a couple of reasons. The chief among them is that by using a line instead of a bar we’ve de-emphasized the length comparison component, which I find to be more effective than comparing 2-d position alone. A bar graph features both more or less equally while the lollipop emphasizes the 2-d position. Perhaps there are situations in which this might be desirable, but none come to mind.
As to Alberto’s proposition, while made in good faith, I’m doubtful such an example will yield a more effective visualization.
Dot plots are great for comparing relative values in a tight range far from zero, where comparing lengths of bars is not easy. In other words, dots are an alternative when bars just won’t work.
I’m not sure why anyone would use dots when bars DO work though. And I especially don’t see how mixing two chart types would be incredibly useful.
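The point about tight ranges far from zero can be illustrated with a little arithmetic. This is only a sketch with made-up values; `spread_fraction` is a hypothetical helper that measures how much of the axis the differences between values occupy.

```python
def spread_fraction(values, axis_min, axis_max):
    """Fraction of the axis length spanned between the smallest and largest value."""
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical values clustered in a tight range far from zero.
values = [96.0, 97.5, 99.0]

# Bars must start at zero, so the differences occupy a sliver of the axis.
print(spread_fraction(values, axis_min=0, axis_max=100))    # 0.03

# A dot plot may use a narrower scale, so the same differences spread out.
print(spread_fraction(values, axis_min=95, axis_max=100))   # 0.6
```

With a zero-based scale the three bars differ by only 3% of the axis length, whereas on an assumed 95 to 100 scale the dots are spread across 60% of it.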
Or,
The line that extends from the axis to the dot functions as an information-carrying component of the lollipop chart: it encodes the value as length. This transforms what would otherwise function as a dot plot into a bar graph. Grid lines that extend across the entire plot area are frequently included in dot plots to link the label to the dot. Properly rendered as thin and light, grid lines do not add harmful clutter to dot plots. In response to your question, adding a line to a dot plot to encode the difference of a value to another value, such as a past value, works fine. When this is done, usually a dot appears at both ends of the line: one for the current value and one for the other value (e.g., the past value). I’ve designed dot plots in this manner to display change between two points in time on many occasions.
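As a rough text-only illustration of the design described above (not the actual charts referred to in this comment), the sketch below draws one row per category: a light grid line across the full width, two dots, and a connecting line that encodes only the difference between them. The category names, values, and helper functions are all hypothetical.

```python
def column(value, lo, hi, width):
    """Map a value on the scale [lo, hi] to a character column in [0, width - 1]."""
    return round((value - lo) / (hi - lo) * (width - 1))


def change_row(label, past, current, lo, hi, width=41):
    """Render one dot-plot row showing change between two values."""
    cells = ["."] * width                       # grid line spans the whole plot area
    a = column(past, lo, hi, width)
    b = column(current, lo, hi, width)
    for i in range(min(a, b), max(a, b) + 1):   # line encodes the difference only
        cells[i] = "-"
    cells[a] = "o"                              # past value
    cells[b] = "@"                              # current value
    return f"{label:<8}" + "".join(cells)


# Hypothetical (past, current) pairs on a 0-100 scale.
rows = {"North": (20, 35), "South": (50, 40)}
for label, (past, current) in rows.items():
    print(change_row(label, past, current, lo=0, hi=100))
```

Because the connecting line starts at the other dot rather than at the axis, its length shows the change between the two values and nothing else.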
Hi there,
I think we could argue forever, but let me emphasize 2 interesting properties of Lollypop graphs:
– They follow Tufte rule #3: “erase non-data-ink”
– “Double side lollypops” are more efficient at showing differences (for example between female and male) than two bars (with two different colours) glued side by side.
So they may have a role in modern visualisation even if the above criticisms make sense too. Creativity comes from disagreements sometimes.
Best,
Christophe,
You are incorrect in saying that lollipop charts follow Tufte’s suggestion to erase non-data ink. Bars are data. Reducing a bar is not a reduction of non-data ink but of data ink. Tufte also advocated the reduction of data ink, but his specific recommendations (e.g., reducing a bar to a thin line) have been discredited as excessive minimalism. Data ink should be easily seen, not minimized in visual weight.
I don’t know what you mean by a “double sided lollypop.” If you’re referring to a graph that displays two dots per item along the axis, with a line connecting them to represent the difference between them, this is not a lollipop chart. Rather, it is a version of a dot plot that I’ve been using for many years.
I think when we make blanket statements like this, we are poorly applying the principles that we have learned in data visualization. Lately I am somewhat appalled that we talk about a data visualization without first asking about the purpose or requirements of the visualization. Although I appreciate that data visualizations should be as accurate as possible, the reality is that data visualizations are in part a “representation of reality”. Nowadays, I don’t think most visuals are really tools to extract the actual data. Years ago charts were really important in engineering, economics, and other fields because they were used to extract data that was then used in analyses, designs, and so on. People needed supercomputers if they needed to crunch the numbers, so graphs and figures were tools to deliver the actual information. Nowadays you can most likely get the numbers directly, either by connecting to the data source or by solving the equations.
My point with this is that I somewhat agree with Alberto and also see this as not much different from a dot plot. If I were restricted in space in a document (e.g., by a page break) and had too many categories, I would give the lollipop a chance. The one here needs a little work, but unless the point I was trying to make were really hindered by my not being able to tell whether the value for Spain is 0 or 3, the lollipop could be a better option.
rjss,
Your assumption that lollipop charts might be useful when space is restricted isn’t borne out by the facts. A bar graph requires no more space than a lollipop chart.
On what are you basing your opinion that the lollipop chart that I’ve shown above would work better than the bar graph if the value for Spain were more clear? Every dot in the lollipop chart suffers from the same problem of imprecision that the dot for Spain exhibits.
As responsible practitioners of data visualization, we need to make judgments about the relative merits of charts. We should not embrace charts that work poorly. If there is any situation in which a lollipop chart would work better than a bar graph, it hasn’t been identified so far in this discussion. Until we can identify a good use of lollipop charts, we should reject them as ineffective.
I welcome new methods of visually representing data so long as the new methods are equally or (preferably) more effective than existing methods. Sadly, many new methods that are introduced do not hold up when thoroughly examined. I commend the efforts of those that continue to look for new ways to visualize information, but a healthy dose of restraint when it comes to adoption would do most of us a great deal of good.
Until such a scenario is provided that demonstrates the superiority of a lollipop graph, I agree with Stephen that we should refrain from employing them in our work.
I’ve seen the error of my ways – going to turn myself in to the authorities for this:
Fun With SAS ODS Graphics – All the Presidents’ Heights
https://communities.sas.com/t5/SAS-GRAPH-and-ODS-Graphics/Fun-With-SAS-ODS-Graphics-All-the-Presidents-Heights/td-p/354854
Seeking a reduced sentence, since I at least cropped the photos and tried to align the tops of the heads with the ends of the bars (OK, I knew I was doing something wrong!). Plus, I pleaded guilty to creating a cheesy chart, sparing the need for a trial.
Not affiliated with SAS, so don’t blame their safe-when-used-as-intended software for my DataViz crimes. :-)
Stephen, I was probably not clear. I am not saying that the uncertainty of presenting the data is different for those categories. I am just saying that if it were important to show that the value is not zero, then I would have an issue with that. I can easily see cases where I need simple annotation within the chart and the bar chart would not make things easy. My problem is more with the names. It seems some of us even see it as a dot plot that needs a little bit of work. I feel that saying not to use a lollipop is like saying that a bar chart is always the right answer.
rjss,
If you’re concerned about the value for Spain, you should be equally concerned about all of the values, because they all exhibit the same degree of imprecision. I don’t know what you’re saying about needing “simple annotation within the chart.” Annotations can be applied to bars as easily as to lollipops.
Saying that I am not aware of any occasion on which the lollipop chart is as effective as a bar graph is not the same as saying that “a bar chart is always the right answer.” Your statement isn’t logical. A bar graph is only the right answer for certain types of data and occasions. The person who coined the term “lollipop chart” identified it as an alternative to a bar graph, not a dot plot. The line in a lollipop chart functions as a bar.
tc,
Thanks for sharing. Although I agree that most people here will think this is a bad visualization, I can think of uses where this will be more useful than other visualizations even if it hurts our eyes. The point I have been trying to make is that we are forgetting to ask more frequently about the purpose of our visualizations (at least when it is possible).
Stephen
Let’s not worry about lollipops for now. Too much candy for me. I did not say the magnitude of uncertainty was different. I only said that if it mattered whether the value for Spain was or was not zero, then it could be a problem. I could have said the same thing about the other categories with their respective values.
When I say annotation, I refer to text. Yes, you can use annotations in bar graphs, but if you have a few bars and restricted space, your labels could be a little too big for your bar graph. In this case we can use the cotton candy version with a few more tweaks if it allows us to communicate the message that we want.
I would like to point out that this article and many of these comments are opinions. There is no research that I am aware of that has tested Steve’s claims that the lollipop chart makes the length harder to see or that the dot on the end has rendered it less precise. There has been lots of research comparing the various attributes of encoding data, but far less when it comes to variations of encoding data using the same attribute in different ways (in this case both are using length and position). If any readers know of research addressing this particular topic then please post.
Dots are used in many other chart types, e.g., a scatter plot or dot plot, and accuracy compared to other encoding methods has been studied. Position has consistently been shown to be very accurate for precise comparisons. I’ve seen no evidence that adding a thinner line makes the comparison of the dot harder or inaccurate. The size of the dot will certainly affect the precision, but that’s the case with any chart using a dot. If precision counts, then a smaller dot would be better. Even better, in your example, simply adding data labels to the end of the bar/lollipop will solve this. The reader could then read the exact value without estimating against the axis.
I have seen an additional use of a lollipop that could be useful. As Steve points out, the bar chart uses length and position. While the lollipop chart does encode data in the same manner, additional data can also be encoded using the size of the dot. For example, a bar chart can show that the state of Rhode Island is only 85% of goal next to California with the same value. When using a lollipop to show that same data, the number of orders, state population, or some other context for that value could be added to the lollipop by encoding that data using the size of the dot. This helps to visualize the scope of the value/problem, making it easy to see that RI is a small problem and CA is a big problem.
Hi Steve
When I created lollipop charts in 2011, I was addressing a specific problem: charts with many long bars are unpleasant to look at.
I wasn’t considering bar charts where there is a large range of data, and only a few bars. In that situation, bar charts are unpleasant to look at. The lollipop is more pleasing to the eye. I’ll add some images to the forum to illustrate the issue.
rjss,
You said, “I can think of uses [of a lollipop chart] where this will be more useful than other visualizations,” but you have not described such a case. You also mentioned that “we are forgetting to ask more frequently about the purpose of our visualizations,” but this is not a failure that I’ve exhibited. Rather, it is a failure that you’re exhibiting until you identify a situation in which a lollipop chart would display data more effectively than a bar graph.
Jeffrey,
The problems of imprecision, inaccuracy, and a minimal rendering of length that I’ve described are not opinions; they are empirical observations that everyone can confirm. While it is true that the problem of imprecision can be reduced by making the dots smaller, the examples of lollipop charts that appear in your book actually render the dots much larger than I have. I’ve never seen lollipop charts designed with small dots. Regarding the addition of data labels following the dots to solve the imprecision problem, the same could be said for other problematic graphs, such as pie charts. And then there is the problem of clutter that the labels introduce, which lollipop charts are supposedly meant to reduce. The problem of inaccuracy is due to the fact that the middle of the dot marks the position of the value, but by adding a line that extends to the dot, thus encoding the value as length in addition to position, the length extends to the outer edge of the dot, which exaggerates the value. This problem of inaccuracy does not apply to dot plots or scatter plots.
You suggested that the lollipop chart could work better than a bar graph in cases when the size of the dot varies to encode a second quantitative variable. Actually, this is not the case. Bars in a bar graph could vary in width to encode a second quantitative variable in a manner that can be perceived more accurately than the size of a dot. This is not a practice that I recommend, however, because adding a second set of bars to encode the second variable works better than varying the width of bars or varying the sizes of dots.
Andy,
Examples of what you decided in 2011 to call a lollipop chart existed long before you coined the term. In response to these examples, I’ve been warning participants in my Show Me the Numbers course since 2004 against the problem of adding grid lines to dot plots that only extend to the dots. I point out that grid lines in dot plots should not end at the dot but should continue across the entire plot area to avoid adding line length as an additional means of encoding the values. This problem was easily introduced by adding drop lines to dots in Excel.
You mentioned above that you originally envisioned lollipop charts as replacements for bar graphs when there were many bars. Why then do you encourage people to use them when there are only a few values, as you do in your book? Also, you state that “bar charts are unpleasant to look at” when there are “many long bars,” but, in line with Jeffrey’s distinction between opinion versus established fact, your assertion that “the lollipop is more pleasing to the eye” is only an opinion. Even if we could establish that this opinion is shared by most people, a dot plot would solve this problem more effectively than a lollipop chart.
Steve,
Both the bar chart and lollipop examples that you provide here require the reader to estimate the value. Readers scanning the bar chart, which you claim as more precise in this case, will have to rely on the x-axis labels to estimate the values of all of those bars just as they do in the lollipop chart. You and I have had the axis debate before. I would advocate that we remove the x-axis where you have 9 axis labels to help us estimate 7 points of data (if there are 36 bars/lollipops then I would not advocate for this and instead label a few of them). I don’t think the problem of imprecision is “reduced” with data labels, I think it’s solved. What is more precise than having the actual value as a label next to it?
You are correct that the dot accuracy based on the line and its size is not an opinion. That is indeed easy to see. What is opinion is that this deficiency in the chart will cause the reader to interpret it any differently than the bar chart. I think in both cases the reader will interpret “the UK is about twice as much as Italy” just as easily.
Steve Wexler’s examples (chapters 30 and 35) in our book were in a very specific context. The client wanted bubble charts. They were not going to use bar charts. So Steve created a compromise that they really liked. In addition, Steve states at the end of chapter 30 that he likes the bar chart solution that they rejected.
I think we can all agree, based on research, that size will be less accurate. Varying the width of the bar is a form of that. For an approximate comparison, the size of the dot certainly works. Introducing an additional bar chart to encode that value won’t always be an option, for example due to space considerations.
I would be very interested to see a study of this.
Just in case it isn’t clear to readers here, I truly value Steve’s opinion for all things data visualization related. My earlier comment of pointing out that this thread is based on opinion should not be read that I discount his. His experience has proven invaluable to me. I just happen to disagree on this particular one.
Jeff
Hi Steve:
Although it is hypothetical, I was thinking of a scenario where you need to group or sort people by height and the only way to recognize each person is by their appearance (a photograph). I am not saying that this is the best way, but that it is important to know the reason or purpose for which you are visualizing a dataset. In other words, if that were the purpose and somebody brought me this visualization and asked for help improving it, and I never asked about the purpose, then I might go and create a great bar chart that is for the most part useless.
rjss,
No one here is debating the need to understand the purpose of a visualization before creating it. I applaud your commitment to understanding the purpose as your starting point.
Jeff,
Graphs, by their very nature, require the reader to estimate values to some degree, usually by associating value-encoding objects with a quantitative scale. We should make that task as easy as possible. Lollipop charts make that task more difficult and they exaggerate the values.
You are correct that by labeling values in a graph, preferably outside of the plot area so you don’t introduce distracting clutter, the problem of imprecision is solved, but labeling values to resolve problems in a lollipop chart that would not exist in a bar graph is not an argument in favor of lollipop charts.
Regarding the lollipop charts that appear in your book, after reading your comments above I had to look back through the book to confirm that the only lollipop charts that appear in it are those that Steve Wexler included as examples of compromises when the client was insisting on less effective forms of display. Actually, two other examples of graphs that are identified as lollipop charts appear in your book, but neither are in fact lollipop charts, so you are absolutely correct. In fact, in no cases in the book were lollipop charts used as a replacement for bar graphs, even though many of the bar graphs included a large set of values. I appreciate the fact that you are not advocating the use of lollipop charts in your book other than for this one particular situation, but, in my opinion, this one occasion does not justify their inclusion in your Glossary of Chart Types.
Regarding cases when the size of the dot might vary to encode a second quantitative value, and it is justified because there is no room to include a second bar graph, a dot plot with dots of varying sizes would be a better solution than a lollipop chart.
Hi Steve
I don’t recall seeing these chart types before I created mine. If I had, I had no conscious memory. If they existed prior to my blog post, then I acknowledge that. Maybe I was the first to call them lollipops? In which case, you can blame me for the cutesy reference.
Andy
Andy,
Yes, for good or ill, you get credit for calling these graphs lollipop charts. The name is exceptionally appropriate, based both on appearance and nutritional value, and it is certainly catchy in a way that sticks. Life was simpler when I could stick a Tootsie Roll in my mouth and enjoy the sweet thrill of sugar, but I’ve since learned to prefer fresh fruit. Sometime in the last year or so participants in my Show Me the Numbers course began to occasionally ask about lollipop charts. In fact, this occurred last week in Oslo. This, combined with the fact that they’re mentioned in your book, prompted me to write about them here in my blog.
While I am a huge fan of Stephen Few’s body of work and thinking, count me in the group who believe in the value of lollipop charts.
It is reasonable to believe that, all else being equal, an isolated bar chart has an edge in allowing better decoding than an isolated lollipop chart. However, often all else is not equal. A set of visualizations that had just bar charts would likely capture less attention and result in less information being decoded than one that had both bar and lollipop charts.
Often, the encoding goal of the designer is to go beyond a single dish and instead provide a balanced meal whose elements play off each other. Personally, I believe the type of lollipop chart discussed so far offers a great way to keep users engaged and results in more information being shared overall.
For showing differences between two values, there is another type of lollipop chart that offers tremendous value and should be used much more than it is – http://drawingwithnumbers.artisart.org/lollipops-for-quality-improvement/.
Paul,
Your case for lollipop charts is based on 1) an opinion that isn’t backed by evidence and 2) a fallacy:
1) The opinion: Lollipop charts are more engaging than bar graphs.
2) The fallacy: All graphs that connect data points with a line are lollipop charts.
There is no evidence that lollipop charts are more engaging than bar graphs. Even if they were, you would need to show that they were more engaging in a manner that leads to better understanding.
The term lollipop chart was coined by Andy Cotgreave to describe graphs that serve as alternatives to bar graphs, with a line that extends from the lowest value on the scale to a dot. Graphs of this type appeared in the wild long before Andy coined the term. In fact, I’ve been warning people against their use in my Show Me the Numbers course since 2004. The “Drawing with Numbers” example is not a lollipop chart. It is a useful version of a dot plot that has existed for many years. This type of chart is used to feature the difference between two values by connecting them with a line. Usually, dots appear at both ends of the line, but this isn’t essential. I wrote graph design guidelines for UNESCO in 2006 that included dot plots of this type. They don’t have a name apart from dot plot and don’t need one. They are certainly not called lollipop charts.
P.S. Even the name “lollipop chart” was not originally coined by Andy. A “lollipop graph,” which is used in mathematics, existed before Andy’s adoption of the term, but he wasn’t aware of this fact.
Stephen,
I completely agree this is just one person’s opinion, not backed by evidence.
Without any evidence, I would suspect that, in comparing the charts you present in the blog post, the lollipop chart might prove to be less prone to the decoding bias demonstrated by George Newman for bar charts showing averages. In his study published in the Psychonomics Bulletin in May 2012, he found “viewers judge points that fall within the bar as being more likely than points equidistant from the mean, but outside the bar—as if the bar somehow “contained” the relevant data.”
What do you think?
Do you think that what we are calling a lollipop chart can be shown to be as poor a visualization practice, using the degree of thoughtful evidence and examination you have presented in the past showing the failings of pie charts?
Paul,
George Newman seems to make a valid claim that when bar graphs are used to display averages, such as means, people tend to interpret them incorrectly. This makes perfect sense. A mean is a measure of central tendency, but the bar suggests that the values represented by the mean all fall below it. A lollipop chart would suffer from the same problem because of the line. A dot plot, however, would solve this problem nicely.
The problems that I’ve described regarding lollipop charts are relatively slight when compared to those that are associated with pie charts. This is because our ability to perceive areas, angles, and the length of arcs (the three ways that pie charts encode values) suffers from greater difficulty and inaccuracy than the inaccuracy and imprecision that’s associated with lollipop charts. In my opinion, it wouldn’t be worth the effort to do experimental studies to determine the degree of inaccuracy and imprecision that is associated with lollipop charts when compared to bar graphs. This is because there is no reason whatsoever to use a chart that suffers from a perceptual problem to any degree unless it can be shown to do at least one thing that existing graphs don’t do as well or better.
Stephen,
Respectfully, could we be too hasty here? Perhaps the “lollipop stick” can be seen more as a reference line tying the descriptor to a dot than as a mark encoding a measure?
As you suggest, the ultimate test would be an empirical study. But the test would be most meaningful if we were to evaluate subjects’ abilities to decode information from an overall ensemble of charts. As stated above, my unproven suspicion is that just relying on bar charts to show single values for an overall series of visualizations could lead to less information being decoded due to “visual monotony.”
Without a doubt, a dot is less precise of a marker than an end of a bar chart. But is it absolute precision we are after in visualizations?
Paul,
If, by reference line, you’re referring to a grid line that connects the label to the dot, it should be displayed in the usual way by spanning the entire plot area. By ending the line at the dot, you automatically add line length as a value-encoding component to the graph, which makes it function as a bar graph.
There is no evidence that the use of bar graphs rather than lollipop charts produces the effect that you’re calling “visual monotony.” Too many bad graphs have been introduced in an attempt to introduce visual variety, supposedly to create greater engagement at the cost of lesser effectiveness in other respects. The flipside of what you’re calling visual monotony is meaningless variety, which forces people to adapt to new charts without benefit, meaninglessly shifting between different methods of perceiving the same types of data. These meaningless shifts come at a cost.
Absolute precision is neither a goal nor even a possibility in graphical displays. As we’ve already discussed above, even bar graphs are not precise. Nevertheless, we do strive for graphical forms that work as effectively as possible. Even though perfect precision is not within our reach, greater precision always beats lesser precision. We strive for the best, not the good enough. (At least, that’s what I strive for.)
Interesting topic. A few things:
1. Stephen, if I am understanding your point so far, you are against the lollipop mostly because of the imprecision/uncertainty. A few advantages have been mentioned, and I will summarize them as a lower data ink ratio (lollipop). For me this is reflected in better spacing to add annotation. Others used the words busy or pleasing to the eye. Once all this is considered, the problem still seems to be that the line (length) should not extend only to the point. However, I think that, depending on the design (opacity/color of the line), the point you are trying to make may not be completely supported by what we know in terms of data visualization and cognition. I will partly explain in the following, somewhat unrelated, point.
2. We have to accept that the field of data visualization as we know it today is very young when compared to other fields. With the current studies (data), we simply do not have a very good understanding of many of the elements we are talking about here and how they interrelate. For example, the George Newman article highlighted here shows an interesting idea but leaves us puzzling over many questions. For example, we are not completely sure whether the misinterpretation (bias) in using bar charts for measures of central tendency arises because the participants do not understand the concept of a mean when talking about a distribution of values, or because the visualization really induces a bias in our interpretation. I would actually think that the questions in the study do not necessarily capture the difference between bias and ignorance of the statistical concept. So should we stop using bar charts when displaying averages? I would say not, at least in most cases. I would definitely include this in my toolbox if I am talking about the underlying distribution and focusing on the shape of the distribution. If I am plotting averages and just comparing the values among different categories I am not so sure.
The point is that we have to acknowledge that our limited knowledge does not necessarily allow us to know (presently) who is correct (or more correct). I would not call some of our comments here opinions but maybe “professional judgement”. I think it sounds better =)
Maybe time will tell.
It is not surprising that Stephen would defend the “perceptual edge” as being the critical argument for bar charts over lollipop charts. :)
Jarod
@rjss: “We have to accept that the field of data visualization as we know it today is very young when compared to other fields.”
How young is very young? Many of the simplest (and effective) visualizations have already had centuries to prove themselves.
rjss,
Although the field of data visualization is relatively young as an academic discipline, we’ve been visualizing quantitative data for well over 200 years. Even during the last few decades of research into data visualization we have learned a great deal. Most of what we’ve learned comes from fields of study that are much older than data visualization, especially psychology. We understand why a bar graph works well. My objections to a lollipop version of a bar graph are grounded in science. It is not true that we cannot know presently what works and what doesn’t regarding simple matters such as the effectiveness of bar graphs versus alternative forms of display that are intended as substitutes for bar graphs.
To correct a statement that you made, lollipop charts do not increase the data-ink ratio. The data-ink ratio is the amount of ink that displays data compared to the total amount of ink in the display. Bars represent data, so they qualify as data ink. As I explained to Christophe previously, lollipop charts decrease the data ink by replacing bars with thin lines. This provides no benefit and might in fact make the data slightly more difficult to read.
Regarding the use of bars for displaying measures of average, although I’ve probably been guilty of this myself at times, this is not a good practice for the reasons that Newman identified. A graphical representation of abstract data should correspond to the data that it represents. A bar, which extends from the base of the graph to the measure of central tendency, such as a mean, visually suggests that the data that it represents resides between zero and the end of the bar, which is not the case. It is usually a bad idea to display a distribution based solely on a measure of central tendency, because doing so tells us nothing about the spread or shape of the distribution. In rare cases when displaying central tendencies only is appropriate, it would make more sense to use dots in the form of a dot plot to represent them, for this will lead to less confusion than a bar.
Lollipops also represent the same piece of information redundantly via line length and dot position. So one of the components (line or dot) is unnecessary, and should be considered non-data ink. From that view, lollipops actually decrease the data-ink ratio.
As for better spacing to add annotation: I don’t see how replacing bars with lines allows for better annotation. Unless you are planning to add annotation between the lines, which seems awfully cluttered to me.
On the question of “within the bar bias” there are some interesting empirical findings and suggested alternatives presented here http://graphics.cs.wisc.edu/Vis/ErrorBars/ . For another study, also confirming this bias, see http://dx.doi.org/10.1080/00031305.2016.1141706 .
To give one person’s subjective opinion, I’ve always perceived lollipop lines (including the example presented at the start of this blog post) as reference lines tying dots to their label, not as thin bars encoding a measure. Indeed, if the reference example was shown as just a dot plot, I do not think it would be as readily decoded.
Further, wouldn’t alternative approaches to relate the circle to its label actually increase pixel to information ratio?
At the risk of sounding like a broken record, I would suggest that the ultimate question is “how effectively is information decoded in the overall context?” If that context were to have multiple bar charts, I propose that adding variety through something like a lollipop chart could actually increase the total information decoded. The decoding value of the whole is not always the sum of the parts.
Paul,
If only thinking could override perception so easily. When you said, “I’ve always perceived lollipop lines as reference lines,” what you should have said was, “I’ve always conceived of lollipop lines as reference lines.” Perception occurs before thought, and it is difficult to override perception with cognition. When the line extends from the base of the chart to the dot, it leads to the perception of length. This is why, when William Cleveland introduced dot plots long ago, he used grid lines that ran across the entire plot area to assist the eye in connecting the labels along the axis to the dots.
Although grid lines of this type do not qualify as data ink, they serve a valuable purpose that justifies their existence. It is a mistake to treat Tufte’s data-ink ratio as a rigid rule by thinking that you must always fully maximize the ratio. It is more useful to treat it as a general guideline, which can be stated as, “Include no more non-data ink and render it no more salient than is necessary to support the graph’s intended use.”
If, on a single screen or page, you display multiple graphs and they all perform the same function, why would you vary the form of display? Arbitrarily displaying some data sets as bar graphs and others as lollipop charts provides no benefit. Instead, you are forcing readers to make slight adjustments in their decoding efforts when they switch from one type of graph to another because of the differences in their appearance, even though both of these graphs encode data in the same fundamental way. If you’re concerned about visual monotony, assuming that it is actually a factor, which is unknown, there are better ways of solving this problem than by varying chart types.
Stephen,
I completely agree that Tufte’s thinking should be thought of as a guiding starting point and not an end point for design. As an aside, I also count your work as being among the greatest contributions to thinking on how to make good visualizations.
In my case, I’ve found occasions where I believe it contributed more to the whole to use a lollipop chart (or, if vertical, what I believe should be more properly called a “balloon chart” ;-) ).
At heart, I think philosophically, the question “how will the visualization be decoded?” is ultimately more important than the question “what is the best way to encode the data?” Without a doubt, the answers to both questions inform each other and both questions deserve much consideration.
I will continue to rely heavily on your work as I seek to answer both questions.
Thanks for correcting the error. It should have been “higher data ink ratio” and that is why I was implying some people may see that as an advantage (as I mentioned for various reasons). After reading the post a little more and my questions/comments, I think I have not been making my point clearly. It is not that I am debating between bar graphs and lollipops. I think my point is the following: Is a lollipop really different from a dot plot? Does extending (or not extending) that reference line make a difference in how the information is perceived? Also, how do the properties (dashed/opacity/thickness) of that line affect perception? I think this is where I am saying that we may not necessarily have data to support this assertion.
When I said that the field of data visualization was very young, yes, this is definitely up for debate, but I am not even talking much about time but about what we have developed in terms of tools/understanding. As Stephen pointed out, a lot of knowledge has been borrowed from other fields such as psychology. I will say we have a very good grasp/understanding of separate elements at this time but we don’t necessarily know how they play together.
Can we take two visualizations, put them through a model, and get an output that allows us to quantitatively pick an optimal solution? Not that I am aware of. In that respect data visualization is still very young compared to other fields in engineering/economics. I am not even saying we should strive for this (at least right now) as I think we have other things we probably want to worry about and I am definitely not an expert in this area. By now I probably have ventured outside the scope of the topic in this blog. Any comments?
While we have no single coherent and comprehensive model that we can rely on to evaluate all data visualizations to determine their effectiveness, we have several models that have emerged from empirical research that we can use to evaluate most cases. This is one of those cases. We don’t need to wait until everything is known to develop and follow best practices that we can rely on with confidence. As with all scientific findings, these models and the best practices that they inform can certainly be challenged and retested, but we shouldn’t ignore them or pretend that they don’t exist.
There is still much about data visualization that we don’t understand. Shouldn’t we be spending most of our time exploring the unknown where the greatest potential discoveries exist rather than wasting our time covering the same well known territory over and over again? Coming up with cute, less effective variations of proven forms of display, such as bar graphs, is not a good use of our time.
Thanks Stephen. I did not imply we need perfect understanding (or a complete model) as that will be depressing. Especially, if we start considering models (specially with our current understanding) that will be plagued with uncertainty and inherent variability between stimuli and perception.
Not sure this was clear but at this point I am past the argument of barchart vs lollipop (by the way I don’t like the name). I guess over the past few posts I realized I was not arguing one over the other, as for me both plots (barchart and dot plot) are for different purposes (with their own advantages and limitations). I am talking more about a dot plot and a lollipop. I am not convinced they are markedly different. Why would I want to shorten the grid lines? Hard for me to describe in text, but again I may want to add an annotation to the right of the dots and the grid lines interfere with the text.
Don’t want to keep going back and forth but would appreciate seeing some references that may help me understand the effects of shortening the reference line on the perception of the position of the dot. Again, this is not an argument that I want to make a lollipop. Hopefully that is clear now (maybe a bit of a language barrier).
@rjss
The shortened lines are visually confusing. In a dotplot, data are represented by the dot’s position. Lollipop lines imply that the line lengths also mean something. If the chart range starts at zero, you’re fine (although you are better off with a bar chart in most cases). But if the chart doesn’t start at zero (often the case with dotplots), then the line lengths are very misleading.
For example, let’s say you’re comparing two percentages, 96% and 93%. A dotplot makes it easy to compare these values, as the chart can range from 90% to 100%. But a lollipop chart would visually suggest that the 96% item is double the 93% item, which is wrong.
Now let’s say you also have a value at 89% – your chart range might instead start at 85%, or 80%. With a different range, the first two values – which have not changed – are perceived differently. How the data are compared in a dotplot should not depend on the range, but this is exactly the visual cue that lollipop lines give.
It is better to avoid that confusion by either removing the lines or making them gridlines that span the chart’s range. If the grid-line is very light, there isn’t any reason you can’t still add annotation.
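The arithmetic in this example can be checked with a short sketch. The function name `apparent_ratio` and the list of axis starts are illustrative assumptions, not something from the thread; the code only compares the line lengths a lollipop chart would draw from the start of the axis to each value.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of the drawn line lengths when each line runs from axis_start to its value."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the start of the axis")
    return (a - axis_start) / (b - axis_start)


# The same two values (96% and 93%) drawn against different axis starts.
for start in (90, 85, 80, 0):
    ratio = apparent_ratio(96, 93, start)
    print(f"axis starts at {start}%: the 96% line is {ratio:.2f} times the 93% line")
```

With the axis starting at 90%, the lines are 6 and 3 units long, so one looks twice the other. As the start moves toward zero the ratio shrinks toward the true ratio of 96/93, roughly 1.03, even though the values themselves never change.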
rjss,
In those relatively rare cases when it makes sense to add annotations related to particular values in the plot area of the graph, you can do so over properly designed grid lines in a dot plot without concern. Grid lines should be thin and light, just visible enough to lead the eye. Placing text on top of a grid line of this type works fine.
Extending a grid line only to the dot in a dot plot rather than across the entire plot area, especially a line that is as visually salient as the lines of a lollipop chart, leads the eye from the base of the scale to the dot, which adds the attribute of length to the data. We perceive that length whether we want to or not. Length thus functions as a data-encoding component of the graph. Unfortunately, the length that we perceive extends from the base of the scale to the outer edge of the dot, which introduces a degree of inaccuracy to the value that it represents, even when the scale starts at zero, which is equal to the radius of the dot. One can certainly argue that this inaccuracy is minor, but why introduce any inaccuracy at all when it can be easily avoided by removing length as a data-encoding component of the graph, simply by extending light and thin grid lines, when they are needed, across the entire plot area?
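To put a rough size on the inaccuracy described here, the following is a minimal sketch under stated assumptions: a zero-based scale, and a hypothetical rendering in which the dot radius and the scale are both known in pixels. The function names, the 6-pixel radius, and the 4 pixels per unit are all invented for illustration and do not come from any chart in this discussion.

```python
def perceived_value(value, dot_radius_px, px_per_unit):
    """Value implied if length is read to the outer edge of the dot (zero-based scale)."""
    if px_per_unit <= 0:
        raise ValueError("px_per_unit must be positive")
    return value + dot_radius_px / px_per_unit


def overstatement_pct(value, dot_radius_px, px_per_unit):
    """Excess of the perceived value over the true value, as a percentage of the true value."""
    if value <= 0:
        raise ValueError("value must be positive")
    excess = perceived_value(value, dot_radius_px, px_per_unit) - value
    return 100.0 * excess / value


# Hypothetical rendering: dot radius of 6 pixels, 4 pixels per unit on the scale.
for v in (10, 50, 100):
    print(v, perceived_value(v, 6, 4), overstatement_pct(v, 6, 4))
```

Under these assumed numbers the excess is the same absolute amount for every value (the radius converted to scale units), so it is proportionally largest for the smallest values.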
Thanks Stephen and Andrew. Stephen, a few comments for your consideration. Don’t undermine the power of annotations. I think in many cases the world would be a better place if people added annotations. If you discovered something during your data analysis and you are using the visualization to explain that point, don’t lose that opportunity. I guess this can be a touchy topic, as there is the argument that we may be influencing the observer by simply adding annotations (ethics, I guess, is a topic for another blog). However, in this day and age of additional transparency I guess annotations will rank low, at least on my list.
Also, regarding bar charts and axes that do not start at zero: dot plots minimize but do not solve this issue. We may actually solve this by normalizing our results or presenting a more meaningful variable (e.g., rate of change). To borrow something from my field, we always say “All models are wrong but some are useful”; I think the same applies to data visualizations. Talking about ethics, we have all seen in the news the bar chart that does not start at zero blasting a candidate or something. I have always considered those examples of “good visualizations” with bad ethical practices.
rjss,
I have not and would not undermine the power or usefulness of annotations. On the contrary, I promote their use. In my previous statement, I merely pointed out that the need for placing annotations in the plot area of a graph is “relatively rare” (i.e., a low percentage). I was referring to graphs that we use to communicate information to others. Relatively few of the graphs that are produced for communication purposes need annotations in the plot area. When annotations in the plot area are needed, we must take care to manage their size, position, and salience to prevent them competing with the graphical elements of the graph. If you are referring to annotations that you, as a data analyst, might place in a graph to document information for yourself only, that is a different matter. You can place as many annotations as you wish in graphs that are for yourself alone.
Regarding the quantitative scale, while using bars with a scale that does not include zero represents the values inaccurately based on the lengths of the bars, using a scale that does not start at zero with graphs that use only 2-D position to encode the values does not represent the data inaccurately. If all of the values reside within a narrow range far from zero, using a zero based scale can make important differences between them difficult to see, which is why we often narrow the scale to include only the relevant range. Depending upon your audience, however, you might be concerned that some people might misinterpret values in graphs that don’t use a zero-based scale. This doesn’t mean that you must always include zero in the scale, but that you should do something to point out to people that the scale does not use zero as its base, which causes differences in values to appear greater than they are. In such cases, I often use two graphs: one with a zero-based scale and one with a narrowed scale. After seeing the graph with the zero-based scale, it is unlikely that anyone will misinterpret the version with the narrowed scale.
In the examples that we’ve been discussing here, “rate of change” is not a relevant way to normalize the data. Rate of change only relates to time-series data. Even when we are dealing with time-series data, expressing the rate of change is not a way to normalize the amount of change. The rate of change and the amount of change are different measures; they are not interchangeable.
I just read the entire thread and have a few thoughts:
1) Excellent discussion. Thank you.
2) I was surprised that no one – including Stephen – used the Data Visualization Effectiveness Profile from Steve’s last newsletter to visually display their arguments for and against the two charts.
3) I, like other readers, first glanced at the lollipop chart, did not notice the thin lines and thought it was a dot plot. I had to concentrate on the plot to see what it really was. This brings me to my second point. As a User Experience Designer in the Oil and Gas industry – specifically the drilling industry – our team is always focused on glance time vs concentration time. Which are we trying to solve with this data visualization? I would argue the bar chart is far superior in the glance time category. Definitely, if placing such an information visualization in an industrial setting such as a busy oil rig, where the driller needs to glance across a room or several monitors and see the critical information they need to perform their task, anything that supports glance time wins. However, I would argue the downtown office Drilling Engineer rushing to their morning 6 am meeting with a printout of the morning data on the state of their well also is going to be glancing at the data, not concentrating on the charts, as he/she has a million other concerns on their mind. We regularly test for this in informal studies on drilling rigs, in the downtown offices and when we bring customers into our shop for usability tests on our interfaces. It can be done rigorously for those who feel that need. It is not opinion based. It can be measured by observing how fast the customer understands the interface and completes the task. So I would argue the comments above failed to mention the context of the viewer in their arguments.
4) Not wishing to sidetrack the thread, but I have to ask. Several comments brought up labels. I have found myself doing this of late on bar graphs and using the data table feature of Excel so as not to clutter the graph. But I also find myself arguing with myself. Why am I supplying numbers on a graph? If you need to show the exact numbers on a graph then that says a table is the correct representation. A graph is for showing trends. Why?
Mark,
Graphs do more than show trends. They make it possible for us to see and do several things that we cannot see or do with tables of numbers. Regarding trends in particular, keep in mind that a trend is but one of many potential patterns that graphs can reveal. Regarding the provision of numbers along with a graph, it isn’t accurate to say that if people need precise values in the form of numbers they don’t need a graph. For example, what if they need to see patterns, make rapid comparisons, and also access precise values on occasion?
Mark; thanks for sharing. A few comments:
1. Regarding your question about the labels: I believe there is no way to say it is wrong or right. I actually have been thinking, in a philosophical manner, about what it means to design the best visualization possible. In brief, I think it is the one that transmits the intended message in the least amount of time. Then it is on us to use the principles we learned in data visualization (e.g., perception) to design the appropriate visual.
2. I read the post about the Data Visualization Effectiveness Profile. I am not a fan. I like the idea of informally doing a calculation to get an idea if you are truly stuck between alternatives. But sometimes I wonder if this is bad advice. The shape of the profile is pretty much meaningless. I would not like somebody to spend time creating plots about a visualization. I would think spending time thinking about who the user will be and the message that the visualization needs to transmit is far more important. Maybe Stephen is providing us with a thorough explanation of the process but is not suggesting that we do this in our work (at least until we know much more).
3. I actually ended up re-reading some old papers for some ideas I have been working on. I noticed that Cleveland (1984; The American Statistician) has a dot plot with truncated grid lines. In fact he talked about it in the paper, so it was not a printing error. I am not saying that I would truncate the grid lines just for fun. But if I need to, I don’t see much proof yet in the literature of a large discrepancy in perception between truncated (thin) grey lines and untruncated ones.
rjss,
The Data Visualization Effectiveness Profile is my attempt to replace the informal calculations and judgments that you advocate, which for most people are not based on solid principles, with more useful criteria. The shape of the profile, in and of itself, is meaningless, but it is useful for comparing the effectiveness of various visualizations. I am not suggesting that true experts in data visualization should spend time constructing Data Visualization Effectiveness Profile charts for every visualization that they create, but rather that these profiles can be used to evaluate the comparative merits of visualizations when needed.
In the article by William Cleveland that you mentioned, I assume that he showed an example of a dot plot with grid lines that ended at the dot to discourage the practice. I assume this because he never ended the grid lines at the dots in his two books. Is this correct?
Stephen:
Actually, W. Cleveland does not give this as an example of something that we shouldn’t do, at least from what I can tell. But you are going to “love” this. Regarding the message from that particular figure he writes the following: “When there is a zero on the scale-or some other meaningful baseline value-and no scale break, the dotted lines can emanate from the baseline value and end at the data dots, as in Figure 6.” So I think this means he recognizes that the grid line (length) could encode information and can be interpreted as a bar chart. Did he unintentionally propose the first lollipop?
I guess this brings me back to my original thought: is the lollipop very different from the dot plot? I mentioned before that, if needed, I won’t feel bad about truncating the grid lines. However, I think that was incorrect or at least incomplete. I should add that if the dot plot scale does not start at zero we should be very careful about truncating those lines.
Stephen, thanks for bringing up these conversations. I definitely enjoy and learn from these blogs.