Even though several research studies over the years have sought to compare the relative effectiveness of pie charts vs. bar graphs, only for one task have bar graphs failed to outperform pie charts. The one potential advantage of pie charts was identified in a study by Spence and Lewandowsky titled “Displaying Proportions and Percentages” (Applied Cognitive Psychology, Vol. 5, 1991). This study has probably been cited more often than any other to support the pie chart’s worth. I suspect that most of these citations, however, were made by researchers who never actually read the original paper, so they tend to give pie charts more than their due. In all fields of research, not just information visualization, studies are routinely cited that weren’t actually read, resulting in misrepresentations of the original work’s findings. According to a study by Mikhail Simkin and Vwani Roychowdhury, only about 20 percent of scientists who cite an article have actually read the paper (“Read Before You Cite!”, Complex Systems, 14, 2003). In most cases, researchers have only read comments in secondary sources about the studies that they cite, sources that were often written by others who also relied on secondary sources. This is one of the ways that errors proliferate and sometimes become common knowledge, even in scientific circles.
Few researchers bother to mention that the study by Spence and Lewandowsky robbed bar graphs of their quantitative scales. Perhaps, because pie charts lack quantitative scales, Spence and Lewandowsky felt that scales should be removed from the bar graphs to even the playing field. In fact, a pie chart has an implied scale that goes from 0% to 100% in a circle around the perimeter of the pie, but it is never shown because it isn’t helpful. By removing the scales from bar graphs, however, their study failed to measure the effectiveness of bar graphs as actually used.
Nevertheless, even when hamstrung in this way, bar graphs performed better than pie charts for every task except comparisons of summed parts. Imagine a pie chart with four slices, labeled A through D, and a bar graph with four bars, one for each of the same values. Now imagine the following task: either compare the sum of slices A and B to the sum of slices C and D to determine which is greater or perform the same comparison using the corresponding bars in the bar graph. The study found that test subjects could estimate the sums of two slices and compare them to the sums of another two slices more effectively than they could estimate and compare the combined lengths of bars. This isn’t surprising, but even this one advantage of pie charts might not have been found had the bar graphs possessed their scales.
Comparing the lengths of two bars that share a common baseline is handled preattentively by the visual cortex of the brain: the comparison is fast and as precise as visual perception supports. Comparing the sizes or angles of pie slices is also handled by the visual cortex, but not as precisely and usually not as quickly either, because we typically strive for a level of precision that the pie chart doesn’t support, which slows us down. Decoding the value represented by a slice of pie requires us to estimate the percentage of the circle that belongs to the slice, which is difficult. Decoding the value represented by a bar involves a straightforward lookup: we compare the end of the bar to the nearest value along the scale. When a bar graph is properly designed, we can perform this task quickly, easily, and precisely.
The fundamental superiority of bar graphs over pie charts is rooted in a fact of visual perception: we can compare the 2-D positions of objects (such as the ends of bars) or their lengths (especially when they share a common baseline) more easily and precisely than we can compare the sizes or angles of pie slices. When people like Edward Tufte, William Cleveland, Naomi Robbins, and I express disdain for pie charts, it is for this reason and this reason alone. We love circles as much as anyone, but we don’t worship them and we don’t expect from them what they can’t provide.
Despite the perceptual problems associated with pie charts, which are well established, every once in a while some new study or book comes along and suggests that the experts have been wrong all along. Even when these studies are utterly absurd and completely unfounded, lovers of pie charts, especially software vendors that promote silly, ineffective data visualization practices, celebrate them: “Mission accomplished! We have proven the worth of our beloved pie.” To quote the conclusion of a recent journal article: “The pie is a communication chart par excellence…pies are from Venus, bars are from Mars” (Charles Wesley Ervin, “Pie charts in financial communications,” Information Design Journal, 19:3, 2011). People love circles, there’s no doubt about it, but they are rarely useful for displaying quantitative information.
A few days ago I discovered the latest paper that gives an undeserved thumbs up for the pie chart: “Using fNIRS Brain Sensing to Evaluate Information Visualization Interfaces,” written by students and faculty at Tufts University. I discovered this paper when reading a blog post that cited my negative opinion of pie charts and then pointed to this paper as potential evidence of my error. This paper has been accepted for presentation at CHI 2013 in Paris later this year and is already available in published form. This study is misdesigned, misinterpreted, and misrepresented. I wish I could say that this is an anomaly, but sadly, I cannot. If you are not intimately acquainted with academic research, you might assume that most of it is well done, and that getting published is a sure sign of credibility. This is far from true. Bad research gets published in every field, but in the field of information visualization, it sometimes even wins awards.
The following bold statement appears in the paper’s abstract (emphasis mine):
In this paper, we use the classic comparison of bar graphs and pie charts to test the viability of fNIRS [functional near-infrared spectroscopy] for measuring the impact of a visual design on the brain. Our results demonstrate that we can indeed measure this impact, and furthermore measurements indicate that there are not universal differences in bar graphs and pie charts.
In fact, this study demonstrates nothing of the kind. It does not meaningfully measure the impact of visual design on the brain, and it definitely does not indicate anything universal or even otherwise about differences between bar graphs and pie charts. The primary problem with this study is the fact that it did not simulate any of the actual tasks that people perform when using bar graphs and pie charts. This is only apparent, however, if you read beyond the abstract.
I’ll describe the tasks that test subjects performed. See if you can identify the problem. Subjects performed multiple series of tasks. Each time they were shown 11 slides in sequence, lasting 3.7 seconds per slide. Each slide in a particular series displayed either a single bar graph or pie chart. Each chart displayed multiple bars or slices. Among them, one bar or slice was marked with a large black dot and one with a small red dot. The subject was required to compare the length of the bar or size of the slice marked with the red dot to the length of the bar or slice of the pie marked with the black dot on the previous slide. Items marked with black dots always represented values that were greater than those marked with red dots. The subject’s task for each slide was to estimate how much larger the item marked with the black dot on the previous slide was compared to the item marked with the red dot on the current slide, to the nearest 10%. In other words, they would indicate that it was approximately 10% greater, 20% greater, 30% greater, etc., which they did by pressing an appropriate key on a keyboard. After making this choice, they then had to quickly look at the item marked with the black dot in the current slide before the 3.7 seconds were up so they could remember it when the next slide appeared and they were required to compare it to the item marked with the red dot there. The figure below shows an example of three slides in an eleven-slide series, in this case consisting of pie charts:
Think about this task. Is this what we do when we compare values in bar graphs or pie charts? It isn’t. What’s different from our actual use of these charts? The things that subjects compared were never simultaneously visible.
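The structure of the task can be sketched in a few lines of Python. This is only an illustration, not code from the study: the names `Slide` and `expected_responses` are hypothetical, the sample values are made up, and the sketch assumes that “X% greater” means the difference relative to the red-dot value and that responses begin on the second slide, neither of which is specified in the description above.

```python
from dataclasses import dataclass


@dataclass
class Slide:
    black: float  # value of the item marked with the large black dot
    red: float    # value of the item marked with the small red dot


def expected_responses(slides):
    """Return the ideal answer for each slide after the first.

    Each answer compares the black-dot item on the PREVIOUS slide with the
    red-dot item on the CURRENT slide, rounded to the nearest 10%.
    Assumption: "X% greater" is measured relative to the red-dot value.
    """
    answers = []
    for previous, current in zip(slides, slides[1:]):
        pct_greater = (previous.black - current.red) / current.red * 100
        answers.append(round(pct_greater / 10) * 10)
    return answers


# Made-up values for a three-slide excerpt of a series
slides = [Slide(black=60, red=20), Slide(black=44, red=40), Slide(black=50, red=40)]
print(expected_responses(slides))  # [50, 10]
```

Every comparison reaches back to `previous`, a slide that is no longer on the screen when the subject responds. Under these assumptions, an 11-slide series yields 10 such cross-slide comparisons and not a single comparison within one chart.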
When we use a chart to compare either slices or bars, we almost always compare values within a single chart. The values are right there near one another, which allows the visual cortex of the brain to handle the comparison. On less frequent occasions when we compare values that reside in separate charts, we always put those charts in front of our eyes at the same time, such as in a trellis display. This is a fundamental practice of data visualization. Why? Because, if the things that we need to compare are not simultaneously visible, we must rely on working memory, which is extremely limited. Work is transferred from the visual cortex to working memory—from our strength to our weakness—which is just plain dumb.
The designers of this study created a task that was handled by working memory because they wanted to demonstrate the usefulness of fNIRS technology for data visualization research and fNIRS can only measure neural activity in the prefrontal cortex, not the visual cortex. They created an unrealistic, artificial task. In doing so, they created something to measure in the prefrontal cortex, but it had nothing to do with a realistic use of charts.
This study was not actually designed to compare the effectiveness of bar graphs vs. pie charts, yet it makes the claim that “there are not universal differences in bar graphs and pie charts.” Instead, this study was designed to demonstrate a use for fNIRS technology in the field of data visualization research. It failed to achieve the latter and should have made no claims regarding the former.
Only one potentially meaningful finding should have been claimed by this study: a positive correlation between test subjects’ subjective sense of difficulty associated with the use of bar graphs vs. pie charts and hemoglobin oxygenation levels in the prefrontal cortex. Subjects who felt that bar graphs were more difficult exhibited higher levels of oxygenation when using bar graphs. Those who felt that pie charts were more difficult exhibited higher levels of oxygenation when using pie charts. This tells us nothing about the relative effectiveness of bar graphs vs. pie charts. Subjects’ preferences for one type of chart over the other might have been a predisposition, but predispositions were not tested. Whether or not a predisposition existed, we don’t know if test subjects’ sense of difficulty and higher levels of oxygenation have any relationship to the effectiveness of the charts. What the experiment found is that working memory performed equally well (or equally poorly) regardless of the chart that was used.
This and other studies done at Tufts University interpret higher levels of hemoglobin oxygenation in the prefrontal cortex as “cognitive load,” by which they imply “cognitive overload.” A negative connotation is assumed. Measuring hemoglobin oxygenation levels in the prefrontal cortex may be a valid measure of brain activity, but we have no reason to believe that this activity is necessarily negative. Perhaps high levels of activity correlate with greater insights rather than counterproductive overload. In truth, oxygenation levels probably indicate neural activity of many types: some positive and some negative. To date, we don’t know how to discriminate between them.
The use of neuroimaging such as fNIRS in HCI studies is still in its infancy. fNIRS may be useful, but we must be careful to read no more into these measures than our current understanding can actually support. Using fNIRS to interpret neural activity is a bit like using temperature readings inside a building to determine the specific activities that are going on within, even though we are separated from those activities by a solid, opaque wall.
The authors of this study indicated the need for caution, but notice how they failed to heed this concern (emphasis mine):
During the course of this paper, we have been intentionally ambiguous about assigning a specific cognitive state to our fNIRS readings. The brain is extremely complex and it is dangerous to make unsubstantiated claims about functionality. However, for fNIRS to be a useful tool in the evaluation of visual design, there also needs to be an understanding of what cognitive processes fNIRS signals may represent. In our experiment, we have reason to believe that the signals we recorded correlate with levels of mental demand.
Notice the reasoning here. We can’t assign specific cognitive states to fNIRS readings, but these readings are useless to us unless we can assign specific states to them, so we’re going to do so. After the disclaimer, they went on to declare:
Our findings suggest that fNIRS can be used to monitor differences in brain activity that derive exclusively from visual design. We find that levels of deoxygenated hemoglobin in the prefrontal cortex (PFC) differ during interaction with bar graphs and pie charts. However, there are not categorical differences between the two graphs. Instead, changes in deoxygenated hemoglobin correlated with the type of display that participants believed was more difficult.
“Differences in brain activity that derive exclusively from visual design”? What they actually found were differences related to subjective feelings of difficulty and oxygenation levels associated with those feelings, which they assumed were “derived exclusively from visual design.” It is entirely possible, however, that those subjective feelings were derived from dispositions regarding bar graphs vs. pie charts that did not grow out of differences in visual design.
Because fNIRS can only measure activity in the prefrontal cortex, not the visual cortex, the authors acknowledge that it is only potentially useful for measuring more complex tasks that involve the prefrontal cortex:
We find that fNIRS can provide insight on the impact of visual design during interaction with difficult, analytical tasks, but is less suited for simple, perceptual comparisons.
Even this statement contains an error. Remembering the size of a slice or bar so it can be compared to another slice or bar later is indeed a difficult task because of working memory’s limitations, but is it an analytical task? Does it require reasoning? It is entirely a task of memory. The prefrontal cortex handles many tasks, but we cannot currently use fNIRS to specifically measure analytical tasks because it cannot discriminate among different neural activities.
Research studies like this should prompt us to ask several questions, including:
- How can students earn PhDs while focusing on information visualization without first learning the fundamental skills required of the discipline (best practices of graph design, the basic tenets of the scientific method, an understanding of visual perception and cognition, and critical thinking)?
- Do the professors who participate in these studies and the reviewers who approve them also lack these skills?
- Do the professors who advise these students review these studies carefully?
- Why aren’t researchers in information visualization asked to go back and correct their work prior to approval for publication based on feedback from reviewers?
I am not writing about this particular study because it is extraordinarily bad, but merely because its claims address topics of interest to me. This paper is typically bad. The problems that we see in it arise from deeper problems that are both endemic and systemic. Papers get published and awards are given when studies exhibit novelty or make controversial claims. A study that tests a hypothesis that turns out to be false is rarely published, even though it is still informative. A study that tries to replicate a past study to confirm or deny its findings is considered boring and thus avoided. Students in doctoral programs are encouraged to find something sexy. Sometimes this takes the form of studies that supposedly challenge long-established best practices. When you’re a young up-and-comer, it’s exhilarating to take a leader in the field down a peg or two. What academics sometimes forget, however, is that their work affects the world. People trust their findings and make decisions based on them. When studies make erroneous claims, they do harm. Research should be better reviewed for the merits of its content. We need fact checkers, not after the fact, as with this review that I’m writing, but prior to publication. Students should receive corrective guidance during the course of their research rather than being subjected to corrective reviews like this one after publication. The bar must be raised, but that won’t happen until academics themselves become willing to speak up.