The following information graphic was brought to my attention by Joy Bonaguro of the Greater New Orleans Community Data Center, one of the people who attended my 2007 West Coast Visual Business Intelligence Workshop:
As you can see, this graphic provides statistics about the homeless population in the United States. In designing the graphic, it appears that the creators fell into a common trap: they thought their information was boring, so they tried to spruce it up. Instead of standard graphs, they used silhouettes of people, placed side by side, which are meant to be read like a horizontal bar graph. Each little figure represents 1% of the total homeless population and each row contains 100 figures. As such, each row represents 100% of the homeless population, which means that these statistics represent part-to-whole relationships. In some rows, such as the "Composition" rows, the figures have been broken up into multiple groups (this is similar to stacked bars). In other rows, such as the "Background" rows, a portion of the figures are colored or black, signifying that a particular trait applies to them, while those to which the trait does not apply are "grayed out."
There are several problems with this design. First, all of the gray figures constitute unnecessary pixels which serve only to clutter the display. For instance, because every row represents 100%, if we know that 10% of homeless people are veterans, it's redundant to also tell us that 90% of them aren't veterans. Besides being redundant, the additional clutter caused by the gray figures keeps the black figures from jumping out like they should, which undermines the graphic's effectiveness.
Another problem with this display involves its ability to be interpreted accurately. Because of the gap down the middle of the graphic, it's fairly easy to tell if a value is close to 50%. Any other amount, however, cannot be judged very accurately, because quantitative scales have not been provided. Sure, you could count the figures, but that is time-consuming and defeats the purpose of using visual communication. Also, the complex shapes of the figures themselves make it harder to judge their cumulative magnitude than a simple shape like a bar would.
The "creative" figure designs disrupt communication as well. For instance, the "Ethnicity" row uses a background gradient that goes from white to red to black. While I assume that the intention was to imply the difference in skin color between the various ethnicities, they ended up making ethnicity stand out as more visually salient, and therefore, more important than their other statistics. The colored figures that represent "Drug or Alcohol Dependent" people and "Veterans" also make those rows jump out, unnecessarily. And I don't know what to say about the inverted figures that represent the "Mentally Disabled" homeless.
Presenting this information with tried-and-true graphs that are backed by solid science transforms it from a graphic that requires study into one in which the important information is immediately understandable. Here is my redesign:
Because the information about change in homeless composition is time-series data, I have displayed it using a line graph, which helps highlight the change between 1998 and 2005. I have used large data points and relatively thin lines to reinforce the fact that data was available for only those two years, and that homeless composition between these two points in time did not necessarily change in the perfectly linear fashion that the lines suggest.
The Age, Location, and Ethnicity bar graphs all exhibit part-to-whole relationships. To reinforce this intimate relationship between the values, I grouped the bars so that they are touching, and I have provided the percentage for each bar, as well as their total (100%). It should be noted that the age graph also represents a distribution, because it displays the number of people who fall into contiguous intervals along a numeric scale. Unfortunately, however, the age intervals are not equal. They range in size from less than 1 year (Under 1) to 30 years (51-81). According to Joy, who sent me the original, it is standard for social services to use unequal intervals such as these in order to group people into meaningful age classifications (infant, toddler, elementary school age, teenager, etc.). By using unequal intervals, however, we lose our ability to accurately see the shape of the distribution or accurately compare the different values. Because I only had the original with its unequal intervals to work with, I had no choice but to use unequal intervals in my redesign. Given the conventional use of these unequal age intervals for statistics of this type, I might have chosen to stick with them anyway to conform to the way that data of this type is usually displayed.
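To make the problem with unequal intervals concrete, here is a minimal sketch in Python that divides each interval's percentage by the interval's width in years. Only the "Under 1" and "51-81" labels come from the original graphic; the other interval boundaries and every percentage below are invented purely for illustration, and the function name `percent_per_year` is hypothetical. This normalization is not something the redesign does; it only shows how much the apparent shape depends on interval width.

```python
# Hypothetical age bins: (label, low_age, high_age, percent_of_population).
# Only the "Under 1" and "51-81" labels appear in the original graphic;
# the other boundaries and ALL percentages are made up for illustration.
HYPOTHETICAL_BINS = [
    ("Under 1", 0, 1, 3.0),
    ("1-5", 1, 6, 10.0),
    ("6-17", 6, 18, 24.0),
    ("18-50", 18, 51, 33.0),
    ("51-81", 51, 81, 30.0),  # treated as a 30-year interval, as in the text
]


def percent_per_year(bins):
    """Return (label, percent per year of age) for each interval."""
    result = []
    for label, low, high, percent in bins:
        width = high - low  # interval width in years
        result.append((label, percent / width))
    return result


for label, density in percent_per_year(HYPOTHETICAL_BINS):
    print(f"{label:>8}: {density:.2f}% per year of age")
```

With these made-up numbers, the two widest intervals have the tallest raw bars, yet the narrowest interval has the highest concentration of people per year of age. That is the sense in which unequal intervals hide the shape of the distribution.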
The two bottom graphs present percentages, but they differ from the part-to-whole relationships in the top three bar graphs. While each measure independently represents a portion of the total homeless population, the sum of the bars does not equal 100%. Additionally, in the case of the "Miscellaneous" graph, the bars aren't even mutually exclusive. For instance, a homeless person who is unsheltered could also be drug or alcohol dependent, mentally disabled, a veteran, or even all three. To visually highlight the difference between the two bottom bar graphs and the rest of the bar graphs, I have used horizontal bars with gaps between them. This should be enough to alert someone who is quickly scanning the graphs to the fact that these are different.
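The distinction between the two kinds of graph can be checked mechanically before choosing a design. The sketch below is only an illustration: the function name `is_part_to_whole`, the tolerance, and all of the percentages are assumptions, not values from the original graphic. Note that a sum of 100% is necessary but not sufficient, since overlapping categories could add up to 100% by coincidence.

```python
def is_part_to_whole(percentages, tolerance=0.5):
    """Return True if the values sum to 100% within a rounding tolerance.

    A sum near 100 is a necessary condition for a part-to-whole
    relationship; the categories must also be mutually exclusive,
    which cannot be determined from the percentages alone.
    """
    return abs(sum(percentages) - 100.0) <= tolerance


# Hypothetical values for illustration only (not taken from the graphic).
hypothetical_location = {"Urban": 71.0, "Suburban": 21.0, "Rural": 8.0}
hypothetical_misc = {
    "Unsheltered": 44.0,
    "Drug or alcohol dependent": 30.0,
    "Mentally disabled": 20.0,
    "Veteran": 10.0,
}

for name, data in [("Location", hypothetical_location),
                   ("Miscellaneous", hypothetical_misc)]:
    kind = "touching bars" if is_part_to_whole(data.values()) else "separated bars"
    print(f"{name}: total {sum(data.values()):.0f}% -> {kind}")
```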
By redesigning the graphic, I was able to take a source with important but slow-to-digest information and turn it into something that can be read more quickly, easily, and accurately.