Avoiding Quantitative Scales That Make Graphs Hard to Read

This blog entry was written by Nick Desbarats of Perceptual Edge.

Every so often I come across a graph with a quantitative scale that is confusing or unnecessarily difficult to use when decoding values. Consider the graph below from a popular currency exchange website:

Example of poorly chosen quantitative scale
Source: www.xe.com

Let’s say that you were interested in knowing the actual numerical value of the most recent (i.e., right-most) point on the line in this graph. Well, let’s see, it’s a little less than halfway between 1.25 and 1.40, so a little less than half of… 0.15, so about… 0.06, plus 1.25 is… 1.31. That feels like more mental work than one should have to perform to simply “eyeball” the numerical value of a point on a line, and it most certainly is. The issue here is that the algorithm used by the graph rendering software generated stops for the quantitative scale (0.95, 1.10, 1.25, etc.) that made perceiving values in the graph harder than it should be. This is frustrating since writing an algorithm that generates good quantitative scales is actually relatively straightforward. I had to develop such an algorithm in a previous role as a software developer and derived a few simple constraints that consistently yielded nice, cognitively fluent linear scales, which I’ve listed below:

1. All intervals on the scale should be equal.

Each interval (the quantitative “distance” between value labels along the scale) should be the same. If they’re not equal, it’s more difficult to accurately perceive values in the graph, since we have to gauge portions of different quantitative ranges depending on which part of the graph we’re looking at (see example below).

Unequal intervals
Source: www.MyXcelsius.com

2. The scale interval should be a power of 10 or a power of 10 multiplied by 2 or 5.

Powers of 10 include 10 itself, 10 multiplied by itself any number of times (10 × 10 = 100, 10 × 10 × 10 = 1,000, etc.), and 10 divided by itself any number of times (10 ÷ 10 = 1, 10 ÷ 10 ÷ 10 = 0.1, 10 ÷ 10 ÷ 10 ÷ 10 = 0.01, etc.). We find it easy to think in powers of 10 because our system of numbers is based on 10. We also find it easy to think in powers of 10 multiplied by 2 or 5, the two numbers other than itself and 1 by which 10 can be divided to produce a whole number (i.e., 10 ÷ 2 = 5 and 10 ÷ 5 = 2). Here are a few examples of intervals that can be produced in this manner:

Sample Powers of 10
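To make this constraint concrete, here is a minimal sketch (in Python, and not the actual code I wrote in that earlier role) of how a rendering engine might snap a raw step size to the nearest 1, 2, or 5 times a power of 10. The function name and the round-up behavior are my own choices for illustration:

```python
import math

def nice_interval(raw_step):
    """Round a raw step size up to 1, 2, or 5 times a power of 10."""
    exponent = math.floor(math.log10(raw_step))  # exponent of the power of 10 just below raw_step
    fraction = raw_step / 10 ** exponent         # mantissa in the range [1, 10)
    for nice in (1, 2, 5, 10):                   # 10 catches mantissas between 5 and 10
        if fraction <= nice:
            return nice * 10 ** exponent

print(nice_interval(0.072))  # 0.1
print(nice_interval(3.4))    # 5
print(nice_interval(170))    # 200
```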

Here are a few examples of good scales:

Good Scales

Here are a few examples of bad scales:

Bad Scales

After this post was originally published, astute readers pointed out that there are some types of measures for which the “power of 10 multiplied by 1, 2 or 5” constraint wouldn’t be appropriate, specifically, measures that the graph’s audience thinks of as occurring in groups of something other than 10. Such measures would include months (3 or 12), seconds (60), RAM in gigabytes (4 or 16) and ounces (16). For example, a scale of months labeled 0, 5, 10, 15, 20 would be less cognitively fluent than 0, 3, 6, 9, 12, 15, 18 because virtually everyone is used to thinking of months as occurring in groups of 12 and many business people are used to thinking of them in groups of 3 (i.e., quarters). If, however, the audience is not used to thinking of a given measure as occurring in groups of any particular size, or in groups that number a power of 10, then the “power of 10 multiplied by 1, 2 or 5” constraint would apply.

3. The scale should be anchored at zero.

This doesn’t mean that the scale needs to include zero but, instead, that if the scale were extended to zero, one of the value labels along the scale would be zero. Put another way, if the scale were extended to zero, it wouldn’t “skip over” zero as it passed it. In the graph below, if the scale were extended to zero, there would be no value label for zero, making it more difficult to perceive values in the graph:

Extended scale does not include zero stop
Source: www.xe.com, with modifications by author
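In algorithmic terms, “anchored at zero” simply means that the lowest value label is a whole multiple of the interval, so that walking the labels downward would eventually land exactly on zero. A minimal check might look like the following sketch (the function name and tolerance are mine, chosen for illustration):

```python
def anchored_at_zero(lowest_label, interval, tolerance=1e-9):
    """True if extending the scale downward would place a label exactly at zero."""
    steps_from_zero = lowest_label / interval
    return abs(steps_from_zero - round(steps_from_zero)) < tolerance

print(anchored_at_zero(0.90, 0.15))  # True:  0.90 is exactly 6 intervals above zero
print(anchored_at_zero(0.95, 0.15))  # False: this scale "skips over" zero
```

The tolerance guards against floating-point error, since a ratio like 0.90 / 0.15 doesn’t come out as exactly 6 in binary floating point.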

In terms of determining how many intervals to include and what quantitative range the scale should span, most graph rendering applications seem to get this right, but I’ll mention some guidelines here for good measure.

The actual number of intervals to include on the scale is a little more difficult to capture in a simple set of rules. The goal should be to provide as many intervals as are needed to allow for the precision that you think your audience will require, but not so many that the scale looks cluttered, or that you’d need to resort to an uncomfortably small font size in order to fit all of the value labels onto the scale. For horizontal quantitative scales, there should be as many value labels as possible while still leaving enough space between labels for them to be visually distinct from one another.

When determining the upper and lower bounds of a quantitative scale, the goal should be for the scale to extend as little as possible above the highest value and below the lowest value while still respecting the three constraints defined above. There are two exceptions to this rule, however:

  1. When encoding data using bars, the scale must always include zero, even if this means having a scale that extends far above or below the data being featured.
  2. If zero is within two intervals of the value in the data that’s closest to zero, the scale should include zero.

It should be noted that these rules apply only to linear quantitative scales (e.g., 70, 75, 80, 85), and not to other scale types such as logarithmic scales (e.g., 1, 10, 100, 1,000), for which different rules would apply.
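Pulling the three constraints and the bound guidelines together, a scale-picking routine for linear scales might look roughly like the sketch below. This is only an illustration of the rules described in this post, not the algorithm I wrote in my earlier role; the function names, the default label count, and the bar_graph flag are my own choices, and it reuses the nice_interval helper sketched under the second constraint.

```python
import math

def nice_interval(raw_step):
    """Round a raw step up to 1, 2, or 5 times a power of 10 (constraint 2)."""
    exponent = math.floor(math.log10(raw_step))
    fraction = raw_step / 10 ** exponent
    for nice in (1, 2, 5, 10):
        if fraction <= nice:
            return nice * 10 ** exponent

def linear_scale(data_min, data_max, max_labels=6, bar_graph=False):
    """Return (lower, upper, interval) for a linear scale with equal intervals
    (constraint 1) that is anchored at zero (constraint 3)."""
    lo, hi = data_min, data_max
    if bar_graph:                                  # bars must always include zero
        lo, hi = min(lo, 0), max(hi, 0)
    interval = nice_interval((hi - lo) / (max_labels - 1))
    lower = math.floor(lo / interval) * interval   # extend as little as possible while
    upper = math.ceil(hi / interval) * interval    # staying on multiples of the interval
    if 0 < lower <= 2 * interval:                  # zero within about two intervals of
        lower = 0                                  # the data: include it (second exception)
    if -2 * interval <= upper < 0:
        upper = 0
    return lower, upper, interval

print(linear_scale(174, 661))  # (0, 700, 100)
print(linear_scale(23, 43))    # (20, 45, 5)
```

Real charting code would also need to guard against degenerate inputs (for example, data_min equal to data_max) and might try a few candidate label counts to find the least cluttered fit, but the core of the three constraints is just this floor/ceiling arithmetic on a 1-, 2- or 5-based interval.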

In my experience, these seem to be the constraints that major data visualization applications respect, although Excel 2011 for Mac (and possibly other versions and applications) happily recommends scale ranges for bar graphs that don’t include zero, and seems to avoid scale intervals that are powers of 10 multiplied by 2, preferring to use only powers of 10 or powers of 10 multiplied by 5. I seem to be coming across poorly designed scales more often, however, which is probably due to the proliferation of small-vendor, open-source, and home-brewed graph rendering engines in recent years.

Nick Desbarats

24 Comments on “Avoiding Quantitative Scales That Make Graphs Hard to Read”


By Stuart. May 26th, 2016 at 7:31 am

Do you have any justification for point 3? Why would that scale be any harder to read if it were 0.90, 1.05, 1.20, 1.35, 1.50, etc. which would have a marker for zero if extended down? Perceptually you’re still having to judge fifteenths and add them onto non-round numbers which contravenes your point 2.

An example not breaking your other rules such as 1000, 3000, 5000, 7000, etc. may have been better.

By Xan Gregg. May 26th, 2016 at 7:42 am

Typo: the text says “1.25 and 1.40” but the graph shows 1.20 and 1.35.

In addition to 2 and 5, I sometimes use 2.5 as a multiplier when trying to create a “nice” default scale. Any experience there?

Dates and times have their own rules, of course. For instance, 6 could be a good multiplier for a months axis.

By Nick Desbarats. May 26th, 2016 at 8:53 am

Thanks for your comment, Stuart.
I didn’t suggest that anchoring the scale in the chart to which you’re referring would yield a good scale, only that the fact that it wasn’t anchored at zero was problematic.
As you point out, that scale also happens to have another, separate problem, which is that it has a 1.5-based interval (i.e., not 1-, 2- or 5-based). I agree that I should have perhaps chosen a different scale example (i.e., one with a good interval but that is not anchored at zero) in order to avoid any confusion on this point.

By Nick Desbarats. May 26th, 2016 at 11:21 am

Thanks for your comment, Xan.

Steve and I actually spent quite a bit of time going back and forth on whether 2.5 should be included in the list of multipliers. Ultimately, we decided to not include it for two reasons:

1. Intervals based on 2.5 require users to perform more difficult mental math. For example, if a point on a graph is located about halfway between the 175 and 200 labels of a scale (i.e., a 2.5-based scale), it’s harder to mentally calculate that value than if, for example, it falls halfway between 180 and 200, or halfway between 190 and 200.

2. With interval multipliers of 1, 2 or 5, it should be possible to generate a scale that will closely fit virtually any data set, and that will allow for whatever precision the graph creator believes that their audience will require. As I mention in the post, Excel doesn’t even seem to use 2, and it’s still able to generate good scales for almost any set of data (notwithstanding the other issues with Excel that I mentioned in the post).

You raise an excellent point regarding data about which people tend to think in bases other than 10. I could see that 3 would be a good interval for a scale of months, since many business people are used to thinking about these values as quarters of 12, and not fractions of 10 or a power of 10. I could see similar exceptions for baseball innings, degrees of rotation, etc.

Thanks also for pointing out the typo. The error was actually that the first chart image was wrong, but it’s now been replaced with the correct image.

By Daniel Zvinca. May 27th, 2016 at 11:24 pm

Nick,

I will (partially) argue with statement 2 (power of 10 or power of 10 multiplied by 2 or 5).

Picture a scenario where the full scale is between 0 and 100, but the available vertical space to draw the axis labels is reduced. According to your statement, a 0, 20, 40, 60, 80, 100 scale would do just fine. Probably, but I am sure that a 0, 25, 50, 75, 100 scale could be more informative, especially for situations where comparison against quarters is relevant.

The measurement system can also be the reason people think in different bases. A foot has 12 inches, a yard has 3 feet. A pound has 16 ounces. A gallon has 4 quarts.
For the metric system, statement 2 is reasonable, but it cannot be generalized.

By Nick Desbarats. May 29th, 2016 at 7:51 pm

Thanks for your comment, Daniel.

I agree that there are certain specific types of measures where intervals based on multipliers other than 1, 2 or 5 would be appropriate (this is reflected in my May 26 response to Xan Gregg, above, who raised the same good point). I’m not sure that the correct test to determine whether a given measure falls into this category is whether or not the measure is metric, however. In order to determine if one could choose an interval that’s based on a multiplier other than 1, 2 or 5, I think that the correct question to ask would be whether or not the audience is used to thinking about that measure as occurring in groups of something other than 10, usually because of some historical convention. “Non-10-grouped measures” would then also include measures that aren’t metric or Imperial per se, such as months (i.e., most people think of months in groups of 12, not 10), beers (12 or 24), hockey periods (3), etc. (Can you tell that I’m Canadian?). Measures that don’t fall into this category and that should, therefore, have scales based on multipliers of 1, 2 or 5 would then comprise all other types of measures, including those expressed in most metric units of measurement, but also many others (dollars, patients, percentages, sales orders, etc.)

Regarding your example of needing to use an interval that’s based on a multiplier of 2.5, I think that the scenario that you present is a rare edge case, i.e., where 6 labels is too crowded, and where 5 is not too crowded, and where 3 labels (0, 50, 100) doesn’t offer sufficient precision, and where the data spans a very specific range (e.g., if the data spanned 0 to 101 instead of 0 to 100, then the 2.5-based scale would also need 6 labels). As such, my opinion is that the trade-off of forcing users to do harder math in their heads (see my previous comment on this post) isn’t worth adding 2.5 to the list in order to handle a small number of edge cases. If, however, the audience is used to thinking of the measure in question in terms of quarters due to some historical convention, then, yes, a 2.5-based interval might be appropriate.

By Robert Monfera. May 30th, 2016 at 2:14 pm

Interesting article, thanks!

I’d add that there’s some more that goes into the quantitative scale. The article focuses on one aspect, which is the raster. Another interesting topic is the extent (range). Steve Few, e.g. in his paper about bandlines, highlighted the utility of determining the extent based on a broader data set than what’s visible. So self-updating bandlines would use a value extent that’s based on not only the currently visible minimum / maximum, but also on a period of past data – in part, to show current values in context, and in part, to retain some constancy between values and their visual projections.

Closer to the topic of this article, the quantitative scale extent is influenced by certain perception issues directly relating to the axis ticks and what data is in view.

1. It can be useful if there’s padding above / below the extent that the data represents. In other words, it’s often preferable for the Y axis and the highest tick mark to be somewhat above the highest data point. On bounded scales, e.g. 0-100% or on bandlines, it may not be an issue, but on open-ended scales, e.g. temperature, asset price, etc., it alludes to the possibility that the values could, in theory, have been higher. So there’s a tradeoff between maximizing the vertical space (dynamic range) for showing detail vs. conveying the notion that the typical distribution of the data may have resulted in higher values – for example, as if comparable stock price increase rates were present. Otherwise a few percentage points of increase in a year can look like a solid trend.

2. A chart may be more aesthetic and easier to read if there’s a 3-5% padding on the top (and sometimes on the bottom if values can be negative or on a log scale).

3. Depending on what needs to be conveyed, if the scale extent is ‘close enough’ to something round or noteworthy, it’s best to round up. For example, web browser market shares fluctuated heavily but at no point did any one browser exceed a 95% market penetration. Yet, for the sake of roundness and completeness for little cost of space, it’s best to show the full 100% extent. (It’s just illustration; a stacked chart might be a better choice anyway.)

4. Even if there’s no apparent need or wish for padding, the tick labels (printed numerical values) occupy some vertical space. This is a collision of projective (value) space vs. readability (font height) space. Often, especially on small multiple or other constrained charts, the font height covers a significant part of the scale extent, and about half the height of the top tick label overshoots the highest possible value. Sometimes the solution is to not show the extreme tick labels (e.g. having ticks or grid lines for 0, 20, 40, 60, 80 and 100, but having tick labels for 20, 40, 60 and 80 only). An alternative solution is the calculation of the top padding on the basis of the font size: if the user increases the font size, and the overall vertical space is constrained (e.g. on a dashboard) then the padding gets increased a bit and the usable screen height for the chart gets decreased a bit.

5. There may be ticks without labels even inside the extent, or a hierarchy of ticks. E.g. a scale with a raster increment of 4 is considered problematic by the article, but if there are three unlabelled minor ticks between adjacent major ticks, it may work out fine.

6. The discussion focuses on static charts. With computer screens as dynamic media, the update of the axis is a special consideration. For example, the above-mentioned font size change – as an accessibility feature, for example – may mean that fewer ticks can be labelled, to avoid overlap or close proximity of text. Similar needs may arise from changing the browser size, or from the possibility that the same electronic document may need to be rendered on a 3.5″ mobile screen or a 28″ display. Moreover, the data may dynamically change. I think it’s still an open question as to how best to transition from one axis grid to another, because minor ticks may become major ticks, or they can go away; it’s best to stick to the advice in this article, but other factors, e.g. tick constancy or avoidance of overlap or avoidance of overly sparse tick layout may conflict with it; and despite fancy animated axis transitioning, most current tools don’t convey the shift in the axis extent well enough. The effect is cool but the utility is varying.

I suppose the upshot of all this is that the recommendation of the article makes a lot of sense, and in general, designing an axis needs to observe this constraint but, as seen above, a lot of other constraints and goals too, which makes it an interesting optimization problem, especially when a solution can’t assume a lot about the type of the data and the range of responsive features expected of it. It’s a great topic and I’m interested in the discussion of all other aspects that go into axis design or axis layouting logic.

By Daniel Zvinca. May 31st, 2016 at 1:51 am

Nick,

I am not saying that statement 2 is not valid in most cases. I am just saying that it is not valid in some cases not necessarily related to historical reasons.

0, 25, 50, 75, 100 was just a quick example I find more informative than 0, 20, 40, 60, 80, 100 for situations where quarters are our reference, even if the space allows us to have both versions. I had no intention to show a corner case where six labels did not fit and five did. I also agree with Robert Monfera (see above) that unlabeled ticks at 5, 10, 15 … can bring more clarity to such a scale.

One of the examples you propose as badly designed is the sequence 0, 4, 8, 12, …, 32. Take, for instance, a chart showing some statistics for the GB RAM capacity of laptops. We know that these days the RAM installed on computers is a multiple of 4. I see no reason to design a scale for such a graph as 0, 5, 10, 15, 20, 25, 30, 35. As a matter of fact, 0, 4, 8, 12, 16, 20, 24, 28, 32 is the right one.

Computer data and data transfer speeds are measured in multiples of powers of 2. A chart monitoring the traffic of a router with variation between 0 and 2 MBytes/sec can be designed to have a scale with labels at 0, 256k, 512k, 768k, 1024k, …, 2048k with no mistake.

I am not sure about other measurement systems, but the metric system I mentioned, which has a standard set of prefixes in powers of ten, is 100% suitable for your second statement. Obviously, scales designed for other units of measure (such as monetary values) not related to the metric system can follow the same rule.

In my opinion, the decimal numeral system we all learned in school is the main reason we easily interpret data in multiples of powers of 10 and its divisors, 2 and 5. Yet there are cases where we need to adopt a different multiplying factor.

By Nick Desbarats. May 31st, 2016 at 6:32 am

Thanks for the detailed comments, Robert.

You, Daniel and Xan have raised some valid and important points, and I’ll be revising the post in the next few days to include several of them so that future readers who don’t happen to read the comments can benefit from them.

Stay tuned - thanks again.

By Stuart. June 1st, 2016 at 5:42 am

Thanks for your reply Nick. However, you still haven’t explained why anchoring at zero (especially when zero is not in the axis region) is necessary to improve perception.

If the axis scale is regular and uses ’round’ numbers then I would assume the user can estimate the value of a point between 2 gridlines or axis markers regardless of whether or not the axis would eventually have a marker or gridline at 0. Is there any research contradicting my assumption?

I would agree that having a 0 marker or gridline is better if the axis does cross 0, but my reasoning would be more for presentation than perception.

By Jonathon Carrell. June 1st, 2016 at 8:22 am

While there are noteworthy counter-points in the above comments, the article outlines what can be considered good rules of thumb for most situations.

By Nick Desbarats. June 1st, 2016 at 10:00 pm

Thanks for your comment, Stuart.

I think your intuition that zero should be a value label if the scale crosses zero also applies to scales that straddle any power of 10. If the scale straddles 100, for example, I find that it doesn’t look right if the value labels don’t include 100, and this is a possibility if the scale isn’t anchored at zero. The currency exchange rate graph in this blog post illustrates this well, with value labels of $0.95 and $1.10, but no $1.00, which, to my eye, seems like a glaring omission in the same way that a scale that straddles zero but doesn’t include it would have a glaring omission. The “anchoring at zero” constraint, in combination with the other two constraints, ensures that, if the scale crosses a power of 10, that power of 10 will be one of the value labels. I’m not aware of any research that supports this intuition, however.

I agree that the scale values should be “round numbers”, but this is a subjective term. I think we’d agree, for example, that these are not “round number” scales:

174, 194, 214, 234
511, 561, 611, 661, 711

But would you consider one or more of these to be “round number” scales?

23, 28, 33, 38, 43
130, 150, 170, 190, 210
970, 990, 1,010, 1,030

Even if you consider any of these to be “round number” scales, to my eye anyway, their cognitive fluency would still be improved (though perhaps not dramatically improved) by anchoring them at zero:

25, 30, 35, 40, 45, 50
120, 140, 160, 180, 200
960, 980, 1,000, 1,020

As I mentioned, however, I’m not aware of any studies that have confirmed this.

By Stuart. June 2nd, 2016 at 2:22 am

I tend to agree with your anchoring point in general. I think because in order to check the scale of an axis, most people would read the first 2 labels and find the difference. That is always going to be easier if you’ve anchored it at a power of 10, or some other ’round’ number (in the sense that you’re doing the simplest possible arithmetic). I would suggest it’s easier for the average user to ascertain that the scale is 5 when the first two labels are 0 and 5 rather than 2 and 7, for example.

I would consider the latter two of your examples (130, 150,… and 970, 990,…) as OK. In both, it is very easy to see that the scale is 20 and calculating fractions of the distance between two markers or gridlines is simple. I don’t believe that starting them at 120 or 960 offer any real improvement, but would be interested to see whether others feel the same way. I just don’t think I’d consider whether the axis would cross 0 at all when I was reading the chart.

I agree with your other example though (23, 28,…) – perhaps because the scale now straddles the multiples of 10 (30, 40,…) and people are most familiar with a decimal counting system.

By Daniel Zvinca. June 2nd, 2016 at 4:10 am

Most countries use the Celsius degree as the unit of measure for temperature; still, the Fahrenheit degree seems very popular in the US.

The range of 180° between 32°F (the melting point of ice) and 212°F (the boiling point of water) can be a common interval for temperature variation charts. If the temperature analysis focuses on the variation of values across (hypothetically) 6 equal intervals of that 180° range, there is a chance that a 32, 62, 92, 122, 152, 182, 212 scale could become relevant for a certain audience. The gridlines in this case would make value decoding slightly more difficult, but would clearly delimit the 6 zones they might study.

I personally find labels for a 32, 62, 92, …, 212 scale unusual and difficult to read. If Fahrenheit is a must, I would probably design a scale between 0 and 220, use 20 as the interval, and use light color tones for those 6 zones.

Should existing tools provide scale design features like custom limits (not anchored at zero) and a fixed number of intervals (4, 5, 8, 10, 16 … equal intervals)?

As a programmer, I believe I have to offer this possibility to an analyst, but not as the default. The default scale would anchor at zero and the proposed intervals would be powers of 10, or powers of 10 multiplied by 2 or 5.

By Andrew Craft. June 3rd, 2016 at 9:18 am

@Stuart: “I don’t believe that starting them at 120 or 960 offer any real improvement, but would be interested to see whether others feel the same way.”

For me, it’s not so much that they start at 120 or 960 that is the improvement, but rather that they straddle 200 or 1000 and so it seems to make more sense to include them.

But I suppose another way to explain it is to simply drop the rightmost zeroes and see what it leaves you:

12, 14, 16, 18, 20 makes more sense than 13, 15, 17, 19, 21.
96, 98, 100, 102 makes more sense than 97, 99, 101, 103.

To me, strictly even numbers seem easier to decipher than strictly odd numbers. Generally speaking, that is – of course there will always be exceptions.

Scale intervals may be, as you say, “easy to see” even when using a non-zero baseline (i.e. strictly odd numbers), but “calculating fractions of the distance between two markers or gridlines” may not be as simple as you suggest. Not for me anyway – I mean, of course I can do it, but I have always found it a little distracting.

In any case, I think we can all agree there should be a study on this topic; Stephen should probably add it to his long list of things for the infovis community to research (instead of whether personal preferences are ever more important than perception, or whether pie charts really are just the best thing ever).

By Nick Desbarats. June 3rd, 2016 at 11:02 am

Thanks to important comments made by readers Xan Gregg and Daniel Zvinca, I’ve updated the post to cover situations where the audience is used to thinking of a measure as occurring in groups that number something other than 10. Thanks for the feedback.

By Nick Desbarats. June 3rd, 2016 at 11:04 am

Thanks for your detailed and valuable comments, Robert.

You raise very valid points, several of which I hadn’t considered before. I agree with almost all of them, and that much more could be written on this topic. My hope for this post, however, is that it provides simple guidelines to eliminate the most common and egregious cases.

My only comment would be wrt your fifth point. I agree that minor (unlabelled) ticks would improve cognitive fluency for a scale with an interval of 4, but one would still potentially end up with a scale that, for example, includes 8 and 12 but skips 10, or includes 28 and 32 but skips 30. As such, my inclination would still be to stick with intervals based on multipliers of 1, 2 or 5. Apart from that, I agree with you on all points.

By Jon W. June 9th, 2016 at 8:10 am

Nick, surely the “anchor” for a graph of conversion rates between Canadian and American dollars should be 1.00, not 0.00. The former has much more meaning than the latter in this context. Of course it makes no difference if you enforce the “powers of ten” rule, as 0.00 and 1.00 will both fall on the same grid. But being able to see 0.90, 1.00, 1.10, 1.20 on the chart at a glance is important because of those values’ natural reference to 1.00, not because if you extend the scale down to 0 (which is meaningless in this context) it will include 0 as a labelled point.

By Nick Desbarats. June 10th, 2016 at 12:45 pm

Thanks for your comment, Jon.

I agree with you. If a power of 10 is straddled by a scale, then that power of 10 should be one of the value labels on the scale. If a scale straddles more than one power of 10 (e.g., a scale from 0 to 200 straddles 100, 10, 1, 0.1, 0.01, etc.), then the largest power of 10 (i.e., 100 in this case) should appear as a value label on the scale.

If the three constraints in the post are all respected, any powers of 10 (e.g., 1.00, 1,000, 0.001, etc.) that are straddled by the scale will always appear as value labels on the scale. As such, one of the consequences of anchoring the scale in the currency exchange chart at zero (and respecting the other two constraints in the post) is that “1.00” would be forced to be among the value labels. I didn’t explicitly mention this as one of the rationales for the three constraints in the post; however, it is among them.

By Robert Monfera. June 27th, 2016 at 3:30 pm

Nick,

thank you for your reply! Indeed, your article is more tightly focussed than my comments, I was just taking the liberty as a reader :-) Regarding my #5, I was thinking of a chart where the major, labeled ticks are by 4, the minor, unlabeled ticks are by 1, and this does not rule out a special marker (e.g. showing a grid line, or a more salient grid line by 10). You’re right that it’s not generally preferred (and major by 5 is probably better) but if a major tick raster by 4 is even considered, then there may be a domain specific reason for it, e.g. showing computer memory, or pretty much anything that relates to binary or DNA base pairs, or even something arrived at via log(2), something like a historical supercomputer teraflops chart. Again, for generic use, I wouldn’t switch to a 4-raster over a 5-raster, unless…

… unless there were other good reasons for it. For example, horizontal space is very limited and a 4-raster yields a superior chart; maybe the numeric range is tight and 5 would be too big, or would waste horizontal space; also, having just one grid line may be worse than having two or more (a grid of 1 is hard to interpret).

This leads us to one of my favorite topics, that designing or generating data visualization is an optimization problem, replete with constraints, composite, weighted, partly antagonistic goals, uncertainties and assumptions about the loss function and variables that control chart type, presence and salience of aesthetics, choice of glyphs, use of highlight, and amount of context. The nature of the optimization is in line with the goal of data visualization: solving the focusing problem. If we remove the focus aspect, then pretty much any chart would do, as most information can technically be printed as a large SPLOM. The problem of focus is what may override otherwise sensible practices like axis tick raster, if there’s a lot of gain to offset the loss of a decimal-friendly raster.

All these present interesting challenges. For example, enlarging a chart yields more space than in the original version, and it will be possible to show more contextual information. But adding more context can detract from, or otherwise lessen, focus or the key message. Which way it falls fully depends on the reader or user, and her current level of interest. A static newspaper can’t judge this so the editors will call the shot, based on one or two reader personas. But an online version can use information learnt about the user, and can monitor in real time the length of engagement, the nature of activity (scrolling, hovering, clicking around, switching tabs vs staying on, and with permission, even eye tracking; historical activity and user profile information; current events, timeliness of the article).

My feeling is that currently, most design is one-off and informal; in the future, it might be more like shaping or contouring a space of results, based on best practices, declarative rules and optimization, taking into account all relevant information and constraints, as opposed to yielding a single design. Why would not a future news web site switch to thicker lines and fewer, larger text annotations if the user removes his glasses? Why wouldn’t a sparkline get upgraded to a bandline if the user becomes interested in the context of the data distribution, and then on to a richly annotated, full screen time series explorer that links to related visualizations, if the user feels like exploring?

By Robert Monfera. June 27th, 2016 at 3:33 pm

P.S. ‘showing computer memory’ is Daniel Zvinca’s example.

By CM Lim. June 27th, 2016 at 11:07 pm

Hi Nick,
Appreciated your article. Just want to point out that to know the most recent value of the chart on XE.com, one only needs to hover the mouse over that point, and it will tell you the value, e.g., 1.80449. No mental calculation required; it gives you the value to 5 decimal places.

Thanks
cm

By Nick Desbarats. June 28th, 2016 at 8:15 am

Thanks for the great comments, Robert.

Your vision of dynamically adaptive graphs is a potentially powerful one, and I’m interested in seeing how this will play out, as well. Just as sophisticated websites such as Amazon adapt themselves based on what they know about the individual user and on machine learning trained by the behavior of many users, it will be interesting to see if and how online graphs can evolve in a similar manner. If I go by the track records of most data visualization technology vendors, however, I think that the more likely outcome is that we will be dismayed with these new features because, for example, graphs might adapt themselves in ways that cause different users to come away with materially different and incompatible understandings of the underlying data. I hope that this doesn’t turn out to be the case, although regular readers of this blog will know how unfortunately necessary it is to temper any optimism they may feel regarding the motives and competence of many data visualization tool vendors.

By Nick Desbarats. June 28th, 2016 at 8:57 am

Thanks for your comment, CM.

Indeed, in this case, the user can hover their cursor over objects in the graph to see an exact value. This is a useful feature, but I don’t think that it makes it any less important to follow the constraints in this post since users will be forced to hover over objects to get exact values far more often if the scale is poorly designed.
