Visual Statistics — A worthwhile new book, but one that is definitely for statisticians

I returned late last week from nearly three weeks of work in Europe, which ended with a two-day workshop that I taught for the Swiss Statistical Society. Nestled in a majestic valley in the Swiss Alps, we spent our days talking about how these talented statistical analysts could enhance their work by learning to communicate their findings more clearly and by using their eyes to supplement abstract statistical techniques. Later this year at their annual conference, they will hear a keynote presentation from Michael Friendly, Ph.D., who is a professor in the Department of Psychology at York University in Toronto, Canada. Among his many talents, Friendly is a trained statistician and an aficionado in the use of visual techniques for statistical analysis. Along with two other authors, Forrest W. Young, Ph.D., of the University of North Carolina (recently deceased), and Pedro M. Valero-Mora, Ph.D., of the University of Valencia in Spain, Friendly has written a new book on the topic entitled Visual Statistics: Seeing Data with Dynamic Interactive Graphics (John Wiley & Sons, Inc., 2006). Always eager to find new sources of insight into data visualization, especially as it applies to analysis, I read the book during my recent stay in Europe.

I don’t intend to review the book comprehensively in this brief blog post, but I would like to comment on its potential usefulness for my primary audience, which consists largely of business people who work with data, but lack advanced statistical training. I was encouraged when I began to read the introduction that this might be a book I could recommend to this audience. The authors’ message rang true to my experience and seemed to share my goals:

Statistical data analysis provides the most powerful tools for understanding data, but the systems currently available for statistical analysis are based on a 40-year-old computing model, and have become much too complex. What we need is a simpler way of using these powerful analysis tools.

Visual statistics is a simpler way. Its dynamic interactive graphics are in fact an interface to these time-proven statistical analysis tools, an interface that presents the results of the hidden tools in a way that helps ensure that our intuitive visual understanding is commensurate with the mathematical statistics under the surface. Thus, visual statistics eases and strengthens the way we understand data and, therefore, eases and strengthens our scientific understanding of the world around us.

…

It is our aim to communicate the intrigue of statistical detective work and the satisfaction and excitement of statistical discovery, by emphasizing visual intuition without resorting to mathematical callesthenics [sic]…Seldom is there mention of populations, samples, hypothesis tests, and probability levels…This book is written for readers without strong mathematical or statistical background, those who are afraid of mathematics or who judge their mathematical skills to be inadequate; those who have had negative experiences with statistics or mathematics, and those who have not recently exercised their math or stats skills. Parts I, II, and III are for you.

The book seems to consist only of Parts I, II, and III, so I interpret the final statement to mean that non-statisticians should find the book non-intimidating and accessible. What I discovered in reading the book, however, is that, despite how useful it might be as a primer in visual analysis for statisticians, it is steeped in the concepts and language of statistics, and lacks the explanations that would be needed by non-statisticians to make use of the material. I have no doubt that the authors attempted to reach out to non-statisticians. I suspect, however, that they are too immersed in an academic statistical mindset to recognize when they are using terms and discussing concepts that are unfamiliar to the uninitiated. Terms such as Box-Cox transformation, Euclidean space, kernel density curve, p-value, and Pearson’s chi square are par for the course. Early in chapter 2, which provides some actual data sets and analytical challenges that are used throughout the book, the reader is already faced with material like the following:

The spreadplot (a kind of multiplot visualization that is introduced in chapter 4) for the initial model, (GPE)(M) is shown in Figure 2.9 (on the following two pages). This model fits very poorly, of course (G2 = 107, df = 7, p < 0.001). The G2 measure is a badness-of-fit measure. Low values are good, high values are bad. The empty model, reported here, has a very large value of G2, meaning the fit is very poor, which, of course, it must be, since it has no terms. The hypothesis test, when rejected, as is the case here, indicates that the model does not fit the data. 

At this point, as someone whose statistical knowledge can fit comfortably in a thimble, my eyes began to glaze over. Please don’t misunderstand me. I am not saying that this is not a good book. I suspect that this is a very important book for statisticians, because it introduces them to the power of visual analysis, which most statisticians under-appreciate. This just isn’t a book for non-statisticians.

One more observation that I want to make about this book is one that applies to many books on data visualization: the value of books on this topic is dramatically undermined when they are not printed in color. I felt bad for the authors when they bemoaned this unfortunate decision by the publisher to save costs by printing the book in black-on-white:

Unfortunately, mosaic displays are best viewed in color, and we are forced to use black and white. (We do the best we can, but to be honest, the black-and-white versions…do not do justice to the mosaic displays. If you can view this online, please do; it will help).

It wasn’t only the mosaic displays that would have benefited from color. Perhaps the authors already had their contract in place with John Wiley & Sons, Inc., before they realized that color was not an option, and then found that they had no power to change this. If you ever plan to write a book about data visualization, get an up-front guarantee from the publisher that the book will be printed in color, or you’ll end up having to make sad disclaimers to your readers like the one above.

Take care,

3 Comments on “Visual Statistics — A worthwhile new book, but one that is definitely for statisticians”


By John Dawson. July 2nd, 2007 at 12:27 pm

As a former statistician working in the marketing industry, I’ve found that data visualisation is a terribly difficult thing to achieve in the run-of-the-mill packages available to most statisticians. This is one reason why I started my own software business called marketingQED: to try and bring easier-to-understand statistics to marketing and business decision making.

I won’t claim that we’ve solved the issue of great stats but bad communication, but we’re getting better at it all the time.

By Jorge Camoes. July 2nd, 2007 at 2:50 pm

Stephen
I am curious about this book. I think there is a gap between visualization and statistics that doesn’t make sense and somewhere in the future they will merge (it is a bold statement, I know…). It is all about understanding and/or communicating the data, and what really matters is to find the right tool for the job, be that an average or a pie chart.

I admire Tukey and Cleveland (among others) for the bridges they built and this book seems to follow that noble tradition.

By Chris C. August 30th, 2007 at 7:58 pm

I agree there is a consistent, traditional gap.
Most statisticians lack exposure to significant visualization tools/techniques in school, and they are not trained as computer scientists to understand ETL operations or the power of databases to manipulate data, but they are really experts at interpreting good quality data sets = files. There are numerous places in which the 80/20 SAS ratio is known, meaning 80% of the SAS use is for data cleaning, scrubbing, deduping, joining…pardon “merging by” etc. and 20% analysis, including maybe visualization. I guess the confusion about visualization as a presentation vs. analytic tool still persists. At the end of the day, this (statisticians) is a much better group to work with than classy BI IT groups, which have a totally different perspective (dashboards). Perhaps explaining the difference between a report and an analysis could be another topic for PE’s library.