Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.


2014: A Year to Surpass

January 6th, 2015

Perhaps you’ve noticed that I didn’t write a year-in-review blog post about 2014, extolling the wonderful progress that we made and predicting the even-more-wonderful breakthroughs that we’ll make in 2015. That’s because, in the field of data sensemaking and presentation in general and data visualization in particular, we didn’t make any noticeable progress last year, despite grand claims by vendors and so-called thought leaders in the field. Since the advent of the computer (and before that the printing press, and before that writing, and before that language), data has always been BIG, and Data Science has existed at least since the time of Kepler. Something noteworthy did happen last year, though it isn’t praiseworthy: many organizations around the world invested heavily in information technologies that they either don’t need or don’t have the skills to use.

I know that during the last year many skilled data sensemakers used their talents to find important signals in data that made a difference to their organizations. Smart, dedicated, and properly skilled people will always manage to do good work, despite the limitations of their tools and the naiveté of their organizations. I don’t mean to diminish these small pockets of progress in the least. I just want data sensemaking progress to become more widespread, less of an exception to the norm.

Data sensemaking is hard work. It involves intelligence, discipline, and skill. What organizations must do to use data more effectively doesn’t come in a magical product and cannot be expressed as a marketing campaign with a catchy name, such as Big Data or Data Science.

Dammit! This is not the answer that people want to hear. We’re lazy. We want the world to be served up as a McDonald’s Happy Meal. We want answers at the click of a button. The problem with these expectations, however, is not only that they’re unrealistic, but also that they describe a world that only idiots could endure. Using and developing our brains is what we evolved to do better than any other animal. Learning can be ecstatic.

Most of you who read this blog already know this. I’m preaching to the choir, I suppose, but I keep hoping that, with enough time and effort, the word will spread. A better world can only be built on better decisions. Better decisions can only be made with better understanding. Better understanding can only be achieved by thoughtfully and skillfully sifting through information about the world. Isn’t it time that we abandoned our magical thinking and got to work?

Take care,


Seats Still Available at Inaugural Signal Workshop

January 5th, 2015

This blog entry was written by Bryan Pierce of Perceptual Edge.

In just over one week, on January 13–14, Stephen will teach his new advanced course Signal: Understanding What Matters in a World of Noise in Berkeley, CA. If you’re interested in attending, there are still some seats available. We look forward to seeing you in Berkeley!

-Bryan

Stephen Few’s Public Workshops for 2015

December 16th, 2014

This blog entry was written by Bryan Pierce of Perceptual Edge.

In 2015, Stephen Few will offer different combinations of five data visualization courses at public workshops around the world. He’ll teach his three introductory courses, Show Me the Numbers: Table and Graph Design (now as a two-day course, with additional content and several more small-group exercises and discussions), Information Dashboard Design, and Now You See It: Visual Data Analysis. He’s also introducing two new advanced courses for people who have already attended the prerequisite introductory courses or read the associated books and are looking to hone their skills further: Signal: Understanding What Matters in a World of Noise and Advanced Dashboard Design.

Stephen will teach the following public workshops in 2015:

  • Berkeley, California on Jan 13 – 14: Signal: Understanding What Matters in a World of Noise
  • Berkeley, California on Jan 27 – 29: Advanced Dashboard Design (Sold Out!)
  • Copenhagen, Denmark on Feb 24 – 26: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • London, U.K. on Mar 2 – 3: Show Me the Numbers: Table and Graph Design
  • London, U.K. on Mar 4 – 6: Advanced Dashboard Design
  • Sydney, Australia on Mar 23 – 24: Signal: Understanding What Matters in a World of Noise
  • Sydney, Australia on Mar 25 – 27: Advanced Dashboard Design
  • Stockholm, Sweden on Apr 21 – 23: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Portsmouth, Virginia on Apr 28 – 30: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • Soest, Netherlands on May 6 – 8: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Soest, Netherlands on May 11 – 12: Signal: Understanding What Matters in a World of Noise
  • Minneapolis, Minnesota on Jun 2 – 4: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Portland, Oregon on Sep 22 – 24: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Dublin, Ireland on Oct 6 – 8: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • Wellington, New Zealand on Nov 4 – 6: Show Me the Numbers: Table and Graph Design and Information Dashboard Design (Registration not yet open)
  • Sydney, Australia on Nov 11 – 13: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis (Registration not yet open)

These workshops are a great way to learn the data visualization principles that Stephen teaches in his books.

-Bryan

Big Dataclast: My Concerns about Dataclysm

December 11th, 2014

If you’re familiar with my work, you know that I am an iconoclast within the business intelligence (BI) and analytics communities, refusing to join the drunkard’s party of hyperbolic praise for information technologies. The “clast” portion of the term “iconoclast” means “to break.” I often break away from the herd, and break the mold of convention, to say what I believe is true in the face of misinformation. My opinion that so-called Big Data is nothing more than marketing hype is a prime example of this. Despite my opinion of Big Data, I approached Christian Rudder’s book Dataclysm with great interest. The following excerpt from the book’s opening paragraph gave me hope that this was not just another example of marketing hype.

You have by now heard a lot about Big Data: the vast potential, the ominous consequences, the paradigm-destroying new paradigm it portends for mankind and his ever-loving websites. The mind reels, as if struck by a very dull object. So I don’t come here with more hype or reportage on the data phenomenon. I come with the thing itself: the data, phenomenon stripped away. I come with a large store of the actual information that’s being collected, which luck, work, wheedling, and more luck have put me in the unique position to possess and analyze.

What can this large store of actual information reveal?

Practically by accident, digital data can now show us how we fight, how we love, how we age, who we are, and how we’re changing. All we have to do is look: from just a very slight remove, the data reveals how people behave when they think no one is watching.

You can imagine my excitement upon reading these words. Actual insights gleaned from large data stores; a demonstration of data’s merits rather than confusing, hyperbolic claims. What is this “phenomenon,” however, that Rudder has “stripped away” from the data? To my great disappointment, I found that the context that’s required to gain real insights was often stripped away. Only a few pages into the book I already found myself stumbling over some of Rudder’s assumptions and practices. I enjoyed his casual, hip voice, and I greatly appreciated the excellent, no-nonsense design of his graphs (excluding the silly Voronoi tessellation treemap), but some of his beliefs about data sensemaking gave me pause.

Rudder explains that he is trying to solve a real problem with many of behavioral science’s findings. Experimental research studies typically involve groups of test subjects that are not only too small for trustworthy results but also cannot be generalized because they consist almost entirely of homogeneous sets of college students.

I understand how it happens: in person, getting a real representative data set is often more difficult than the actual experiment you’d like to perform. You’re a professor or postdoc who wants to push forward, so you take what’s called a “convenience sample”—and that means the students at your university. But it’s a big problem, especially when you’re researching belief and behavior. It even has a name: It’s called WEIRD research: white, educated, industrialized, rich, and democratic. And most published social research papers are WEIRD.

Rudder’s concern is legitimate. His solution, however, is lacking.

Rudder is a co-founder of the online dating service OKCupid. As such, he has access to an enormous amount of data that is generated by the choices that customers make while seeking romantic connections. Add to this the additional data that he’s collected from other social media sites, such as Facebook and Twitter, and he has a huge data set. Even though the people who use these social media sites are more demographically diverse than WEIRD college students, they don’t represent society as a whole. Derek Ruths of McGill University and Jürgen Pfeffer of Carnegie Mellon University recently expressed this concern in an article titled “Social Media for Large Studies of Behavior,” published in the November 28, 2014 issue of Science. Also, the conditions under which the data was collected exercise a great deal of influence, but Rudder has “stripped away” most of this context.

Contrary to his disclaimers about Big Data hype, Rudder expresses some hype of his own. Social media Big Data opens the door to a “poetry…of understanding. We are at the cusp of momentous change in the study of human communication.” He believes that the words people write on these sites provide the best source of information to date about the state and nature of human communication. I believe, however, that this data source reveals less than Rudder’s optimistic assessment. I suspect that it mostly reveals what people tend to say and how they tend to communicate on these particular social media sites, which support specific purposes and tend to be influenced by technological limitations—some imposed (e.g., Twitter’s 140-character limit) and others a by-product of the input device (e.g., the tiny keyboard of a smartphone). We can certainly study the effects that these technological limitations have on language, or the way in which anonymity invites offensive behavior, but are we really on the “cusp of momentous change in the study of human communication”? To derive useful insights from social media data, we’ll need to apply the rigor of science to our analyses just as we do with other data sources.

Rudder asserts:

Behind every bit in my data, there are two people, the actor and the acted upon, and the fact that we can see each as equals in the process is new. If there is a “-clysm” part of the whole data thing, if this book’s title isn’t more than just a semi-clever pun or accident of the alphabet—then this is it. It allows us to see the full human experience at once, not just whatever side we happen to be paying attention to at a given time.

Having read the book, I found that its title is only a “semi-clever pun.” Contrary to Rudder’s claim, his data does not “allow us to see the full human experience at once.” In fact, it provides a narrow window through which we can observe anonymous interactions between strangers in particular contexts that are designed for specific purposes (e.g., getting a date). Many of the findings that Rudder presents are fun and interesting, but we should take them with a grain of salt.

Fairly early in the book, Rudder presents his findings about women’s preferences in men and also about men’s preferences in women, but it isn’t clear what the data actually means because he’s stripped it of context. For example, when describing women’s age preferences—“the age of men she finds most attractive”—he fails to mention the specific data he based his findings on and the context in which it was collected. Were women being shown a series of photographs, two men at a time, and asked to select the one they found more attractive of the two; was the data based solely on the ages of the men that women contacted in hope of a date; or was the data drawn from some other context? Scientists must describe the designs of their studies, including the specific conditions under which they collected their data. Without understanding this context, we can’t understand the findings and certainly can’t rely on them.

Later in the book, Rudder presents findings about the number of messages people receive on OKCupid in correlation to their physical attractiveness. His graph of his findings displays a positive correlation between the two variables that remains steady through most of the series but suddenly increases to an exponential relationship beginning around the 90th percentile of attractiveness. He sliced and diced his findings in several ways, but never provided a fundamental and important piece of information: how was physical attractiveness measured? Important insights might reside in this data, but we can’t trust them without the missing context.

Rudder seems to be endorsing a typical tenet of Big Data hype that concerns me, which I’ll paraphrase as, “With Big Data we no longer need to adhere to the basic principles of science.” I applaud Rudder’s efforts to expose bad science in the form of small, demographically homogeneous groups of test subjects, but his alternative introduces its own set of problems, which are just as harmful. I suspect that Rudder endorses this particular alternative because it is convenient. He’s a co-founder of a social media site that collects tons of data. It’s in his interest to enthusiastically endorse this Big Data approach. I trust that Rudder’s conscious intentions are pure, but I believe that his perspective is biased by his role, experience, and interests.

Sourcing data from the wild rather than from controlled experiments in the lab has always been an important avenue of scientific study. These studies are observational rather than experimental. When we do this, we must carefully consider the many conditions that might affect the behavior that we’re observing. From these observations, we carefully form hypotheses, and then we test them, if possible, in controlled experiments. Large social media data sets don’t alleviate the need for this careful approach.

I’m not saying that large stores of social media data are useless. Rather, I’m saying that if we’re going to call what we do with it data science, let’s make sure that we adhere to the principles and practices of science. How many of the people who call themselves “data scientists” on resumes today have actually been trained in science? I don’t know the answer, but I suspect that it’s relatively few, just as most of those who call themselves “data analysts” of some type or other have not been trained in data analysis. No matter how large the data source, scientific study requires rigor. This need is not diminished in the least by data volume. Social media data may be able to reveal aspects of human behavior that would be difficult to observe in any other way. We should take advantage of this. However, we mustn’t treat social media data as magical, nor analyze it with less rigor than other sources of data. It is just data. It is abundantly available, but it’s still just data.

In one example of insights drawn from social media data, Rudder lists words and phrases that are commonly used by particular groups of people but aren’t commonly used by other groups. He also lists the opposite: words that are least often used by particular groups but are commonly used by other groups. His primary example featured the words and comments of men among the following four ethnic groups: white, black, Latino, and Asian. Here are the top ten words and phrases that Rudder listed for white men:

my blue eyes
blonde hair
ween
brown hair
hunting and fishing
Allman brothers
woodworking
campfire
redneck
dropkick murphys

I’m a white man and I must admit that this list does not seem to capture the essence of white men in general. I found the lists interesting, when considered in context, but far less revealing than Rudder claimed. Here’s an example of the “broad trends” that Rudder discerned from this approach to data analysis:

White people differentiate themselves mostly by their hair and eyes. Asians by their country of origin. Latinos by their music.

One aspect of the data that Rudder should have emphasized more is that it was derived from people’s responses to OKCupid profile questions. This means that we shouldn’t claim anything meaningful about these words and phrases apart from the context of self-description when trying to get a date. Another more fundamental problem is that by limiting the list to words and phrases that were used uniquely by particular groups, the list fails to represent the ways in which these groups view themselves overall. In other words, if the top words and phrases used by each of these groups were listed without filtering them on the basis of uniqueness (i.e., little use by other ethnic groups), very different self-descriptions of these groups would emerge.
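The difference between ranking words by raw frequency and ranking them by uniqueness can be made concrete with a toy sketch. The word counts below are invented purely to illustrate the point (they are not drawn from OKCupid or from the book): a word used rarely by a group can still top its “distinctive” list simply because other groups use it even less.

```python
from collections import Counter

# Invented word counts for two hypothetical groups -- illustrative
# assumptions only, not real data.
group_a = Counter({"music": 50, "family": 40, "travel": 35, "surfing": 5})
group_b = Counter({"music": 45, "family": 42, "travel": 30, "hiking": 6})

def top_by_frequency(counts, n=3):
    """The words a group actually uses most -- its overall self-description."""
    return [word for word, _ in counts.most_common(n)]

def top_by_distinctiveness(counts, other, n=3):
    """The words a group uses far more than another group does.

    Ranks by the ratio of this group's count to the other group's
    count (+1 to avoid division by zero), which is how uniqueness
    filtering surfaces rare but group-specific words.
    """
    ratio = {word: c / (other.get(word, 0) + 1) for word, c in counts.items()}
    return sorted(ratio, key=ratio.get, reverse=True)[:n]

# The two rankings diverge sharply: "surfing" appears only 5 times,
# yet it tops the distinctiveness list because group_b never uses it.
print(top_by_frequency(group_a))                 # prints: ['music', 'family', 'travel']
print(top_by_distinctiveness(group_a, group_b))  # prints: ['surfing', 'travel', 'music']
```

The second list is what a uniqueness filter produces; the first is what the group’s self-description actually looks like. Conflating the two is the error described above.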

I appreciate Rudder’s attempts to mine the relatively new and rich data resources that are available to him. Just as others have done before, he has an opportunity to reveal unknown and interesting aspects of human behavior that are specific to the contexts from which he has collected data. If he had stayed within these natural boundaries, I would have enjoyed his observations thoroughly. Unfortunately, Rudder strayed beyond these boundaries into the realm of claims that exceed his data and methods of analysis. This is an expression of Big Data hyperbole and technological solutionism. The only “data science” that is worthy of the name is, above all, rooted in the principles and practices of science. We dare not forget this in our enthusiasm.

Take care,


Assessing Risk versus Uncertainty: A Review of “Risk Savvy”

December 2nd, 2014

In the opening chapter of Risk Savvy: How to Make Good Decisions by Gerd Gigerenzer, he writes:

When something goes wrong, we are told that the way to prevent further crisis is better technology, more laws, and bigger bureaucracy. How to protect ourselves from the next financial crisis? Stricter regulations, more and better advisers. How to protect ourselves from the threat of terrorism? Homeland security, full body scanners, further sacrifice of individual freedom. How to counteract exploding costs in health care? Tax hikes, rationalization, better genetic markers. One idea is absent from these lists: risk-savvy citizens.

In this wonderful book, Gigerenzer shows how to assess and deal with risk (i.e., known risks) versus uncertainty (i.e., unknown risks). “If risks are known, good decisions require logic and statistical thinking…If some risks are unknown, good decisions also require intuition and smart rules of thumb.” He provides practical guidelines for applying the right types of thinking to many real problems that we face in life, such as those related to finance and health care, leading to better decisions. He exposes the folly of trusting statistical algorithms when they’re applied to high levels of uncertainty, such as the behavior of the stock market. He also proposes ways to simplify the understanding of probabilities associated with quantifiable risks, such as the risk of dying from prostate cancer if you test positive on a prostate-specific antigen (PSA) test, which many medical doctors misunderstand.
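The kind of misunderstanding described above can be illustrated with a simple natural-frequencies calculation in the spirit of Gigerenzer’s approach. All of the rates below are made-up assumptions chosen only to show the arithmetic; they are not figures from the book or real PSA statistics:

```python
# A natural-frequencies sketch: instead of manipulating conditional
# probabilities, count people in an imagined population of 1,000 men.
# All rates here are illustrative assumptions, not medical data.

def positive_predictive_value(population, base_rate, sensitivity, false_positive_rate):
    """Of the men who test positive, what fraction actually have the disease?"""
    with_disease = population * base_rate
    without_disease = population - with_disease
    true_positives = with_disease * sensitivity          # sick men who test positive
    false_positives = without_disease * false_positive_rate  # healthy men who test positive
    return true_positives / (true_positives + false_positives)

# Assumed: 1% of men have the disease, the test catches 90% of them,
# and it falsely flags 10% of healthy men.
ppv = positive_predictive_value(1000, 0.01, 0.90, 0.10)
print(f"Chance a positive test indicates disease: {ppv:.0%}")
# prints: Chance a positive test indicates disease: 8%
```

Counting this way makes the surprise visible: with these assumed rates, 9 true positives are swamped by 99 false positives, so a positive result means disease only about 8% of the time, far lower than the 90% figure many people (doctors included) intuitively assume.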

He cares deeply about health care and is intimately acquainted with its failures in decision making.

The major cause is the unbelievable failure of medical schools to provide efficient training in risk literacy. Medical progress has become associated with better technologies, not with better doctors who understand these technologies. Medical students have to memorize tons of facts about common and rare diseases. What they rarely learn is statistical thinking and critical evaluation of scientific articles in their own field. Learning is geared toward performance in the final exam, which shows little correlation with clinical experience.

He argues that society as a whole must become risk literate, which involves a basic level of statistical thinking that few of us learn in school. Of all branches of mathematics, statistics is the most practical and broadly applicable to life, yet it isn’t routinely taught in school along with algebra, geometry, and calculus. Taught properly, children can easily learn the basics of statistical thinking. Both Gigerenzer and I believe that this training is essential and becoming more so each day. “As a general policy, coercing and nudging people like a herd of sheep instead of making them competent is not a promising vision for a democracy.”

Here, in his own words, is what he argues in the book:

  1. Everyone can learn to deal with risk and uncertainty. In this book, I will explain principles that are easily understood by everyone who dares to know.
  2. Experts are part of the problem rather than the solution. Many experts themselves struggle with understanding risks, lack skills in communicating them, and pursue interests not aligned with yours. Giant banks go bust for exactly these reasons. Little is gained when risk-illiterate authorities are placed in charge of guiding the public.
  3. Less is more. When we face a complex problem, we look for a complex solution. And when it doesn’t work, we seek an even more complex one. In an uncertain world, that’s a big error. Complex problems do not always require complex solutions. Overly complicated systems, from financial derivatives to tax systems, are difficult to comprehend, easy to exploit, and possibly dangerous. And they do not increase the trust of the people. Simple rules, in contrast, can make us smart and create a safer world.

His arguments are convincing and the principles that he teaches are practical and accessible. I recommend this book.

Take care,
