Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.


TIBCO Spotfire Promotes an Insidious Myth

March 17th, 2015

The number of viable visual data exploration and analysis tools can be counted on the fingers of one hand. TIBCO Spotfire is among them. The merits of this product are undermined, however, by the irresponsible ways that TIBCO is currently promoting it. A new marketing campaign by TIBCO illustrates what happens when marketing professionals who either don’t understand analytics or care little for the truth are allowed free rein.

Here are a few lines from TIBCO Spotfire’s new “Finally…Answers Made Easy” campaign (emphasis mine), supposedly written by the company’s CTO, Matt Quinn:

At TIBCO, we believe just visualizing data isn’t enough. Embedded deep in the brains of data scientists lies a knowledge set that can truly benefit any one of us who has ever struggled with the dilemma of which graph to choose for a given data set. How many times have you highlighted a data set in Excel, selected Insert Chart and ended up with nonsense? You try a different chart, play with the axes, change the numerous options – before you know it, you’ve wasted an hour and haven’t made any progress. You certainly haven’t gotten anywhere near insight or understanding. Imagine if your software knew what you needed to see, even if you didn’t?

We have mined the data in the data scientists’ brains and shared what they know about visualizations: all the arcane rules about using measures on density plots, when to use aggregations and how to use time series correctly. Spotfire will automatically examine your data and recommend the best visualizations for it. Allowing our algorithm to choose the correct visualization will let you focus on what you know best – your business.

When you took your driving test, they didn’t ask you to explain the principles of the internal combustion engine – you just trust it works. Whereas your grandparents may have been a dab hand with a spanner and an oil can, life has moved on. So it will be for the future of analytics – it will work smarter, so you don’t have to.

Like Tableau, Spotfire attempts to determine an appropriate chart based on the data that you’ve selected. This is a useful time saver when it’s done well, but the software can’t peek into your mind to determine what you want to see, so its guesses are frequently wrong. The feature can also serve as a useful guide for data analysis novices, but therein lies the problem: you can’t let software do your thinking for you. The big lie being told here appears in the last few words: “it will work smarter, so you don’t have to.” This is not only a lie; it’s a dangerous lie, one that keeps organizations trapped in ignorance, wasting their time, unable to tap into the value of their data.
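
To make concrete what a feature like this amounts to, here is a minimal sketch of a rule-based chart recommender. The rules and names are invented for illustration; they are not Spotfire’s or Tableau’s actual logic.

```python
# Hypothetical sketch of a rule-based chart recommender. The rules and
# type system below are invented for illustration; they are not
# Spotfire's or Tableau's actual algorithms.

from dataclasses import dataclass

@dataclass
class Column:
    name: str
    kind: str  # "categorical", "numeric", or "datetime"

def recommend_chart(columns: list[Column]) -> str:
    """Guess a chart type from the data types of the selected columns."""
    kinds = sorted(c.kind for c in columns)
    if kinds == ["categorical", "numeric"]:
        return "bar chart"      # compare a measure across categories
    if kinds == ["datetime", "numeric"]:
        return "line chart"     # show a measure changing over time
    if kinds == ["numeric", "numeric"]:
        return "scatter plot"   # examine the relationship between measures
    return "table"              # no rule matched; fall back to raw values

# The recommender sees only data types, never the analyst's question.
print(recommend_chart([Column("region", "categorical"),
                       Column("sales", "numeric")]))  # -> bar chart
```

A heuristic like this maps data types to chart types, and that is all it can do. The question you are trying to answer, which is what actually determines the right display, never enters into it.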

Well-designed software can indeed help you “work smarter,” but not “so you don’t have to” work smart yourself. Data exploration and analysis software, no matter how good it is, cannot provide a workaround for your lack of analytical skill. Software vendors hurt you and ultimately hurt themselves when they claim that their products can be used effectively without the requisite analytical skills. They hurt themselves because, when customers learn that they were sold a lie and can’t actually use the software effectively, they become disgruntled and eventually move on to another product. Sadly, they rarely make a better choice the next time around, and the doomed process begins anew. No one wants to believe that a product that they spent a great deal of money to buy won’t solve their problems.

This marketing lie is in line with the “self-service BI” lie that’s been told for ages. The notion that BI software can auto-magically enable people without analytical skills to make sense of data is ludicrous, yet it’s an appealing lie. We want something for nothing, but the world doesn’t work this way. Analytical tools can’t help us do better and faster what we don’t already know how to do ourselves. They can only augment our intelligence, extending our reach and helping us work around limitations; they can never replace our need for intelligence and skill.

TIBCO is certainly not alone in its willingness to spread misinformation in its attempts to sell its products. Every one of the viable visual data exploration and analysis software vendors has played fast and loose with the truth and misled potential buyers to varying degrees. Most of the wannabe (i.e., not viable) vendors in the space are even worse.

I suspect that the first vendor in the analytics space that’s willing to tell the truth about its product and what’s required to use it will eventually lead the market, assuming that its product is good, even though it will lose many sales in the process. A vendor could differentiate itself from the pack by being truthful. The people who spend their days trying to make sense of data tend to respect truth. They’d find it refreshing to witness honesty coming from a software vendor. This vendor could honestly say, “Here’s the good news. The skills needed to analyze data can be learned by any reasonably intelligent person, given the right resources and enough practice.” This is indeed good news, but it’s not as sexy as the claim that a software product can replace the need for skill.

Damn, damn, damn…getting value from data requires skill and effort. After all of these years of trying and failing to get value from data without paying our dues, why are we still so willing to believe otherwise? There are no shortcuts to enlightenment.

Take care,


2014: A Year to Surpass

January 6th, 2015

Perhaps you’ve noticed that I didn’t write a year-in-review blog post about 2014, extolling the wonderful progress that we made and predicting the even-more-wonderful breakthroughs that we’ll make in 2015. That’s because, in the field of data sensemaking and presentation in general and data visualization in particular, we didn’t make any noticeable progress last year, despite grand claims by vendors and so-called thought leaders in the field. Since the advent of the computer (and before that the printing press, and before that writing, and before that language), data has always been BIG, and Data Science has existed at least since the time of Kepler. Something noteworthy did happen last year, however; it just isn’t praiseworthy: many organizations around the world invested heavily in information technologies that they either don’t need or don’t have the skills to use.

I know that during the last year many skilled data sensemakers used their talents to find important signals in data that made a difference to their organizations. Smart, dedicated, and properly skilled people will always manage to do good work, despite the limitations of their tools and the naiveté of their organizations. I don’t mean to diminish these small pockets of progress in the least. I just want data sensemaking progress to become more widespread, less of an exception to the norm.

Data sensemaking is hard work. It involves intelligence, discipline, and skill. What organizations must do to use data more effectively doesn’t come in a magical product and cannot be expressed as a marketing campaign with a catchy name, such as Big Data or Data Science.

Dammit! This is not the answer that people want to hear. We’re lazy. We want the world to be served up as a McDonald’s Happy Meal. We want answers at the click of a button. The problem with these expectations, however, is not only that they’re unrealistic, but also that they describe a world that only idiots could endure. Using and developing our brains is what we evolved to do better than any other animal. Learning can be ecstatic.

Most of you who read this blog already know this. I’m preaching to the choir, I suppose, but I keep hoping that, with enough time and effort, the word will spread. A better world can only be built on better decisions. Better decisions can only be made with better understanding. Better understanding can only be achieved by thoughtfully and skillfully sifting through information about the world. Isn’t it time that we abandoned our magical thinking and got to work?

Take care,


Seats Still Available at Inaugural Signal Workshop

January 5th, 2015

This blog entry was written by Bryan Pierce of Perceptual Edge.

It’s just over a week until Stephen teaches his new advanced course, Signal: Understanding What Matters in a World of Noise, in Berkeley, CA on January 13–14. If you’re interested in attending, there are still some seats available. We look forward to seeing you in Berkeley!

-Bryan

Stephen Few’s Public Workshops for 2015

December 16th, 2014

This blog entry was written by Bryan Pierce of Perceptual Edge.

In 2015, Stephen Few will offer different combinations of five data visualization courses at public workshops around the world. He’ll teach his three introductory courses, Show Me the Numbers: Table and Graph Design (now as a two-day course, with additional content and several more small-group exercises and discussions), Information Dashboard Design, and Now You See It: Visual Data Analysis. He’s also introducing two new advanced courses for people who have already attended the prerequisite introductory courses or read the associated books and are looking to hone their skills further: Signal: Understanding What Matters in a World of Noise and Advanced Dashboard Design.

Stephen will teach the following public workshops in 2015:

  • Berkeley, California on Jan 13 – 14: Signal: Understanding What Matters in a World of Noise
  • Berkeley, California on Jan 27 – 29: Advanced Dashboard Design (Sold Out!)
  • Copenhagen, Denmark on Feb 24 – 26: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • London, U.K. on Mar 2 – 3: Show Me the Numbers: Table and Graph Design
  • London, U.K. on Mar 4 – 6: Advanced Dashboard Design
  • Sydney, Australia on Mar 23 – 24: Signal: Understanding What Matters in a World of Noise
  • Sydney, Australia on Mar 25 – 27: Advanced Dashboard Design
  • Stockholm, Sweden on Apr 21 – 23: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Portsmouth, Virginia on Apr 28 – 30: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • Soest, Netherlands on May 6 – 8: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Soest, Netherlands on May 11 – 12: Signal: Understanding What Matters in a World of Noise
  • Minneapolis, Minnesota on Jun 2 – 4: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Portland, Oregon on Sep 22 – 24: Show Me the Numbers: Table and Graph Design and Information Dashboard Design
  • Dublin, Ireland on Oct 6 – 8: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis
  • Wellington, New Zealand on Nov 4 – 6: Show Me the Numbers: Table and Graph Design and Information Dashboard Design (Registration not yet open)
  • Sydney, Australia on Nov 11 – 13: Show Me the Numbers: Table and Graph Design and Now You See It: Visual Data Analysis (Registration not yet open)

These workshops are a great way to learn the data visualization principles that Stephen teaches in his books.

-Bryan

Big Dataclast: My Concerns about Dataclysm

December 11th, 2014

If you’re familiar with my work, you know that I am an iconoclast within the business intelligence (BI) and analytics communities, refusing to join the drunkard’s party of hyperbolic praise for information technologies. The “clast” portion of the term “iconoclast” means “to break.” I often break away from the herd, and break the mold of convention, to say what I believe is true in the face of misinformation. My opinion that so-called Big Data is nothing more than marketing hype is a prime example of this. Despite my opinion of Big Data, I approached Christian Rudder’s book Dataclysm with great interest. The following excerpt from the book’s opening paragraph gave me hope that this was not just another example of marketing hype.

You have by now heard a lot about Big Data: the vast potential, the ominous consequences, the paradigm-destroying new paradigm it portends for mankind and his ever-loving websites. The mind reels, as if struck by a very dull object. So I don’t come here with more hype or reportage on the data phenomenon. I come with the thing itself: the data, phenomenon stripped away. I come with a large store of the actual information that’s being collected, which luck, work, wheedling, and more luck have put me in the unique position to possess and analyze.

What can this large store of actual information reveal?

Practically by accident, digital data can now show us how we fight, how we love, how we age, who we are, and how we’re changing. All we have to do is look: from just a very slight remove, the data reveals how people behave when they think no one is watching.

You can imagine my excitement upon reading these words. Actual insights gleaned from large data stores; a demonstration of data’s merits rather than confusing, hyperbolic claims. What is this “phenomenon,” however, that Rudder has “stripped away” from the data? To my great disappointment, I found that the context that’s required to gain real insights was often stripped away. Only a few pages into the book I already found myself stumbling over some of Rudder’s assumptions and practices. I enjoyed his casual, hip voice, and I greatly appreciated the excellent, no-nonsense design of his graphs (excluding the silly Voronoi tessellation treemap), but some of his beliefs about data sensemaking gave me pause.

Rudder explains that he is trying to solve a real problem that afflicts many of behavioral science’s findings: experimental research studies typically rely on groups of test subjects that are not only too small to yield trustworthy results but also too homogeneous to generalize from, consisting as they often do almost entirely of college students.

I understand how it happens: in person, getting a real representative data set is often more difficult than the actual experiment you’d like to perform. You’re a professor or postdoc who wants to push forward, so you take what’s called a “convenience sample”—and that means the students at your university. But it’s a big problem, especially when you’re researching belief and behavior. It even has a name: It’s called WEIRD research: white, educated, industrialized, rich, and democratic. And most published social research papers are WEIRD.

Rudder’s concern is legitimate. His solution, however, is lacking.

Rudder is a co-founder of the online dating service OKCupid. As such, he has access to an enormous amount of data that is generated by the choices that customers make while seeking romantic connections. Add to this the additional data that he’s collected from other social media sites, such as Facebook and Twitter, and he has a huge data set. Even though the people who use these social media sites are more demographically diverse than WEIRD college students, they don’t represent society as a whole. Derek Ruths of McGill University and Jürgen Pfeffer of Carnegie Mellon University recently expressed this concern in an article titled “Social Media for Large Studies of Behavior,” published in the November 28, 2014 issue of Science. Also, the conditions under which the data was collected exert a great deal of influence, but Rudder has “stripped away” most of this context.
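
The underlying statistical point, that a sample’s size does nothing to correct its bias, is easy to demonstrate. Here is a toy simulation, with all numbers fabricated, in which a 50-person convenience sample and a 500,000-person skewed sample both miss the true population value:

```python
# Toy simulation: sample size does not cure sampling bias.
# The population and the bias are fabricated purely for illustration.

import numpy as np

rng = np.random.default_rng(42)

# A fabricated population trait (say, age) with a known mean of 45.
population = rng.normal(45, 15, 1_000_000)

# A small WEIRD-style convenience sample: 50 college-aged subjects.
weird_sample = population[(population >= 18) & (population <= 24)][:50]

# A huge social-media-style sample: half a million people, skewed young.
weights = np.exp(-((population - 30) ** 2) / 200)  # invented skew toward ~30
big_biased = rng.choice(population, 500_000, p=weights / weights.sum())

print(f"true population mean:  {population.mean():.1f}")
print(f"small WEIRD sample:    {weird_sample.mean():.1f}")  # badly off
print(f"huge biased sample:    {big_biased.mean():.1f}")    # bigger, still off
```

The larger sample estimates the wrong quantity more precisely; it does not estimate the right quantity.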

Contrary to his disclaimers about Big Data hype, Rudder expresses some hype of his own. Social media Big Data, he writes, opens the door to a “poetry…of understanding. We are at the cusp of momentous change in the study of human communication.” He believes that the words people write on these sites provide the best source of information to date about the state and nature of human communication. I believe, however, that this data source reveals less than Rudder’s optimistic assessment suggests. I suspect that it mostly reveals what people tend to say and how they tend to communicate on these particular social media sites, which serve specific purposes and tend to be shaped by technological limitations, some imposed (e.g., Twitter’s 140-character limit) and others a by-product of the input device (e.g., the tiny keyboard of a smartphone). We can certainly study the effects that these technological limitations have on language, or the way in which anonymity invites offensive behavior, but are we really on the “cusp of momentous change in the study of human communication”? To derive useful insights from social media data, we’ll need to apply the rigor of science to our analyses, just as we do with other data sources.

Rudder asserts:

Behind every bit in my data, there are two people, the actor and the acted upon, and the fact that we can see each as equals in the process is new. If there is a “-clysm” part of the whole data thing, if this book’s title is more than just a semi-clever pun or accident of the alphabet—then this is it. It allows us to see the full human experience at once, not just whatever side we happen to be paying attention to at a given time.

Having read the book, I found that its title is only a “semi-clever pun.” Contrary to Rudder’s claim, his data does not “allow us to see the full human experience at once.” In fact, it provides a narrow window through which we can observe anonymous interactions between strangers in particular contexts that are designed for specific purposes (e.g., getting a date). Many of the findings that Rudder presents are fun and interesting, but we should take them with a grain of salt.

Fairly early in the book, Rudder presents his findings about women’s preferences in men and men’s preferences in women, but it isn’t clear what the data actually means because he’s stripped it of context. For example, when describing women’s age preferences—“the age of men she finds most attractive”—he fails to mention the specific data on which he based his findings and the context in which it was collected. Were women shown a series of photographs, two men at a time, and asked to select the more attractive of the two? Was the data based solely on the ages of the men whom women contacted in hope of a date? Or was it drawn from some other context? Scientists must describe the designs of their studies, including the specific conditions under which they collected their data. Without this context, we can’t understand the findings and certainly can’t rely on them.

Later in the book, Rudder presents findings about the number of messages people receive on OKCupid in relation to their physical attractiveness. His graph displays a positive correlation between the two variables that remains steady through most of the series but suddenly steepens into an exponential relationship around the 90th percentile of attractiveness. He slices and dices these findings in several ways, but never provides a fundamental piece of information: how was physical attractiveness measured? Important insights might reside in this data, but we can’t trust them without the missing context.
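
For what it’s worth, the binning behind a graph like the one Rudder shows is simple to sketch. The data below is fabricated, and that is the point: the attractiveness score here is invented, and the book never explains how the real one was produced, so the curve cannot be interpreted either way.

```python
# Sketch of the analysis behind a messages-vs-attractiveness curve:
# average messages received, binned by attractiveness percentile.
# All data is fabricated; how the real score was measured is unknown.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

attractiveness = rng.random(n)  # a score in [0, 1) of unknown provenance
base = 5 + 20 * attractiveness  # fabricated: roughly linear response
spike = np.exp(30 * np.maximum(0.0, attractiveness - 0.9))  # kink near 90th pct
messages = base * spike + rng.normal(0, 2, n)

percentile = np.floor(100 * attractiveness).astype(int)
mean_by_pct = [messages[percentile == p].mean() for p in range(100)]

# Steady through most of the range, then a sharp rise at the top.
print(f"50th percentile: {mean_by_pct[50]:6.1f} messages")
print(f"95th percentile: {mean_by_pct[95]:6.1f} messages")
```

Without knowing who assigned the score and on what scale, the same curve is consistent with many different stories.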

Rudder seems to be endorsing a typical tenet of Big Data hype that concerns me, which I’ll paraphrase as, “With Big Data we no longer need to adhere to the basic principles of science.” I applaud Rudder’s efforts to expose bad science in the form of small, demographically homogeneous groups of test subjects, but his alternative introduces its own set of problems, which are just as harmful. I suspect that Rudder endorses this particular alternative because it is convenient. He’s a co-founder of a social media site that collects tons of data. It’s in his interest to enthusiastically endorse this Big Data approach. I trust that Rudder’s conscious intentions are pure, but I believe that his perspective is biased by his role, experience, and interests.

Sourcing data from the wild rather than from controlled experiments in the lab has always been an important avenue of scientific study. These studies are observational rather than experimental. When we observe in this way, we must carefully consider the many conditions that might affect the behavior that we’re observing. From these observations, we carefully form hypotheses, and then we test them, if possible, in controlled experiments. Large social media data sets don’t alleviate the need for this careful approach. I’m not saying that large stores of social media data are useless. Rather, I’m saying that if we’re going to call what we do with them data science, let’s make sure that we adhere to the principles and practices of science.

How many of the people who call themselves “data scientists” on resumes today have actually been trained in science? I don’t know the answer, but I suspect that it’s relatively few, just as most of those who call themselves “data analysts” of some type or other have not been trained in data analysis. No matter how large the data source, scientific study requires rigor. This need is not diminished in the least by data volume. Social media data may be able to reveal aspects of human behavior that would be difficult to observe in any other way. We should take advantage of this. However, we mustn’t treat social media data as magical, nor analyze it with less rigor than other sources of data. It is just data. It is abundantly available, but it’s still just data.

In one example of insights drawn from social media data, Rudder lists words and phrases that are commonly used by particular groups of people but aren’t commonly used by other groups. He also lists the opposite: words that are least often used by particular groups but are commonly used by other groups. His primary example features the words and comments of men among the following four ethnic groups: white, black, Latino, and Asian. Here are the top ten words and phrases that Rudder lists for white men:

my blue eyes
blonde hair
ween
brown hair
hunting and fishing
Allman brothers
woodworking
campfire
redneck
dropkick murphys

I’m a white man, and I must admit that this list does not seem to capture the essence of white men in general. I found the lists interesting, when considered in context, but far less revealing than Rudder claimed. Here’s an example of the “broad trends” that Rudder discerned from this approach to data analysis:

White people differentiate themselves mostly by their hair and eyes. Asians by their country of origin. Latinos by their music.

One aspect of the data that Rudder should have emphasized more is that it was derived from people’s responses to OKCupid profile questions. This means that we shouldn’t claim anything meaningful about these words and phrases apart from the context of self-description when trying to get a date. Another more fundamental problem is that by limiting the list to words and phrases that were used uniquely by particular groups, the list fails to represent the ways in which these groups view themselves overall. In other words, if the top words and phrases used by each of these groups were listed without filtering them on the basis of uniqueness (i.e., little use by other ethnic groups), very different self-descriptions of these groups would emerge.
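
The effect of that filter is easy to demonstrate on a made-up toy corpus. The sketch below contrasts each group’s most frequent words, which reflect how the group actually describes itself, with its most distinctive words, those frequent in the group but rare elsewhere, which is the kind of filter behind Rudder’s lists:

```python
# Sketch contrasting "most frequent words per group" with "words
# distinctive to a group" (frequent here, rare elsewhere). The corpus
# is made up; Rudder's lists use the second, filtered notion.

from collections import Counter

profiles = {
    "group_a": ["music movies travel food blue eyes".split(),
                "travel food hiking campfire".split()],
    "group_b": ["music movies travel food salsa".split(),
                "food travel dancing salsa".split()],
}

def word_freqs(docs):
    """Relative frequency of each word across a group's profiles."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

freqs = {group: word_freqs(docs) for group, docs in profiles.items()}

for group, f in freqs.items():
    others = [freqs[g] for g in freqs if g != group]
    # Most frequent: what the group actually talks about most.
    frequent = sorted(f, key=f.get, reverse=True)[:3]
    # Most distinctive: frequency here relative to frequency elsewhere.
    distinctive = sorted(
        f, reverse=True,
        key=lambda w: f[w] / (1e-9 + max(o.get(w, 0.0) for o in others)))[:3]
    print(group, "| frequent:", frequent, "| distinctive:", distinctive)
```

Even on this tiny corpus the lists diverge: both groups mostly talk about the same things (travel, food), while the distinctive lists surface only the residue that survives the filter, which is exactly why Rudder’s lists shouldn’t be read as overall self-descriptions.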

I appreciate Rudder’s attempts to mine the relatively new and rich data resources that are available to him. Just as others have done before, he has an opportunity to reveal unknown and interesting aspects of human behavior that are specific to the contexts from which he has collected data. If he had stayed within these natural boundaries, I would have enjoyed his observations thoroughly. Unfortunately, Rudder strayed beyond these boundaries into the realm of claims that exceed his data and methods of analysis. This is an expression of Big Data hyperbole and technological solutionism. The only “data science” that is worthy of the name is, above all, rooted in the principles and practices of science. We dare not forget this in our enthusiasm.

Take care,
