Thanks for taking the time to read my thoughts about Visual Business Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions that are either too urgent to wait for a full-blown article or too limited in length, scope, or development to require the larger venue. For a selection of articles, white papers, and books, please visit my library.


If Big Data Is Anything at All, This Is It

August 18th, 2014

The first time that all but a few of us heard the term “Big Data,” we heard it in the context of a marketing campaign by information technology vendors to promote their products and services. It is this marketing campaign that has made the term popular, leading eventually to the household name that it is today. Despite its popularity, it remains a term seeking a definitive meaning. There are as many definitions of Big Data as there are individuals and organizations that would like to benefit from the belief that it exists. My objective in this brief blog article is to ask, “Does Big Data signify anything that is actually happening, and if so, what is it?”

Long before the term came into common usage around the year 2010, it began to pop up here and there in the late 1990s. It first appeared in the context of data visualization in 1997 at the IEEE 8th Conference on Visualization in a paper by Michael Cox and David Ellsworth titled “Application-controlled demand paging for out-of-core visualization.” The article begins as follows:

Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.

Two years later, at the 1999 IEEE Conference on Visualization, a panel convened titled “Automation or interaction: what’s best for big data?”

In February of 2001, Doug Laney, at the time an analyst with the Meta Group, now with Gartner, published a research note titled “3D Data Management: Controlling Data Volume, Velocity, and Variety.” The term Big Data did not appear in the note, but a decade later, the “3Vs” of volume, velocity, and variety became the most common attributes that are used to define Big Data.

The first time that I ran across the term personally was in a 2005 email from the software company Insightful, the maker of S+, a commercial implementation of the S statistical language (from which R also derives), in the title of a course “Working with Big Data.”

By 2008 the term had become used enough in scientific circles to warrant a special issue of Nature magazine. It still didn’t begin to be used more broadly until February 2010, when Kenneth Cukier wrote a special report for The Economist titled “Data, Data Everywhere” in which he said:

…the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly… The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: “big data.”

It was around this time that the term was snatched from the world of academia to become the most successful information technology marketing campaign of the current decade. (I found most of the historical references to the term Big Data in the Forbes June 6, 2012 blog post by Gil Press titled “A Very Short History of Big Data.”)

Because Big Data has no commonly accepted definition, discussions about it are rarely meaningful or useful. Not once have I encountered a definition of Big Data that actually identifies anything that is new about data or its use. The exponential increases in data volume, velocity, and variety that Doug Laney’s 3Vs describe have been happening since the advent of the computer many years ago. You might think that technological milestones such as the advent of the personal computer, Internet, or social networking have created exponential increases in data, but they have merely sustained exponential increases that were already happening. Had it not been for these technological advances, increases in data would have ceased to be exponential. Recently, definitions have emphasized the notion that Big Data is data that cannot be processed by conventional technologies. What constitutes conventional vs. unconventional technologies? My most recent encounter with this was the claim that Big Data is that which cannot be processed by a desktop computer. Based on this rather silly definition, Big Data has always existed, because personal computers have never been capable of processing many of the datasets that organizations collect.

So, if Big Data hasn’t been defined in an agreed-upon manner and if none of the existing definitions identify anything about data or its use that is actually new, does the term really describe anything? I’ve thought about this a great deal and I’ve concluded that it describes one thing only that has actually occurred in recent years:

Big Data is a rapid increase in public awareness that data is a valuable resource for discovering useful and sometimes potentially harmful knowledge.

Even if Big Data is this and nothing more, you might think that I’d be grateful for it. I make my living helping people understand and communicate information derived from data, so Big Data has produced a greater appreciation for my work. Here’s the rub: Big Data, as a term with no clear definition, which serves as a marketing campaign for technology vendors, encourages people to put their faith in technologies without first developing the skills that are needed to use those technologies. As a result, organizations waste their money and time chasing the latest so-called Big Data technologies—some useful, some not—to no effect because technologies can only augment the analytical abilities of humans; they cannot make up for our lack of skills or entirely replace our skills. Data is indeed a valuable resource, but only if we develop the skills to make sense of it and find within the vast and exponentially growing noise those relatively few signals that actually matter. Big Data doesn’t do this, people do—people who have taken the time to learn.

Take care,

A Sure Path to Learning

July 8th, 2014

Even though we all claim to value education, teaching and learning are rarely done well. To achieve good outcomes, teachers and students must understand how the brain learns. Unfortunately, few teachers have more than a passing acquaintance with the science of learning. Many of the strongly held and frequently espoused notions about learning practices (e.g., good study habits), which seem intuitive, are dead wrong. Scientific investigation into the learning brain has revealed a great deal, especially in recent years, but the findings seldom reach the teachers and learners who would benefit from them. Peter Brown, Henry L. Roediger III, and Mark A. McDaniel have responded to this problem in the form of a wonderful new book titled Make It Stick: The Science of Successful Learning (2014).

Don’t confuse this with another fine book titled Made to Stick (2007) by brothers Chip and Dan Heath, which teaches how to get messages across in clear and compelling ways. Make It Stick presents in accessible terms the latest research findings regarding learning, both for people who want to optimize their own learning efforts and for teachers who want to create successful learning experiences for their students.

By learning, the authors mean “acquiring knowledge and skills and having them readily available from memory so you can make sense of future problems and opportunities.” They’re not talking about simple recall. Learning involves memory, but extends beyond mere recall into the realm of application. Real learning is “effortful.” For example, “When you’re asked to struggle with solving a problem before being shown how to solve it, the subsequent solution is better learned and more durably remembered.” Effort alone isn’t enough, however. It has to be the right effort.

To apply knowledge and skills to new problems and opportunities when they arise, we must possess more than procedural familiarity; we must have a conceptual understanding that is generalizable. “People who learn to extract the key ideas from new material and organize them into a mental model and connect that model to prior knowledge show an advantage in learning complex mastery.” We can all become better learners by developing better learning practices. One such practice is frequent testing. Whether you’re studying on your own or in a structured learning setting, frequently testing your understanding and ability to apply what you’re learning strengthens it and provides the feedback that you need to focus your efforts where they’re most needed.

Many popular beliefs about learning, such as the benefits of cramming (a.k.a., massed practice) and rereading material over and over, are flawed. The ability to perform well on a multiple-choice test soon after cramming or rereading material is short-lived. Spaced practice, interleaved with other material, results in better learning than non-stop focus on a single topic or skill. Some of the best learning practices are counter-intuitive and don’t necessarily feel like progress during the learning process itself, even though they dramatically outperform other practices that feel more productive. Some beliefs about learning that have garnered attention in recent years are downright wrong. One that I’ve encountered frequently in my own work is the notion that people learn best when they engage in the learning style that they prefer.

The popular notion that you learn better when you receive instruction in a form consistent with your preferred learning style, for example as an auditory or visual learner, is not supported by the empirical research. People do have multiple forms of intelligence to bring to bear on learning, and you learn better when you “go wide,” drawing on all of your aptitudes and resourcefulness, than when you limit instruction or experience to the style you find most amenable.

Our brains are designed to think in several modes (e.g., verbally, numerically, and visually), which we should shift between fluidly, as needed, depending on the nature of the material and the perspective from which we wish to consider it.

Another popular but misguided notion is called student-directed learning. “This theory holds that students know best what they need to study to master a subject, and what pace and methods work best for them.” While it’s true that students should take more responsibility for their own learning, “most students will learn academics better under an instructor who knows where improvement is needed and structures the practice required to achieve it.”

Fundamentally, the purpose for which we pursue the acquisition of information and skills has a significant effect on learning. There is a huge difference between focusing on performance versus focusing on learning.

In the first case, you’re working to validate your ability. In the second, you’re working to acquire new knowledge and skills. People with performance goals unconsciously limit their potential. If your focus is on validating or showing off your ability, you pick challenges you are confident you can meet…But if your goal is to increase your ability, you pick ever-increasing challenges, and you interpret setbacks as useful information that helps you sharpen your focus, get more creative, and work harder.

I could go on, but I won’t, because I merely want to whet your appetite for more. This is an excellent book and one that is desperately needed. As the authors say, “No matter what you may set your sights on doing or becoming, if you want to be a contender, it’s mastering the ability to learn that will get you in the game and keep you there.”

Although I was already familiar with much of the material in this book, because of extensive reading about learning theory, 40 years of reflective teaching experience, and a lifelong love of learning, a great deal was new to me. Enough, in fact, that I will soon be redesigning my table and graph design course, Show Me the Numbers, to last two days rather than one so I can add frequent tests, additional discussions, and many more group exercises to guarantee that my students leave with a stronger foundation to build on. I’ve been teaching the concepts well, but not fully providing the learning experience that will make those concepts stick.

Take care,

Design vs. Art

May 12th, 2014

I am writing these words in Amsterdam. Yesterday, when I arrived here, I visited the Stedelijk Museum of contemporary art and design. The featured exhibition was the work of the Dutch industrial designer Marcel Wanders. This exhibit was timely, for I’m currently reading a book titled Design This Day by Walter Dorwin Teague, one of the founders of industrial design. The contrast between Wanders’ current work and Teague’s formative concept of design struck me as extreme. Wanders is the antithesis of Teague. The exhibition of Wanders’ work featured this huge photograph above the entrance:

Wanders’ work exhibits conscious, unapologetic self-expression: “Look at me!” One of the quotes writ large on the museum’s wall expressed Wanders’ belief that a designer’s work should exhibit his personal signature. I disagree, as does Teague.

When speaking of the rightness of a design, Teague declares that all aspects “should derive their sanction from something more necessary than a designer’s fancy.” Design strives to solve a problem, to serve human needs, not to express the personality of the designer.

Wanders’ notion of design is quite different.

It is our responsibility to be magicians, to be jesters, to be alchemists, to create hope where there is only illusion, to create reality where there are only dreams.

He shuns the formative principle of industrial design that “form follows function.” His aspirations are those of an artist, not a designer. This perspective is reflected in his work.

No, this is not a toy; it is Wanders’ full-size, running “holiday car,” its exterior covered with colored stones.

The designer’s approach should be one of interaction, not imposition: interaction between human needs, the tools, techniques, and materials of construction, the environment, and the designer’s skill and imagination. As designers, we use the best materials, tools, and techniques available to solve real problems in the context of our environment as well as possible. We are directed by human needs and the problems that must be solved to fulfill them, not a desire for self-expression. We are restricted objectively by our tools and materials and their impact on the world, not subjectively by the expanse of our egos. The product of our efforts should show no visible sign of ourselves, though it is born of our imagination. Perhaps this is a fundamental difference between art and design: the former an act of self-expression, often beautiful; the latter an act of integration and resolution, no less beautiful, but assessed differently. As designers, we speak in silence, but our voices, though anonymous whispers, are no less heard. Silently, we change the world.

Take care,

The Three Vs and the Big O

May 6th, 2014

It’s often useful to take a fresh look at things through the eyes of an outsider. My friend Leanne recently provided me with an outsider’s perspective after reading a blog article of mine regarding Big Data. In it I referred to the three Vs—volume, velocity, and variety—as a common theme of Big Data definitions, which struck Leanne as misapplied. Being trained in health care and, perhaps more importantly, being a woman, Leanne pointed out that the three Vs don’t seem to offer any obvious advantages to data, but they’re highly desirable when applied to the Big O. What’s the Big O? Leanne was referring to the “oh, oh, oh, my God” Big O more commonly known as the female ORGASM. When it comes to the rock-my-world experience of the Big O:

  1. Volume is desirable—the more the better;
  2. Velocity is desirable—reaching terminal velocity quickly with little effort is hard to beat; and
  3. Variety is desirable—getting there through varied and novel means is a glorious adventure.

The three Vs are a perfect fit for the Big O, but not for data. More data coming at us faster from an ever-growing variety of sources offers few advantages and often distracts from the ultimate goal. Leanne doesn’t understand why data geeks (her words, not mine) are spending so much time arguing about terminology and technology instead of focusing on content—what data has to say—and putting that content to good use. I couldn’t agree more.

Take care,

VisualCue: A Zombie, Reanimated

May 2nd, 2014

On two occasions several years ago I was asked by business intelligence publications to review a software product named FYI Visual. On both occasions I gladly accepted because people needed to be warned about it. FYI Visual was a zombie in the sense that, when it was born from the imagination of its creator, a medical doctor, it was lifeless, without substance or worth, animated only by his force of will and wallet. It was a horrible product, completely bereft of usefulness because it was built on an erroneous foundation. Eventually, the product ceased to exist, and I breathed a sigh of relief. Yesterday, however, I read an article by a fellow named Ben Kerschberg, a contributor to Forbes’ website, that promoted a brand new product named VisualCue, which appears to be a reanimated version of FYI Visual. This new product is built on the same flawed foundation as its predecessor.

Why was a contributor to Forbes promoting the walking dead? Not because Kerschberg has expertly reviewed the product and found it worthy. Kerschberg is an attorney, with no expertise whatsoever in data visualization. It’s clear from some of Kerschberg’s statements that he was merely parroting promotional material that was provided by VisualCue. For instance, Kerschberg referred to “bar charts,…treemaps, Gannt charts, and scatter plots” as “subpar visualizations… that fail to serve their main purpose of communicating information.” This language is reminiscent of statements previously made by the founder of FYI Visual. What is VisualCue’s answer to these subpar visualizations? “Interactive Visualization,” which Kerschberg says is a term that was coined by Gartner. First of all, Gartner did not coin this term. It has been in use since long before data visualization was on Gartner’s radar. Anyone familiar with the field knows that interactive visualization has been around since the early days of computer graphics. Regarding interactive visualization, Kerschberg makes the following claim, no doubt lifted directly from VisualCue’s absurd promotional content: “Interactive Visualization implies the use of heat maps, geographic maps, link charts, and a broad spectrum of special purpose visualizations that surround processes that are inextricably linked to an underlying analytics.” Huh? Really? This is news to those of us who have worked in the field for many years. And what does VisualCue’s version of interactive visualization look like? Now, for your viewing pleasure, I present an example of their amazing innovation:

One of these collections of icons (binoculars, clock, boat, etc.) is called a tile. You wouldn’t ordinarily use a single tile, but an entire screen full of them, arranged as a mosaic, such as the following example:

A screen full of these cute icons would certainly serve as an effective substitute for “subpar” bar charts, treemaps, scatter plots, and the like, if you wanted to overwhelm viewers’ senses with utter nonsense. Obviously, these heat-map-colored icons do not serve the same purpose as quantitative graphs such as bar charts and scatter plots. In fact, there is no purpose for which this display would provide a good solution.

FYI Visual, the predecessor of VisualCue, used the same basic approach except that its icons were all rectangles and an odd combination of colors and shapes served the purpose of the heatmap colors. What the new product calls tiles the old product called KEGS and what the new product calls a mosaic arrangement of tiles the original product called a KEGSET. Other than the names, which are now friendlier, it appears that little else has changed. If you’re interested, you can read one of my reviews of the old product in the article “FYI Visual: The Story of a Product that was Built on a Fault.”

If this zombie stumbles into your neighborhood, I think the best way to protect yourself against it is to laugh hysterically. If we start laughing now and refuse to cease, we’ll chase this zombie back into the darkness from which it emerged before any organization wastes money purchasing it.

By promoting this software, Kerschberg is being irresponsible. By allowing people like Kerschberg to write about things they don’t understand, Forbes is demonstrating a complete lack of respect for its readers. Shame on them.

Take care,