Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
May 6th, 2014
It’s often useful to take a fresh look at things through the eyes of an outsider. My friend Leanne recently provided me with an outsider’s perspective after reading a blog article of mine regarding Big Data. In it I referred to the three Vs—volume, velocity, and variety—as a common theme of Big Data definitions, which struck Leanne as misapplied. Being trained in health care and, perhaps more importantly, being a woman, Leanne pointed out that the three Vs don’t seem to offer any obvious advantages to data, but they’re highly desirable when applied to the Big O. What’s the Big O? Leanne was referring to the “oh, oh, oh, my God” Big O more commonly known as the female ORGASM. When it comes to the rock-my-world experience of the Big O:
- Volume is desirable—the more the better;
- Velocity is desirable—reaching terminal velocity quickly with little effort is hard to beat; and
- Variety is desirable—getting there through varied and novel means is a glorious adventure.
The three Vs are a perfect fit for the Big O, but not for data. More data coming at us faster from an ever-growing variety of sources offers few advantages and often distracts from the ultimate goal. Leanne doesn’t understand why data geeks (her words, not mine) are spending so much time arguing about terminology and technology instead of focusing on content—what data has to say—and putting that content to good use. I couldn’t agree more.
May 2nd, 2014
On two occasions several years ago I was asked by business intelligence publications to review a software product named FYI Visual. On both occasions I gladly accepted because people needed to be warned about it. FYI Visual was a zombie in the sense that, when it was born from the imagination of its creator, a medical doctor, it was lifeless, without substance or worth, animated only by his force of will and wallet. It was a horrible product, completely bereft of usefulness because it was built on an erroneous foundation. Eventually, the product ceased to exist, and I breathed a sigh of relief. Yesterday, however, I read an article by a fellow named Ben Kerschberg, a contributor to the Forbes website, that promoted a brand new product, which appears to be a reanimated version of FYI Visual named VisualCue. This new product is fundamentally based on the same flawed foundation as its predecessor.
Why was a contributor to Forbes promoting the walking dead? Not because Kerschberg has expertly reviewed the product and found it worthy. Kerschberg is an attorney, with no expertise whatsoever in data visualization. It’s clear from some of Kerschberg’s statements that he was merely parroting promotional material that was provided by VisualCue. For instance, Kerschberg referred to “bar charts,…treemaps, Gannt charts, and scatter plots” as “subpar visualizations… that fail to serve their main purpose of communicating information.” This language is reminiscent of statements previously made by the founder of FYI Visual. What is VisualCue’s answer to these subpar visualizations? “Interactive Visualization,” which Kerschberg says is a term that was coined by Gartner. First of all, Gartner did not coin this term. It has been in use since long before data visualization was on Gartner’s radar. Anyone familiar with the field knows that interactive visualization has been around since the early days of computer graphics. Regarding interactive visualization, Kerschberg makes the following claim, no doubt lifted directly from VisualCue’s absurd promotional content: “Interactive Visualization implies the use of heat maps, geographic maps, link charts, and a broad spectrum of special purpose visualizations that surround processes that are inextricably linked to an underlying analytics.” Huh? Really? This is news to those of us who have worked in the field for many years. And what does VisualCue’s version of interactive visualization look like? Now, for your viewing pleasure, I present an example of their amazing innovation:
One of these collections of icons (binoculars, clock, boat, etc.) is called a tile. You wouldn’t ordinarily use a single tile, but an entire screen full of them, arranged as a mosaic, such as the following example:
A screen full of these cute icons would certainly serve as an effective substitute for “subpar” bar charts, tree maps, scatter plots, and the like, if you wanted to overwhelm viewers’ senses with utter nonsense. Obviously, these heat-map-colored icons do not serve the same purpose as quantitative graphs such as bar charts and scatter plots. In fact, there is no purpose for which this display would provide a good solution.
FYI Visual, the predecessor of VisualCue, used the same basic approach except that its icons were all rectangles and an odd combination of colors and shapes served the purpose of the heatmap colors. What the new product calls tiles the old product called KEGS and what the new product calls a mosaic arrangement of tiles the original product called a KEGSET. Other than the names, which are now friendlier, it appears that little else has changed. If you’re interested, you can read one of my reviews of the old product in the article “FYI Visual: The Story of a Product that was Built on a Fault.”
If this zombie stumbles into your neighborhood, I think the best way to protect yourself against it is to laugh hysterically. If we start laughing now and refuse to cease, we’ll chase this zombie back into the darkness from which it emerged before any organization wastes money purchasing it.
By promoting this software, Kerschberg is being irresponsible. By allowing people like Kerschberg to write about things they don’t understand, Forbes is demonstrating a complete lack of respect for its readers. Shame on them.
May 1st, 2014
We visualize quantitative data to perform three fundamental tasks in an effort to achieve three essential goals:
These three tasks are so fundamental to data visualization, I’ve long used them to define the term, as follows:
Data visualization is the use of visual representations to explore, make sense of, and communicate data.
But why is it that we must sometimes use graphical displays to perform these tasks rather than other forms of representation? Why not always express values as numbers in tables? Why express them visually rather than audibly? Essentially, there is only one good reason to express quantitative data visually: some features of quantitative data can be best perceived and understood, and some quantitative tasks can be best performed, when values are displayed graphically. This is so because of the ways our brains work. Vision is by far our dominant sense. We have evolved to perform many data sensing and processing tasks visually. This has been so since the days of our earliest ancestors who survived and learned to thrive on the African savannah. What visual perception evolved to do especially well, it can do faster and better than the conscious thinking parts of our brains. Data exploration, sensemaking, and communication should always involve an intimate collaboration between seeing and thinking (i.e., visual thinking).
Despite this essential reason for visualizing data, people often do it for reasons that are misguided. Let me dispel a few common myths about data visualization.
Myth #1: We visualize data because some people are visual learners.
While it is true that some people have greater visual thinking abilities than others and that some people have a greater interest in images than others, all people with normal perceptual abilities are predominantly visual. Everyone benefits from data visualization, whether they consider themselves visual learners or not, including those who prefer numbers.
Myth #2: We visualize data for people who have difficulty understanding numbers.
While it is true that some people are more comfortable with quantitative concepts and mathematics than others, even the brightest mathematicians benefit from seeing quantitative information displayed visually. Data visualization is not a dumbed-down expression of quantitative concepts.
Myth #3: We visualize data to grab people’s attention with eye-catching but inevitably less informative displays.
Visualizations don’t need to be dumbed down to be engaging. It isn’t necessary to sacrifice content for the sake of appearance. Data can always be displayed in ways that are optimally informative, pleasing to the eye, and engaging. To engage with a data display without being well informed of something useful is a waste.
Myth #4: The best data visualizers are those who have been trained in graphic arts.
While training in graphic arts can be useful, it is much more important to understand the data and be trained in visual thinking and communication. Graphic arts training that focuses on marketing (i.e., persuading people to buy or do something through manipulation) and artistry rather than communication can actually get in the way of effective data visualization.
Myth #5: Graphics provide the best means of telling stories contained in data.
While it is true that graphics are often useful and sometimes even essential for data-based storytelling, it isn’t storytelling itself that demands graphics. Much of storytelling is best expressed in words and numbers rather than images. Graphics are useful for storytelling because some features of data are best understood by our brains when they’re presented visually.
We visualize data because the human brain can perceive particular quantitative features and perform particular quantitative tasks most effectively when the data is expressed graphically. Visual data processing provides optimal support for the following:
1. Seeing the big picture
Graphs reveal the big picture: an overview of a data set. An overview summarizes the data’s essential characteristics, from which we can discern what’s routine vs. exceptional.
The series of three bar graphs below provides an overview of the opinions that 15 countries had about America in 2004, not long after the events of 9/11 and the military campaigns that followed.
I first discovered this information in the following form on the website of PBS:
Based on this table of numbers, I had to read each value one at a time and, because working memory is limited to three or four chunks of information at a time, I couldn’t use this display to construct and hold an overview of these countries’ opinions in my head. To solve this problem, I redisplayed this information as the three bar graphs shown above, which provided the overview that I wanted. I was able to use it to quickly get a sense of these countries’ opinions overall and in comparison to one another.
2. Easily and rapidly comparing values
Try to quickly compare the magnitudes of values using a table of numbers, such as the one shown above. You can’t, because numbers must be read one at a time and only two numbers can be compared at a time. Graphs, however, such as the bar graphs above, make it possible to see all of the values at once and to easily and rapidly compare them.
3. Seeing patterns among values
Many quantitative messages are revealed in patterns formed by sets of values. These patterns describe the nature of change through time, how values are distributed, and correlations, to name a few.
Try to construct the pattern of monthly change in either domestic or international sales for the entire year using the table below.
Difficult, isn’t it? The line graph below, however, presents the patterns of change in a way that can be perceived immediately, without conscious effort.
You can thank processes that take place in your visual cortex for this. The visual cortex perceives patterns and then the conscious thinking parts of our brains make sense of them.
4. Comparing patterns
Visual representations of patterns are easy to compare. Not only can the independent patterns of domestic and international sales be easily perceived by viewing the graph above, but they can also be compared to one another to determine how they are similar and different.
These four quantitative features and activities require visual displays. This is why we visualize quantitative data.
April 30th, 2014
Yesterday, I read an article on the website of Scientific American titled “Saving Big Data from Big Mouths” by Cesar A. Hidalgo. As you know if you read this blog regularly, I have grave concerns about the hyperbolic claims of Big Data and believe that it is little more than a marketing campaign to sell expensive technology products and services. In his article, Dr. Hidalgo, who teaches in the MIT Media Lab, challenges several recent articles in prominent publications that criticize the claims of Big Data. The fact that he characterizes naysayers as “big mouths” clues you in to his perspective on the matter. In reading his article, I discovered that Dr. Hidalgo’s understanding of Big Data is limited, as is often the case with academics, and his position suffers from a fundamental problem with Big Data, which is that Big Data, as he defines it, doesn’t actually exist. When I say that it doesn’t exist, I’m arguing that Big Data, as he’s defined it, isn’t new or qualitatively different from data in the past. Big Data is just data.
I expressed my concerns to Dr. Hidalgo by posting the following comments in response to his article:
I’m one of the naysayers in response to the claims of so-called Big Data. I’m concerned primarily with the hype that leads organizations to waste money chasing new technologies rather than developing the skills that are needed to glean value from data. One of the fundamental problems with Big Data is the fact that no two people define it in the same way, so it is difficult to discuss it intelligently. In this article, you praised the benefits of Big Data, but did not define it. What do you mean by Big Data? How is Big Data different from other data? When did data become big? Are the means of gleaning value from so-called Big Data different from the means of gleaning value from data in general?
He was kind enough to respond with the following.
Dear Stephen Few,
These are all very good questions.
First, regarding the definition of big data:
As you probably know well, the term big data is used colloquially to refer mostly to digital traces of human activities. These include cell phone data, credit card records and social media activity. Big data is also used occasionally to refer to data generated by some scientific experiments (like CERN or genomic data), although this is not the most common use of the phrase so I will stick to the “digital traces of human activity” definition for now.
Beyond the colloquial definition, I have a working definition of big data that I use on occasion. To keep things simple I say that big data needs to be three times big, meaning that it needs to be big in size, resolution and scope. The size dimension is relatively obvious (data on 20 or 30 individuals is not the same as data on hundreds of thousands of them). The resolution dimension is better explained by an example. Consider having credit card data on a million people. A low resolution version of this dataset would consist only on the total yearly expenditure of each individual. A high resolution version, would include information on when and where the purchases where made. In this example, it is the resolution of the data what allows us to use it to study, for instance, the mobility of this particular group of people (notice that I am not generalizing to the general population, since this subpopulation might be worthy of study on its own). Finally, I require data to be big in scope. By this, I require data to be useful for applications other than the ones for which it was originally collected. For instance, mobile phone records are used by operators for billing purposes, but could be used to forecast traffic or to identify the location of mobile phone users prior to a natural disaster (and use this information to help speed up search and rescue operations). When data is big in size, resolution and scope, I am comfortable saying that it is big.
Second: Your question about when big data become big?
This is an interesting question because it points to the evolution of language. During the last decade the word data has begot at least two children: “metadata” and “big data”. The word metadata grew in popularity in the wake of the NSA scandal, as people needed to differentiate between the content of messages and their metadata. Big data, on the other hand, emerged as people searched for a short way to refer to the digital traces of human activity that were collected for operational purposes by service providers serving large populations, and that could be used for purposes that were beyond those for which the data was originally collected. Certainly, the phrase “big data” provides an economy of language, and as someone that enjoys writing I always appreciate that.
Regarding the time at which this transition happen, I remember that when I started working with mobile phone data (in 2004) people were not using the word big data. As more people entered the field, the word begun to gain force (around 2008). With the financial crisis, the hype of big data entered full swing, as many framed big data as a the new asset, or technology, that could save the economy. I guess at that time, everyone wanted to believe them :-).
Third and Final: You ask whether the means of gleaning value from big data include the methods used to glean value from data in general.
In short, the answer is yes (it is data after all). Multivariate regressions in all of its forms and specifications are still useful and welcome (I use them often). Yet, these new datasets have also stimulated the proliferation of some additional techniques. For instance, visualizations have progressed enormously during recent years since exploring these datasets is not easy, and large datasets involve more exploration. As an example, check dataviva.info . This site makes available more than 100 million visualizations to help people explore Brazil’s formal sector economy. By taking different combinations of visualizations it is possible to weave stories about industries and locations. An example of these stories for a related project, The Observatory of Economic Complexity (atlas.media.mit.edu), can be seen in this video (http://vimeo.com/40565955). Here you will be able to see how these visualization techniques allow people to quickly compose stories about a topic.
Finally, it is worth noting that different people might mean different things when they refer to gleaning value from data. For some people, this might involve explaining the mechanisms that gave rise to the observed patterns, or use the data to learn about an aspect of the world. This is a common approach on the social sciences. For other people value might emerge from predictions that are not cognitively penetrable but nevertheless accurate, such as the ones people obtain with different machine learning techniques, such as neural networks or those based on abstract features. The latter of these approaches, which is often used by computer scientists, can be very useful for sites that require accurate predictions, such as Netflix or Amazon. Here, the value is certainly more commercial, but is also a valid answer to the question.
I hope these answer help clarify your questions.
All the best
I wanted to respond in kind, but for some unknown reason the Scientific American website is rejecting my comments, so I’ll continue the discussion here in my own blog.
Your response regarding the definition of Big Data demonstrates the problem that I’m trying to expose: Big Data has not been defined in a manner that lends itself to intelligent discussion. Your definition does not at all represent a generally accepted definition of Big Data. It is possible that the naysayers with whom you disagree define Big Data differently than you do. I’ve observed a great many false promises and much wasted effort in the name of Big Data. Unless you’re involved with a broad audience of people who work with data in organizations of all sorts (not just academia), you might not be aware of some of the problems that exist with Big Data.
Your working definition of Big Data is somewhat similar to the popular definition involving the 3 Vs (volume, velocity, and variety) that is often cited. The problem with the 3 Vs and your “size, resolution, and scope” definition is that they define Big Data in a way that could be applied to the data that I worked with when I began my career 30 years ago. Back then I routinely worked with data that was big in size (a.k.a., volume), detailed in resolution, and useful for purposes other than those for which it was originally generated. By defining Big Data as you have, you are supporting the case that I’ve been making for years that Big Data has always existed and therefore doesn’t deserve a new name.
I don’t agree that the term Big Data emerged as a “way to refer to digital traces of human activity that were collected for operational purposes by service providers serving large populations, and that could be used for purposes that were beyond those for which the data was originally collected.” What you’ve described has been going on for many years. In the past we called it data, with no need for the new term “Big Data.” What I’ve observed is that the term Big Data emerged as a marketing campaign by technology vendors and those who support them (e.g., large analyst firms such as Gartner) to promote sales. Every few years vendors come up with a new name for the same thing. Thirty years ago, we called it decision support. Not long after that we called it data warehousing. Later, the term business intelligence came into vogue. Since then we’ve been subjected to marketing campaigns associated with analytics and data science. These campaigns keep organizations chasing the latest technologies, believing that they’re new and necessary, which is rarely the case. All the while, they never slow down long enough to develop the basic skills of data sensemaking.
When you talk about data visualization, you’re venturing into territory that I know well. It is definitely not true that data visualization has “progressed enormously during recent years.” As a leading practitioner in the field, I am painfully aware that progress in data visualization has been slow and, in actual practice, is taking two steps backwards, repeating past mistakes, for every useful step forwards.
What various people and organizations value from data certainly differs, as you’ve said. The question that I asked, however, is whether or not the means of gleaning value from data, regardless of what we deem valuable, are significantly different from the past. I believe that the answer is “No.” While it is true that we are always making gradual progress in the development of analytical techniques and technologies, what we do today is largely the same as what we did when I first began my work in the field 30 years ago. Little has changed, and what has changed is an extension of the past, not a revolutionary or qualitative departure.
I hope that Dr. Hidalgo will continue our discussion here and that many of you will contribute as well.
March 28th, 2014
All men are designers. All that we do, almost all the time, is design, for design is basic to all human activity. The planning and patterning of any act towards a desired, foreseeable end constitutes the design process. Any attempt to separate design, to make it a thing-by-itself, works counter to the inherent value of design as the primary underlying matrix of life. Design is the conscious effort to impose meaningful order.
Mankind is unique among animals in its relationship to the environment. All other animals adapt themselves to a changing environment (by growing thicker fur in the winter, or evolving into a totally new species over a half-million-year cycle); only mankind transforms earth itself to suit its needs and wants. This job of form-giving and reshaping has become the designer’s responsibility. A hundred years ago, if a new chair, carriage, kettle, or pair of shoes was needed, the consumer went to the craftsman, stated his wants, and the article was made for him. Today the myriad objects of daily use are mass-produced to a utilitarian and aesthetic standard often completely unrelated to the consumer’s need. At this point Madison Avenue must be brought in to make these objects desirable or even palatable to the mass consumer.
In an age of mass production when everything must be planned and designed, design has become the most powerful tool with which man shapes his tools and environments (and, by extension, society and himself). This demands high social and moral responsibility from the designer. It also demands greater understanding of the people by those who practice design and more insight into the design process by the public.
Design must become an innovative, highly creative, cross-disciplinary tool responsive to the true needs of men. It must be more research-oriented, and we must stop defiling the earth itself with poorly-designed objects and structures.
“Should I design it to be functional or to be aesthetically pleasing?” This is the most heard, the most understandable, and the most mixed-up question in design today. “Do you want it to look good, or to work?” Barricades erected between what are really just two of the many aspects of function. It is all quite simple: aesthetic value is an inherent part of function.
The response of many designers has been like that so unsuccessfully practiced by Hollywood: the public has been pictured as totally unsophisticated, possessed of neither taste nor discrimination. A picture emerges of a moral weakling with an IQ of about 70, ready to accept whatever specious values the unholy trinity of Motivation Research, Market Analysis, and Sales have decided is good for him.
The cancerous growth of the creative individual expressing himself egocentrically at the expense of spectator and/or consumer has spread from the arts, overrun most of the crafts, and finally reached even into design. No longer does the artist, craftsman, or in some cases the designer operate with the good of the consumer in mind; rather, many creative statements have become highly individualistic, auto-therapeutic little comments by the artist to himself. With new processes and an endless list of new materials at his disposal, the artist, craftsman, and designer now suffers from the tyranny of absolute choice. When everything becomes possible, when all the limitations are gone, design and art can easily become a never-ending search for novelty, and the desire for novelty on the part of the artist becomes an equally strong desire for novelty on the part of the spectator and consumer, until newness-for-the-sake-of-newness becomes the only measure.
To “sex-up” objects (designers’ jargon for making things more attractive to mythical consumers) makes no sense in a world in which basic need for design is very real. In an age that seems to be mastering aspects of form, a return to content is long overdue. Designing for the people’s needs rather than for their wants, or artificially created wants, is the only meaningful direction now.
None of the words above are mine, despite the fact that they reflect my thinking and values perfectly. These words were written in 1971 by the designer/teacher Victor Papanek, whose work I only recently discovered. You can read them yourself in Papanek’s important and thoughtful book entitled Design for the Real World (Academy Chicago Publishers). This is a true classic that all designers should read, especially those of us who design information displays.
Our designs affect the world for good or ill. We choose to either take responsibility by presenting information effectively or to do harm by presenting information in ways that lead to impoverished and erroneous thinking. If you choose the former, you owe it to yourself (and others) to read Design for the Real World for inspiration, direction, and an invitation to make a difference.