Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
May 12th, 2014
I am writing these words in Amsterdam. Yesterday, when I arrived here, I visited the Stedelijk Museum of contemporary art and design. The featured exhibition was the work of the Dutch industrial designer Marcel Wanders. This exhibit was timely, for I’m currently reading a book titled Design This Day by Walter Dorwin Teague, one of the founders of industrial design. The contrast between Wanders’ current work and Teague’s formative concept of design struck me as extreme. Wanders is the antithesis of Teague. The exhibition of Wanders’ work featured this huge photograph above the entrance:
Wanders’ work exhibits conscious, unapologetic self-expression: “Look at me!” One of the quotes writ large on the museum’s wall expressed Wanders’ belief that a designer’s work should exhibit his personal signature. I disagree, as does Teague.
When speaking of the rightness of a design, Teague declares that all aspects “should derive their sanction from something more necessary than a designer’s fancy.” Design strives to solve a problem, to serve human needs, not to express the personality of the designer.
Wanders’ notion of design is quite different.
It is our responsibility to be magicians, to be jesters, to be alchemists, to create hope where there is only illusion, to create reality where there are only dreams.
He shuns the formative principle of industrial design that “form follows function.” His aspirations are those of an artist, not a designer. This perspective is reflected in his work.
No, this is not a toy; it is Wanders’ full-size, running “holiday car,” its exterior covered with colored stones.
The designer’s approach should be one of interaction, not imposition: interaction between human needs, the tools, techniques, and materials of construction, the environment, and the designer’s skill and imagination. As designers, we use the best materials, tools, and techniques available to solve real problems as well as possible within the context of our environment. We are directed by human needs and the problems that must be solved to fulfill them, not a desire for self-expression. We are restricted objectively by our tools and materials and their impact on the world, not subjectively by the expanse of our egos. The product of our efforts should show no visible sign of ourselves, though it is born of our imagination. Perhaps this is a fundamental difference between art and design: the former an act of self-expression, often beautiful; the latter an act of integration and resolution, no less beautiful, but assessed differently. As designers, we speak in silence, but our voices, though anonymous whispers, are no less heard. Silently, we change the world.
May 6th, 2014
It’s often useful to take a fresh look at things through the eyes of an outsider. My friend Leanne recently provided me with an outsider’s perspective after reading a blog article of mine regarding Big Data. In it I referred to the three Vs—volume, velocity, and variety—as a common theme of Big Data definitions, which struck Leanne as misapplied. Being trained in health care and, perhaps more importantly, being a woman, Leanne pointed out that the three Vs don’t seem to offer any obvious advantages to data, but they’re highly desirable when applied to the Big O. What’s the Big O? Leanne was referring to the “oh, oh, oh, my God” Big O more commonly known as the female ORGASM. When it comes to the rock-my-world experience of the Big O:
- Volume is desirable—the more the better;
- Velocity is desirable—reaching terminal velocity quickly with little effort is hard to beat; and
- Variety is desirable—getting there through varied and novel means is a glorious adventure.
The three Vs are a perfect fit for the Big O, but not for data. More data coming at us faster from an ever-growing variety of sources offers few advantages and often distracts from the ultimate goal. Leanne doesn’t understand why data geeks (her words, not mine) are spending so much time arguing about terminology and technology instead of focusing on content—what data has to say—and putting that content to good use. I couldn’t agree more.
May 2nd, 2014
On two occasions several years ago I was asked by business intelligence publications to review a software product named FYI Visual. On both occasions I gladly accepted because people needed to be warned about it. FYI Visual was a zombie in the sense that, when it was born from the imagination of its creator, a medical doctor, it was lifeless, without substance or worth, animated only by his force of will and wallet. It was a horrible product, completely bereft of usefulness because it was built on an erroneous foundation. Eventually, the product ceased to exist, and I breathed a sigh of relief. Yesterday, however, I read an article by a fellow named Ben Kerschberg, a contributor to Forbes’ website, that promoted a brand-new product, which appears to be a reanimated version of FYI Visual named VisualCue. This new product is fundamentally based on the same flawed foundation as its predecessor.
Why was a contributor to Forbes promoting the walking dead? Not because Kerschberg had expertly reviewed the product and found it worthy. Kerschberg is an attorney, with no expertise whatsoever in data visualization. It’s clear from some of Kerschberg’s statements that he was merely parroting promotional material provided by VisualCue. For instance, Kerschberg referred to “bar charts,…treemaps, Gannt charts, and scatter plots” as “subpar visualizations… that fail to serve their main purpose of communicating information.” This language is reminiscent of statements previously made by the founder of FYI Visual. What is VisualCue’s answer to these subpar visualizations? “Interactive Visualization,” which Kerschberg says is a term that was coined by Gartner. First of all, Gartner did not coin this term. It has been in use since long before data visualization was on Gartner’s radar. Anyone familiar with the field knows that interactive visualization has been around since the early days of computer graphics. Regarding interactive visualization, Kerschberg makes the following claim, no doubt lifted directly from VisualCue’s absurd promotional content: “Interactive Visualization implies the use of heat maps, geographic maps, link charts, and a broad spectrum of special purpose visualizations that surround processes that are inextricably linked to an underlying analytics.” Huh? Really? This is news to those of us who have worked in the field for many years. And what does VisualCue’s version of interactive visualization look like? Now, for your viewing pleasure, I present an example of their amazing innovation:
One of these collections of icons (binoculars, clock, boat, etc.) is called a tile. You wouldn’t ordinarily use a single tile, but an entire screen full of them, arranged as a mosaic, such as the following example:
A screen full of these cute icons would certainly serve as an effective substitute for “subpar” bar charts, tree maps, scatter plots, and the like, if you wanted to overwhelm viewers’ senses with utter nonsense. Obviously, these heat-map-colored icons do not serve the same purpose as quantitative graphs such as bar charts and scatter plots. In fact, there is no purpose for which this display would provide a good solution.
FYI Visual, the predecessor of VisualCue, used the same basic approach except that its icons were all rectangles and an odd combination of colors and shapes served the purpose of the heatmap colors. What the new product calls tiles, the old product called KEGS, and what the new product calls a mosaic arrangement of tiles, the original product called a KEGSET. Other than the names, which are now friendlier, it appears that little else has changed. If you’re interested, you can read one of my reviews of the old product in the article “FYI Visual: The Story of a Product that was Built on a Fault.”
If this zombie stumbles into your neighborhood, I think the best way to protect yourself against it is to laugh hysterically. If we start laughing now and refuse to cease, we’ll chase this zombie back into the darkness from which it emerged before any organization wastes money purchasing it.
By promoting this software, Kerschberg is being irresponsible. By allowing people like Kerschberg to write about things they don’t understand, Forbes is demonstrating a complete lack of respect for its readers. Shame on them.
May 1st, 2014
We visualize quantitative data to perform three fundamental tasks in an effort to achieve three essential goals:
These three tasks are so fundamental to data visualization that I’ve long used them to define the term, as follows:
Data visualization is the use of visual representations to explore, make sense of, and communicate data.
But why is it that we must sometimes use graphical displays to perform these tasks rather than other forms of representation? Why not always express values as numbers in tables? Why express them visually rather than audibly? Essentially, there is only one good reason to express quantitative data visually: some features of quantitative data can be best perceived and understood, and some quantitative tasks can be best performed, when values are displayed graphically. This is so because of the ways our brains work. Vision is by far our dominant sense. We have evolved to perform many data sensing and processing tasks visually. This has been so since the days of our earliest ancestors who survived and learned to thrive on the African savannah. What visual perception evolved to do especially well, it can do faster and better than the conscious thinking parts of our brains. Data exploration, sensemaking, and communication should always involve an intimate collaboration between seeing and thinking (i.e., visual thinking).
Despite this essential reason for visualizing data, people often do it for reasons that are misguided. Let me dispel a few common myths about data visualization.
Myth #1: We visualize data because some people are visual learners.
While it is true that some people have greater visual thinking abilities than others and that some people have a greater interest in images than others, all people with normal perceptual abilities are predominantly visual. Everyone benefits from data visualization, whether they consider themselves visual learners or not, including those who prefer numbers.
Myth #2: We visualize data for people who have difficulty understanding numbers.
While it is true that some people are more comfortable with quantitative concepts and mathematics than others, even the brightest mathematicians benefit from seeing quantitative information displayed visually. Data visualization is not a dumbed-down expression of quantitative concepts.
Myth #3: We visualize data to grab people’s attention with eye-catching but inevitably less informative displays.
Visualizations don’t need to be dumbed down to be engaging. It isn’t necessary to sacrifice content for the sake of appearance. Data can always be displayed in ways that are optimally informative, pleasing to the eye, and engaging. To engage with a data display without being informed of something useful is a waste.
Myth #4: The best data visualizers are those who have been trained in graphic arts.
While training in graphic arts can be useful, it is much more important to understand the data and be trained in visual thinking and communication. Graphic arts training that focuses on marketing (i.e., persuading people to buy or do something through manipulation) and artistry rather than communication can actually get in the way of effective data visualization.
Myth #5: Graphics provide the best means of telling stories contained in data.
While it is true that graphics are often useful and sometimes even essential for data-based storytelling, it isn’t storytelling itself that demands graphics. Much of storytelling is best expressed in words and numbers rather than images. Graphics are useful for storytelling because some features of data are best understood by our brains when they’re presented visually.
We visualize data because the human brain can perceive particular quantitative features and perform particular quantitative tasks most effectively when the data is expressed graphically. Visual data processing provides optimal support for the following:
1. Seeing the big picture
Graphs reveal the big picture: an overview of a data set. An overview summarizes the data’s essential characteristics, from which we can discern what’s routine vs. exceptional.
The series of three bar graphs below provides an overview of the opinions that 15 countries had about America in 2004, not long after the events of 9/11 and the military campaigns that followed.
I first discovered this information in the following form on the website of PBS:
Using this table of numbers, I had to read each value one at a time, and because working memory can hold only three or four chunks of information at once, I couldn’t use this display to construct and hold an overview of these countries’ opinions in my head. To solve this problem, I redisplayed this information as the three bar graphs shown above, which provided the overview that I wanted. I was able to use it to quickly get a sense of these countries’ opinions overall and in comparison to one another.
2. Easily and rapidly comparing values
Try to quickly compare the magnitudes of values using a table of numbers, such as the one shown above. You can’t, because numbers must be read one at a time and only two numbers can be compared at a time. Graphs, however, such as the bar graphs above, make it possible to see all of the values at once and to easily and rapidly compare them.
3. Seeing patterns among values
Many quantitative messages are revealed in patterns formed by sets of values. These patterns describe the nature of change through time, how values are distributed, and correlations, to name a few.
Try to construct the pattern of monthly change in either domestic or international sales for the entire year using the table below.
Difficult, isn’t it? The line graph below, however, presents the patterns of change in a way that can be perceived immediately, without conscious effort.
You can thank processes that take place in your visual cortex for this. The visual cortex perceives patterns and then the conscious thinking parts of our brains make sense of them.
4. Comparing patterns
Visual representations of patterns are easy to compare. Not only can the independent patterns of domestic and international sales be easily perceived by viewing the graph above, but they can also be compared to one another to determine how they are similar and different.
These four quantitative features and activities require visual displays. This is why we visualize quantitative data.
April 30th, 2014
Yesterday, I read an article on the website of Scientific American titled “Saving Big Data from Big Mouths” by Cesar A. Hidalgo. As you know if you read this blog regularly, I have grave concerns about the hyperbolic claims of Big Data and believe that it is little more than a marketing campaign to sell expensive technology products and services. In his article, Dr. Hidalgo, who teaches in the MIT Media Lab, challenges several recent articles in prominent publications that criticize the claims of Big Data. The fact that he characterizes naysayers as “big mouths” clues you in to his perspective on the matter. In reading his article, I discovered that Dr. Hidalgo’s understanding of Big Data is limited, as is often the case with academics, and that his position suffers from a fundamental problem with Big Data: Big Data, as he defines it, doesn’t actually exist. When I say that it doesn’t exist, I’m arguing that Big Data, as he’s defined it, isn’t new or qualitatively different from data in the past. Big Data is just data.
I expressed my concerns to Dr. Hidalgo by posting the following comments in response to his article:
I’m one of the naysayers in response to the claims of so-called Big Data. I’m concerned primarily with the hype that leads organizations to waste money chasing new technologies rather than developing the skills that are needed to glean value from data. One of the fundamental problems with Big Data is the fact that no two people define it in the same way, so it is difficult to discuss it intelligently. In this article, you praised the benefits of Big Data, but did not define it. What do you mean by Big Data? How is Big Data different from other data? When did data become big? Are the means of gleaning value from so-called Big Data different from the means of gleaning value from data in general?
He was kind enough to respond with the following.
Dear Stephen Few,
These are all very good questions.
First, regarding the definition of big data:
As you probably know well, the term big data is used colloquially to refer mostly to digital traces of human activities. These include cell phone data, credit card records and social media activity. Big data is also used occasionally to refer to data generated by some scientific experiments (like CERN or genomic data), although this is not the most common use of the phrase so I will stick to the “digital traces of human activity” definition for now.
Beyond the colloquial definition, I have a working definition of big data that I use on occasion. To keep things simple I say that big data needs to be three times big, meaning that it needs to be big in size, resolution and scope. The size dimension is relatively obvious (data on 20 or 30 individuals is not the same as data on hundreds of thousands of them). The resolution dimension is better explained by an example. Consider having credit card data on a million people. A low resolution version of this dataset would consist only of the total yearly expenditure of each individual. A high resolution version would include information on when and where the purchases were made. In this example, it is the resolution of the data that allows us to use it to study, for instance, the mobility of this particular group of people (notice that I am not generalizing to the general population, since this subpopulation might be worthy of study on its own). Finally, I require data to be big in scope. By this, I require data to be useful for applications other than the ones for which it was originally collected. For instance, mobile phone records are used by operators for billing purposes, but could be used to forecast traffic or to identify the location of mobile phone users prior to a natural disaster (and use this information to help speed up search and rescue operations). When data is big in size, resolution and scope, I am comfortable saying that it is big.
Second: Your question about when big data became big?
This is an interesting question because it points to the evolution of language. During the last decade the word data has begotten at least two children: “metadata” and “big data”. The word metadata grew in popularity in the wake of the NSA scandal, as people needed to differentiate between the content of messages and their metadata. Big data, on the other hand, emerged as people searched for a short way to refer to the digital traces of human activity that were collected for operational purposes by service providers serving large populations, and that could be used for purposes that were beyond those for which the data was originally collected. Certainly, the phrase “big data” provides an economy of language, and as someone who enjoys writing I always appreciate that.
Regarding the time at which this transition happened, I remember that when I started working with mobile phone data (in 2004) people were not using the word big data. As more people entered the field, the word began to gain force (around 2008). With the financial crisis, the hype of big data entered full swing, as many framed big data as the new asset, or technology, that could save the economy. I guess at that time, everyone wanted to believe them :-).
Third and Final: You ask whether the means of gleaning value from big data include the methods used to glean value from data in general.
In short, the answer is yes (it is data after all). Multivariate regressions in all of their forms and specifications are still useful and welcome (I use them often). Yet, these new datasets have also stimulated the proliferation of some additional techniques. For instance, visualizations have progressed enormously during recent years since exploring these datasets is not easy, and large datasets involve more exploration. As an example, check dataviva.info. This site makes available more than 100 million visualizations to help people explore Brazil’s formal sector economy. By taking different combinations of visualizations it is possible to weave stories about industries and locations. An example of these stories for a related project, The Observatory of Economic Complexity (atlas.media.mit.edu), can be seen in this video (http://vimeo.com/40565955). Here you will be able to see how these visualization techniques allow people to quickly compose stories about a topic.
Finally, it is worth noting that different people might mean different things when they refer to gleaning value from data. For some people, this might involve explaining the mechanisms that gave rise to the observed patterns, or using the data to learn about an aspect of the world. This is a common approach in the social sciences. For other people value might emerge from predictions that are not cognitively penetrable but nevertheless accurate, such as the ones people obtain with different machine learning techniques, such as neural networks or those based on abstract features. The latter of these approaches, which is often used by computer scientists, can be very useful for sites that require accurate predictions, such as Netflix or Amazon. Here, the value is certainly more commercial, but is also a valid answer to the question.
I hope these answers help clarify your questions.
All the best
I wanted to respond in kind, but for some unknown reason the Scientific American website is rejecting my comments, so I’ll continue the discussion here in my own blog.
Your response regarding the definition of Big Data demonstrates the problem that I’m trying to expose: Big Data has not been defined in a manner that lends itself to intelligent discussion. Your definition does not at all represent a generally accepted definition of Big Data. It is possible that the naysayers with whom you disagree define Big Data differently than you do. I’ve observed a great many false promises and much wasted effort in the name of Big Data. Unless you’re involved with a broad audience of people who work with data in organizations of all sorts (not just academia), you might not be aware of some of the problems that exist with Big Data.
Your working definition of Big Data is somewhat similar to the popular definition involving the 3 Vs (volume, velocity, and variety) that is often cited. The problem with the 3 Vs and your “size, resolution, and scope” definition is that they define Big Data in a way that could be applied to the data that I worked with when I began my career 30 years ago. Back then I routinely worked with data that was big in size (a.k.a. volume), detailed in resolution, and useful for purposes other than those for which it was originally generated. By defining Big Data as you have, you are supporting the case that I’ve been making for years that Big Data has always existed and therefore doesn’t deserve a new name.
I don’t agree that the term Big Data emerged as a “way to refer to digital traces of human activity that were collected for operational purposes by service providers serving large populations, and that could be used for purposes that were beyond those for which the data was originally collected.” What you’ve described has been going on for many years. In the past we called it data, with no need for the new term “Big Data.” What I’ve observed is that the term Big Data emerged as a marketing campaign by technology vendors and those who support them (e.g., large analyst firms such as Gartner) to promote sales. Every few years vendors come up with a new name for the same thing. Thirty years ago, we called it decision support. Not long after that we called it data warehousing. Later, the term business intelligence came into vogue. Since then we’ve been subjected to marketing campaigns associated with analytics and data science. These campaigns keep organizations chasing the latest technologies, believing that they’re new and necessary, which is rarely the case. All the while, they never slow down long enough to develop the basic skills of data sensemaking.
When you talk about data visualization, you’re venturing into territory that I know well. It is definitely not true that data visualization has “progressed enormously during recent years.” As a leading practitioner in the field, I am painfully aware that progress in data visualization has been slow and, in actual practice, is taking two steps backwards, repeating past mistakes, for every useful step forwards.
What various people and organizations value from data certainly differs, as you’ve said. The question that I asked, however, is whether or not the means of gleaning value from data, regardless of what we deem valuable, are significantly different from the past. I believe that the answer is “No.” While it is true that we are always making gradual progress in the development of analytical techniques and technologies, what we do today is largely the same as what we did when I first began my work in the field 30 years ago. Little has changed, and what has changed is an extension of the past, not a revolutionary or qualitative departure.
I hope that Dr. Hidalgo will continue our discussion here and that many of you will contribute as well.