Thanks for taking the time to read my thoughts about Visual Business
Intelligence. This blog provides me (and others on occasion) with a venue for ideas and opinions
that are either too urgent to wait for a full-blown article or too
limited in length, scope, or development to require the larger venue.
For a selection of articles, white papers, and books, please visit
August 12th, 2013
This summer I’ve been spending most of my time working on a new book. The current working title is Signal. As the title suggests, this book will focus on analytical techniques for detecting signals in the midst of noisy data. And guess what? All data sets are noisy. In fact, at any given moment, most of the data that we collect are noise. This will always be true, because signals in data are the exception, not the rule.
Signal detection is actually getting harder with the advent of so-called Big Data. By its very nature, most Big Data will never be anything but noise. Collecting everything possible, based on the Big Data argument that the costs of doing so are negligible and that even data that you can’t imagine as useful today could become useful tomorrow, is a dangerous premise. The costs of collecting and storing everything extend far beyond the hardware that’s used to store it. People already struggle to use data effectively. This will become dramatically harder as the volume of data grows. Finding a needle in a haystack doesn’t get easier as you’re tossing more and more hay on the pile.
Most people who are responsible for data analysis in organizations have never been trained to do this work. An insidious assumption exists, promoted by software vendors, that knowing how to use a particular data analysis software product “auto-magically” imbues one with the skills of a data analyst. Even with good software—something that’s rare—this is far from true. Just as with any area of expertise, data analysis requires training and practice, practice, practice. Because few people whose work involves data analysis possess the required skills, much time is wasted and money lost as analysts pore over data without knowing what to look for. They end up chasing patterns that mean nothing and missing those that are gold. Essentially, data analysis is the process of signal detection.
Data that do not convey useful knowledge are noise. When data are displayed, noise can exist both as data that don’t provide useful knowledge and also as useless non-data elements of the display (e.g., irrelevant visual attributes, such as a third dimension of depth in bars, meaningless color variation, and effects of light and shadow). Both sources of noise must be filtered to find and focus on the signals.
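To make the filtering idea concrete, here is a minimal sketch, not from the book: a slow trend (the signal) buried in random noise, recovered with a simple moving average. The series, noise level, and window size are all invented for illustration.

```python
import random

random.seed(42)

# A hypothetical series: a slow upward trend (the signal) buried in noise.
signal = [0.1 * t for t in range(200)]
noisy = [s + random.gauss(0, 3) for s in signal]

def moving_average(xs, window=15):
    """Smooth point-to-point noise so the underlying trend shows through."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

smoothed = moving_average(noisy)

# Mean absolute error against the true signal: the smoothed series should
# land much closer to the trend than the raw, noisy data does.
raw_err = sum(abs(n - s) for n, s in zip(noisy, signal)) / len(signal)
smooth_err = sum(abs(m - s) for m, s in zip(smoothed, signal)) / len(signal)
print(raw_err > smooth_err)  # the filtered series tracks the signal better
```

Of course, a moving average is only one crude filter; the point is simply that noise must be actively removed before a signal can be seen.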
When we rely on data for decision making, what qualifies as a signal and what is merely noise? In and of themselves, data are neither. Data are merely facts. When facts are useful, they serve as signals. When they aren’t useful, data clutter the environment with distracting noise.
For data to be useful, they must:
- Address something that matters
- Promote understanding
- Provide an opportunity for action to achieve or maintain a desired state
When any of these qualities are missing, data remain noise.
Signals are always signs of something in particular. In a sense, a signal is not a thing but a relationship. Data become useful knowledge of something that matters when they connect understanding to a question to form an answer. This connection (relationship) is the signal.
As I work on this book to define the nature of signals and to describe techniques for detecting them, I could benefit from your thoughts on the matter. In your experience, what data qualify as signals? How do you find them? What do you do to understand them? What do you do about them once found? What examples have you seen, in your own organization or others, of time wasted chasing noise? What can we do to reduce noise? Please share with me any thoughts that you have along these lines.
July 23rd, 2013
Just in case you haven’t already noticed, the new edition of Information Dashboard Design is now available!
New chapters have been added that focus on the following topics:
- Fundamental considerations while assessing requirements
- In-depth instruction in the design of bullet graphs
- In-depth instruction in the design of sparklines
- Critical steps that you should take during the design process
Examples of graphics and dashboards have been updated throughout the book and many new examples have been added, including a few more well-designed dashboards. In total, approximately 30% more content has been added to the book. It has been a labor of love that I hope you find useful.
June 26th, 2013
I recently read the most thorough, thoughtful, and cogent treatise on technology that I’ve ever encountered: To Save Everything, Click Here: The Folly of Technological Solutionism, by Evgeny Morozov.
My attraction to this book is not without bias. Morozov seems to view technology—its potential for both good and ill—much as I do, but the technologies that reside within his purview, the depths to which he’s studied them, and the disciplines on which he draws to understand them, exceed my own. His approach and grasp are those of a philosopher.
Morozov decries technological solutionism.
Alas, all too often, this never-ending quest to ameliorate—or what the Canadian anthropologist Tania Murray Li, writing in a very different context, has called “the will to improve”—is shortsighted and only perfunctorily interested in the activity for which improvement is sought. Recasting all complex social situations either as neatly defined problems with definite, computable solutions or as transparent and self-evident processes that can be easily optimized—if only the right algorithms are in place!—this quest is likely to have unexpected consequences that could eventually cause more damage than the problems they seek to address.
I call the ideology that legitimizes and sanctions such aspirations “solutionism.” I borrow this unabashedly pejorative term from the world of architecture and urban planning, where it has come to refer to an unhealthy preoccupation with sexy, monumental, and narrow-minded solutions—the kind of stuff that wows audiences at TED Conferences—to problems that are extremely complex, fluid, and contentious…Design theorist Michael Dobbins has it right: solutionism presumes rather than investigates the problems that it is trying to solve, reaching “for the answer before the questions have been fully asked.” How problems are composed matters every bit as much as how problems are resolved. (pp. 5 and 6)
This book exposes the threat of solutionism and proposes healthier ways to embrace and benefit from technologies.
The ultimate goal of this book…is to uncover the attitudes, dispositions, and urges that comprise the solutionist mind-set, to show how they manifest themselves in specific projects to ameliorate the human condition, and to hint at how and why some of these attitudes, dispositions, and urges can and should be resisted, circumvented, and unlearned. For only by unlearning solutionism—that is, by transcending the limits it imposes on our imaginations and by rebelling against its value system—will we understand why attaining technological perfection, without attending to the intricacies of the human condition and accounting for the complex world of practices and traditions, might not be worth the price. (p. xv)
If you’ve spent much time listening to or reading the words of Silicon Valley’s prominent spokespersons (Kevin Kelly of Wired, Mark Zuckerberg of Facebook, Eric Schmidt of Google, to name a few), you might have noticed that they tend to speak of technology as if it were spelled with a capital “T.” For them, Technology is a sentient being with purpose that, much like the God of evangelicals, has a wonderful plan for our lives. It is our job as believers to embrace Technology and let it lead us to the promised land, for it exceeds us in wisdom and power, and is unquestionably good. I’ve provided training and consulting services for many of the technology companies that preach this gospel. During these engagements, I do my best to moderate their techno-enthusiasm and point out that technologies are just tools that provide benefit only when they are well designed, capable of helping us solve real problems, and ethically used. We have choices when we approach technologies, and we should make them thoughtfully.
Morozov addresses information technologies of all types and critiques them incisively from the perspective of history and a breadth of disciplines. Even such a given as Moore’s Law, which technologists often cite as the basis of their position, is revealed as a failed hypothesis—hardly a law.
Morozov seems to share my concerns about Big Data. Regarding the popular new trend of capturing and storing everything, he writes, “Where there is no reflection about what ought to be preserved, the records—no matter how comprehensive—might trigger fewer challenging questions about the relative significance of recorded events; the enormity of the archive might actually conceal that significance.” (p. 278) In opposition to those who fail to see the connection between the technologies of today and those of the past, he writes:
Contrary to his [David Weinberger of Harvard's Berkman Center] claim that “knowledge is now property of the network,” knowledge has always been property of the network, as even a cursory look at the first universities of the twelfth century would reveal. Once again, our digital enthusiasts mistake impressive and—yes!—interesting shifts in magnitude and order with the arrival of a new era in which the old rules no longer apply. Or, as one perceptive critic of Weinberger’s oeuvre has noted, he confuses “a shift in network architecture with the onset of networked knowledge per se.” “The Internet” is not a cause of networked knowledge; it is its consequence—an insight lost on most Internet theorists. (p. 38)
Technologists (especially technology vendors) use the term “revolution” much too loosely. What qualifies as revolutionary? Morozov argues that, “In order to be valid, any declaration of yet another technological revolution must meet two criteria: first, it needs to be cognizant of what has happened and been said before, so that the trend it’s claiming as unique is in fact unique; second, it ought to master the contemporary landscape in its entirety—it can’t just cherry-pick facts to suit its thesis.” No recent so-called revolution in technology fails to meet these criteria more severely than Big Data.
I don’t agree entirely with everything that Morozov presents in this book, but at no point did I find his reasoning unsound or uninformed. He has opened my eyes to a few issues that fall outside of my primary spheres of interest, some of which have caused me to lose a little sleep, especially ways in which technological solutionism is influencing politics. While it is true that our political systems can be improved, the notion that we can “ditch politics altogether and hope that technology—especially ‘the Internet’—can rid us of problems that politics can no longer solve or, in a milder version, that we can replace politicians and politics with technocrats and administration” is frightening. (pp. 128 and 129) “Fixing politics without first getting a thorough understanding of what it is and what it is for is still a very dangerous undertaking…Political thinking, as well as political morality, needs to be cultivated; it doesn’t occur naturally—not even to geniuses in Silicon Valley.” (p. 139)
Technologies are important. They give us opportunities to extend our reach and improve our world, but they also give us opportunities to do the opposite. Morozov understands this. He is not a Luddite, he’s a responsible technologist. I recommend that you consider what he has to say.
June 18th, 2013
In a recent blog post titled “Big data NSA spying is not even an effective strategy,” Francis Gouillart raised concerns about Big Data that are very much in line with mine. Gouillart’s is a refreshing and rare voice of sanity. He’s been around long enough to recognize marketing hype when he sees it, and as an independent thinker with ethics, not a shill for technology vendors, he is one among few who are speaking the truth. Here’s a sample:
The evidence for big data is scant at best. To date, large fields of data have generated meaningful insights at times, but not on the scale many have promised…Yet, for years now, corporations and public organizations have been busy buying huge servers and business intelligence software, pushed by technology providers and consultants armed with sales pitches with colorful anecdotes such as the Moneyball story in which general manager Billy Beane triumphed by using player statistics to predict the winning strategies for the Oakland A’s baseball team. If it worked for Billy Beane, it will work for your global multinational, too, right? Well, no.
The worship of big data is not new. Twenty-five years ago, technology salespeople peddled data using an old story about a retailer that spotted a correlation between diaper purchases and beer drinking, allowing a juicy cross-promotion of the two products for young fathers. Today, most data warehouses are glorified repositories of transaction data, with very little intelligence.
Working with multinationals as a management consultant, I have chased big data insights all my life and have never found them. What I have learned, however, is that local data has a lot of value. Put another way, big data is pretty useless, but small data is a rich source of insights. The probability of discovering new relationships at a local, highly contextual level and expanding it to universal insights is much higher than of uncovering a new law from the massive crunching of large amounts of data.
Read Gouillart’s article in full and pass it on. It’s time to usher in a quiet voice of sanity in this noisy, naive world of “more is better.”
May 31st, 2013
Predictive analytics is one of the most popular IT terms of our day, and like the others (Big Data, Data Science, etc.), it’s often defined far too loosely. People who work in the field of predictive analytics, however, use the term fairly precisely and meaningfully. No one, in my experience, does a better job of explaining predictive analytics—what it is, how it works, and why it’s important—than Eric Siegel, the founder of Predictive Analytics World, Executive Editor of the Predictive Analytics Times, and author of the new best-selling book in the field, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.
Predictive analytics is a computer-based application of statistics that has grown out of an academic discipline that is traditionally called machine learning. Yes, even though computers can’t think, they can learn (i.e., acquire useful knowledge from data). Siegel defines predictive analytics as “technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.” (p. 11)
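As a toy illustration of what “learning from experience (data) to predict the future behavior of individuals” can mean in code, consider the sketch below. The data, field names, and segmentation rule are all invented; a real predictive model would be far more sophisticated, but the shape is the same: estimate from past behavior, then score new individuals.

```python
# Hypothetical training data: each row is
# (visits_per_month, is_subscriber, responded_to_offer).
history = [
    (2, False, False), (8, True, True), (1, False, False),
    (9, True, True), (7, False, True), (3, True, False),
    (10, True, True), (2, True, False), (6, False, True), (1, False, False),
]

def train(rows):
    # "Learning from experience": estimate the historical response rate
    # for each segment (frequent visitors vs. infrequent ones).
    model = {}
    for frequent in (True, False):
        seg = [responded for visits, _, responded in rows
               if (visits >= 5) == frequent]
        model[frequent] = sum(seg) / len(seg)
    return model

def predict(model, visits):
    # Score a new individual with the rate learned from people like them.
    return model[visits >= 5]

model = train(history)
print(predict(model, 12) > predict(model, 2))  # frequent visitors score higher
```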
I appreciate the fact that Siegel doesn’t gush about the wonders of data and technology to the hyperbolic degree that is common today; he keeps a level head as he describes what can be done in realistic and practical terms. Here’s what he says about data:
As data piles up, we have ourselves a genuine gold rush. But data isn’t the gold. I repeat, data in its raw form is boring crud. The gold is what’s discovered therein. (p. 4)
And again here:
Big data does not exist. The elephant in the room is that there is no elephant in the room. What’s exciting about data isn’t how much of it there is, but how quickly it is growing. We’re in a persistent state of awe at data’s sheer quantity because of one thing that does not change: There’s always so much more today than yesterday. Size is relative, not absolute. If we use the word big today, we’ll quickly run out of adjectives: “big data,” “bigger data,” “even bigger data,” and “biggest data.” The International Conference on Very Large Databases has been running since 1975. We have a dearth of vocabulary with which to describe a wealth of data…
There’s a ton of it—so what? What guarantees that all this residual rubbish, this by-product of organizational functions, holds value? It’s no more than an extremely long list of observed events, an obsessive-compulsive enumeration of things that have happened.
The answer is simple. Everything is connected to everything else—if only indirectly—and this is reflected in data…
Data always speaks. It always has a story to tell, and there’s always something to learn from it…Pull some data together and, although you can never be certain what you’ll find, you can be sure you’ll discover valuable connections by decoding the language it speaks and listening. (pp. 78 and 79)
Siegel demonstrates that you can embrace technology without becoming a drooling idiot sitting around the campfire singing Kumbayah and toasting the imminence of the Singularity while chugging homemade wine produced by an algorithm:
I have good news: a little prediction goes a long way. I call this The Prediction Effect, a theme that runs throughout the book. The potency of prediction is pronounced—as long as the predictions are better than guessing. The Effect renders predictive analytics believable. We don’t have to do the impossible and attain true clairvoyance. The story is exciting yet credible: Putting odds on the future to lift the fog just a bit off our hazy view of tomorrow means pay dirt. In this way, predictive analytics combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales. (p. XVI)
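The Prediction Effect that Siegel describes can be illustrated with back-of-the-envelope arithmetic. The numbers below are hypothetical, not from the book, but they show how a model that is only modestly better than guessing still pays off.

```python
# A hypothetical direct-mail campaign. Every number here is invented.
customers = 100_000
mail_cost = 1.00      # cost to contact one customer
sale_profit = 20.00   # profit per positive response
base_rate = 0.05      # response rate when mailing everyone

# Mass mailing: contact all customers.
mass_profit = customers * (base_rate * sale_profit - mail_cost)

# A weak model: its top-scoring 25% of customers respond at 8%,
# far from clairvoyance, just a bit better than the 5% base rate.
targeted = customers // 4
lift_rate = 0.08
targeted_profit = targeted * (lift_rate * sale_profit - mail_cost)

# Mass mailing roughly breaks even; the modest model clears a profit.
print(targeted_profit > mass_profit)
```

Lifting the fog “just a bit,” as Siegel puts it, is the difference between a campaign that roughly breaks even and one that makes money.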
This is a great introduction to predictive analytics. It won’t teach you how to develop predictive models, but it surveys the territory, explains why it’s worthwhile, and points you in the right direction if you want to claim some of this territory as your own.