One of my favorite Italian words is “basta,” followed by an exclamation point. No, basta does not mean “bastard”; it means “enough,” as in “I’ve had ENOUGH of you!” We’ve definitely had enough of Big Data. As a term, Big Data has been an utter failure. It has never managed to mean anything in particular. A term that means nothing in particular means nothing at all. The term can legitimately claim two outcomes that some consider useful:
- It has sold a great many products and services. Those who have collected the revenues love the term.
- It has awakened some people to the power of data to inform decisions. The usefulness of this outcome, however, is tainted by the deceit that some data today is substantially different from data of the past. As a result, Big Data encourages organizations to waste time and money seeking an illusion.
If you’ve thought much about Big Data, you’ve noticed the confusion that plagues the term. What is Big Data? This question lacks a clear answer for the following reasons:
- There are almost as many definitions of Big Data as there are people with opinions.
- None of the many definitions that have been proposed describe anything about data and its use that is substantially different from the past.
- Most of the definitions are so vague or ambiguous that they cannot be used to determine, one way or the other, if a particular set of data or use of data qualifies as Big Data.
The term remains what it was when it first became popular: a marketing campaign and, as such, a source of fabricated need and endless confusion. Nevertheless, like spam, it refuses to go away. Why does this matter? Because chasing illusions is a waste of time and money that also carries a high cost of lost opportunity. It makes no sense to chase Big Data, whatever you think it is, if you continue to derive little or no value from the data that you already have.
Ill-defined terms that capture minds and hearts, as Big Data has, often exert influence in irresponsible and harmful ways. Big Data has been the basis for several audacious claims, such as, “Now that we have Big Data…”
- “…we no longer need to be concerned with data quality”
- “…we no longer need to understand the nature of causality”
- “…science has become a thing of the past”
- “…we can’t survive without it”
People who make such claims either don’t understand data and its use or they are trying to sell you something. Even more disturbing in some respects are the ways in which the seemingly innocuous term Big Data has been used to justify unethical practices, such as gleaning information from our private emails to support targeted ads, a practice that Google is only now abandoning.
Data has always been big and getting bigger. Data has always been potentially valuable for informing better evidence-based decisions. On the other hand, data has always been useless unless it can inform us about something that matters. Even potentially informative data remains useless until we manage to make sense of it. How we make sense of data involves skills and methods that have, with few exceptions, been around for a long time. Skilled data sensemakers have always made good use of data. Those who don’t understand data and its use mask their ignorance and ineffectiveness by introducing new terms every few years as a bit of clever sleight of hand.
The definitions of Big Data that I’ve encountered fall into a few categories. Big Data is…
- …data sets that are extremely large (i.e., an exclusive emphasis on volume)
- …data from various sources and of various types, some of which are relatively new (i.e., an exclusive emphasis on variety)
- …data that is both large in volume and derived from various sources (and is sometimes also produced and acquired at fast speeds, to complete the three Vs of volume, velocity, and variety)
- …data that is especially complex
- …data that provides insights and informs decisions
- …data that is processed using advanced analytical methods
- …any data at all that is associated with a current fad
Let’s consider the problems that are associated with the definitions in each of these categories.
Data Sets That Are Extremely Large
According to the statistical software company SAS:
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
SAS.com
This definition fails in several respects, not the least of which is its limitation to business data. The fundamental problem with definitions such as this, which focus primarily on the size of data as the defining factor, is their failure to specify how large data must be to qualify as Big Data rather than just plain data. Large data sets have always existed. What threshold must be crossed to transition from data to Big Data? This definition doesn’t say.
Here’s a definition that attempts to identify the threshold:
Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.
Vangie Beal, Webopedia.com
Do you recognize the problem of defining the threshold in this manner? What are “traditional database and software techniques”? The following definition is slightly less vague:
Big data means data that cannot fit easily into a standard relational database.
Hal Varian, Chief Economist, Google
(Source Note: All of the quoted definitions that are attributed to an individual rather than to a particular publication appeared in “What is Big Data?,” an article by Jennifer Dutcher of the U.C. Berkeley School of Information, published on September 3, 2014.)
In theory, there is no limit to the amount of data that can be stored in a relational database; in practice, databases of all types have limits. People have suggested technology-based volume thresholds of various kinds, including anything that cannot fit into an Excel spreadsheet. All of these definitions establish arbitrary limits, and some rest on arbitrary measures as well.
Big data is data that even when efficiently compressed still contains 5-10 times more information (measured in entropy or predictive power, per unit of time) than what you are used to right now.
Vincent Granville, Co-Founder, Data Science Central
So, if you are accustomed to 1,000-row Excel tables, a simple SQL Server database consisting of 5,000 to 10,000 rows qualifies as Big Data. Such definitions highlight the uselessness of arbitrary limits on data volume, as the sketch below illustrates.
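What would it even take to apply Granville’s threshold? Here is a minimal Python sketch, using made-up columns of data (nothing in it comes from Granville’s own work), that computes Shannon entropy, one way to operationalize the “entropy” his definition invokes. Whether one data set contains “5-10 times more information” than another depends entirely on what you choose to count.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the distribution of values in a column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two hypothetical columns: which one crosses the "5-10 times" threshold?
familiar = ["yes", "no"] * 500   # 1,000 rows, two distinct values
bigger = list(range(5_000))      # 5,000 rows, every value distinct

print(shannon_entropy(familiar))  # 1.0 bit per value
print(shannon_entropy(bigger))    # roughly 12.3 bits per value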
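Measured by rows, the second column is five times larger; measured by entropy, it holds roughly twelve times more information per value. The answer depends entirely on the measure you pick, which is precisely the problem with thresholds of this kind. Here’s another definition that acknowledges its arbitrary nature: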
Big data is when…the standard, simple methods (maybe it’s SQL, maybe it’s k-means, maybe it’s a single server with a cron job) break down on the size of the data set, causing time, effort, creativity, and money to be spent crafting a solution to the problem that leverages the data without simply sampling or tossing out records.
John Foreman, Chief Data Scientist, MailChimp
Some definitions acknowledge the arbitrariness of the threshold without recognizing it as a definitional failure:
The term big data is really only useful if it describes a quantity of data that’s so large that traditional approaches to data analysis are doomed to failure. That can mean that you’re doing complex analytics on data that’s too large to fit into memory or it can mean that you’re dealing with a data storage system that doesn’t offer the full functionality of a standard relational database. What’s essential is that your old way of doing things doesn’t apply anymore and can’t just be scaled out.
John Myles White
What good is a definition that is based on a subjective threshold in data volume?
The following definition acknowledges that, when based on data volume, what qualifies as Big Data not only varies from organization to organization, but over time as well:
Big data is data that contains enough observations to demand unusual handling because of its sheer size, though what is unusual changes over time and varies from one discipline to another. Scientific computing is accustomed to pushing the envelope, constantly developing techniques to address relentless growth in dataset size, but many other disciplines are now just discovering the value – and hence the challenges – of working with data at the unwieldy end of the scale.
Annette Greiner, Lecturer, UC Berkeley School of Information
Not only do these definitions identify Big Data in a manner that lacks objective boundaries, they also acknowledge (perhaps inadvertently) that so-called Big Data has always been with us, for data has always been on the increase in a manner that leads to processing challenges. In other words, Big Data is just data.
There is a special breed of volume-based definitions that advocate “Collect and store everything.” Here is the most thorough definition of this sort that I’ve encountered:
The rising accessibility of platforms for the storage and analysis of large amounts of data (and the falling price per TB of doing so) has made it possible for a wide variety of organizations to store nearly all data in their purview – every log line, customer interaction, and event – unaggregated and for a significant period of time. The associated ethos of “store everything now and ask questions later” to me more than anything else characterizes how the world of computational systems looks under the lens of modern “big data” systems.
Josh Schwartz, Chief Data Scientist, Chartbeat
These definitions change the nature of the threshold from a measure of volume to an assumption: you should collect everything at the lowest level of granularity, whether useful or not, for you never know when it might be useful. Definitions of this type are a hardware vendor’s dream, but they are an organization’s nightmare, for the cost of unlimited storage extends well beyond the cost of hardware. The time and resources that are required to do this are enormous and rarely justified. Anyone who actually works with data knows that the vast majority of the data that exists in the world is noise and will always be noise. Don’t line the pockets of hardware vendor executives with gold by buying into this ludicrous assumption.
Data from Various Sources and of Various Types
Independent of a data set’s size, some definitions of Big Data emphasize its variety. Here’s one of the clearest:
What’s “big” in big data isn’t necessarily the size of the databases, it’s the big number of data sources we have, as digital sensors and behavior trackers migrate across the world. As we triangulate information in more ways, we will discover hitherto unknown patterns in nature and society – and pattern-making is the wellspring of new art, science, and commerce.
Quentin Hardy, Deputy Tech Editor, The New York Times
Definitions that emphasize variety suffer from the same problems as those that emphasize volume: where is the threshold? How many data sources are needed to qualify data as Big Data? In what sense does the addition of new data sources (something that is hardly new) change the nature of data or its use? It doesn’t. New data sources are sometimes useful and sometimes not. Collecting and storing every possible source of data is no more productive than collecting and storing every instance of data.
Data That Exhibits High Volume, Velocity, and Variety
I’ll use Gartner’s definition to represent this category in honor of the fact that Doug Laney of Gartner was the first to identify the three Vs (volume, velocity, and variety) as game changers.
Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Gartner
Combining volume and variety, plus adding velocity (the speed at which data is generated and acquired), produces definitions that suffer from all of the problems that I’ve already identified. Increases in volume, velocity, and variety have always been with us. They have not fundamentally changed the nature of data or its use.
Data That Is Especially Complex
Some definitions focus on the complexity of data.
While the use of the term is quite nebulous and is often co-opted for other purposes, I’ve understood “big data” to be about analysis for data that’s really messy or where you don’t know the right questions or queries to make – analysis that can help you find patterns, anomalies, or new structures amidst otherwise chaotic or complex data points.
Philip Ashlock, Chief Architect, Data.gov
You can probably anticipate what I’ll say about definitions of this sort: once again, they lack a clear threshold and identify a quality that has always been true of data. How complex is complex enough, and at what point in history has data not exhibited complexity?
Data That Provides Insights and Informs Decisions
As you no doubt already anticipate, definitions in this category exhibit the same problems as those in the categories that we’ve already considered. Here’s an example:
Big Data has the potential to help companies improve operations and make faster, more intelligent decisions. This data…can help a company to gain useful insight to increase revenues, get or retain customers, and improve operations.
Vangie Beal, Webopedia.com
Data That Is Processed Using Advanced Analytical Methods
According to definitions in this category, it is not the data itself that determines Big Data but rather the methods that are used to make sense of it.
The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data.
Wikipedia
Some of these definitions allow quite a bit of leeway regarding the nature of the advanced methods, while others are more specific, such as the following:
Big data is an umbrella term that means…the possibility of doing extraordinary things using modern machine learning techniques on digital data. Whether it is predicting illness, the weather, the spread of infectious diseases, or what you will buy next, it offers a world of possibilities for improving people’s lives.
Shashi Upadhyay, CEO and Founder, Lattice Engines
What analytical methods qualify as Big Data? The answer usually depends on the methods that the person who is quoted uses or sells. Can you guess what kind of software Lattice Engines sells?
Methods that are considered advanced have been around for a long time; even most of the methods that are labeled advanced today when defining Big Data have existed for quite some time. For example, even though computers were not always powerful enough to run machine-learning algorithms on large data sets, these algorithms are fundamentally based on traditional statistical methods.
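To make this concrete, here is a minimal sketch in Python, with made-up data, of logistic regression, a technique routinely marketed today as machine learning. At its heart it is maximum-likelihood estimation, a statistical method that predates modern computing; the computer merely adds the ability to run it quickly on large data sets.

```python
import numpy as np

# Logistic regression fitted by gradient ascent on the log-likelihood.
# Sold today as machine learning, it is maximum-likelihood estimation,
# a traditional statistical method. The data below is made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                              # two predictor variables
y = (X @ np.array([1.5, -2.0]) + rng.normal(size=200) > 0).astype(float)

w = np.zeros(2)                       # coefficients to estimate
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))    # predicted probabilities
    w += 0.5 * X.T @ (y - p) / len(y) # step along the log-likelihood gradient

print(w)  # converges to the maximum-likelihood estimates that a statistician
          # could have computed, by other means, decades ago
```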
A few of the definitions in this category have emphasized advanced skills rather than technologies, such as the following:
As computational efficiency continues to increase, “big data” will be less about the actual size of a particular dataset and more about the specific expertise needed to process it. With that in mind, “big data” will ultimately describe any dataset large enough to necessitate high-level programming skill and statistically defensible methodologies in order to transform the data asset into something of value.
Reid Bryant, Data Scientist, Brooks Bell
Once again, however, there is nothing new about these skills.
Any Data at All That Is Associated with a Current Fad
Some definitions of Big Data apply the term to anything data related that is trending. Here’s an example:
I see big data as storytelling – whether it is through information graphics or other visual aids that explain it in a way that allows others to understand across sectors.
Mike Cavaretta, Data Scientist and Manager, Ford Motor Company
This tendency has been directly acknowledged by Ryan Swanstrom in his Data Science 101 blog: “Now big data has become a buzzword to mean anything related to data analytics or visualization.” This is what happens with fuzzy definitions. They can be easily manipulated to mean anything you wish. As such, they are meaningless and useless.
Now What?
The definitional messiness and thus uselessness of the term Big Data is far from unique. Many information technology terms exhibit these dysfunctional traits. I’ve worked in the field that goes by the name “business intelligence” for many years, but this industry has never adhered to or lived up to the definition provided by Howard Dresner, who coined the term: “Concepts and methods to improve business decision making by using fact-based support systems.” Instead, the term has primarily functioned as a name for technologies and processes that are used to collect, store, and produce automated reports of data. Rarely has there been an emphasis on “concepts and methods to improve business decision making,” which features humans rather than technologies. This failure of emphasis has resulted in the failure of most business intelligence efforts, which have produced relatively little intelligence.
All of the popular terms that have emerged during my career to describe the work that I and many others do with data, including decision support, data warehousing, analytics, data science, and of course, Big Data, have been plagued by definitional dysfunction, leading to confusion and bad practices. I prefer the term “data sensemaking” for the concepts, methods, and practices that we engage in to understand data. And to promote the value of data as the raw material from which understanding is woven, the field of healthcare has contributed one of the most useful terms: “evidence-based medicine.” In its generic form, “evidence-based decision making” is simple, straightforward, and clear. If we used these terms to describe the work and its importance, we would stop wasting time chasing illusions and would focus on what’s fundamentally needed: data sensemaking skills, augmented by good technologies, to support evidence-based decision making. Perhaps then, we would make more progress.
Let’s say “goodbye” to the term Big Data. It doesn’t mean anything in particular, and all of the many things that people have used it to mean merely refer to data. Do we really need a new term to promote the importance of evidence-based decision making? The only people who are prepared to glean real value from data don’t need a new term or a marketing campaign. Rallying those who don’t understand data or its use will only lead to good outcomes if we begin by helping them understand. Meaningless terms lead in the opposite direction.
Take care,
