Big Data, Big Deal

Data did not suddenly become big. While it is true that a few new sources of data have emerged in recent years and that we generate and collect data in increasing quantities, changes have been incremental—a matter of degree—not a qualitative departure from the past. Essentially, “big data” is a marketing campaign.

Like many terms that have been coined to promote new interest in data-based decision support (business intelligence, business analytics, business performance monitoring, etc.), big data is more hype than substance and it thrives on remaining ill defined. If you perform a quick Web search on the term, all of the top links other than the Wikipedia entry are to business intelligence (BI) vendors. Interest in big data today is a direct result of vendor marketing; it didn’t emerge naturally from the needs of users. Some of the claims about big data are little more than self-serving fantasies that are meant to inspire big revenues for companies that play in this space. Here’s an example from McKinsey Global Institute (MGI):

MGI studied big data in five domains—healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US healthcare expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal-location data could capture $600 billion in consumer surplus.

If you’re willing to put your trust in claims such as a 60% increase in operating margin, a $300 billion annual increase in value, an 8% reduction in expenditures, and a $600 billion consumer surplus, don’t embarrass yourself by trying to quantify these benefits after spending millions of dollars on big data technologies. Using data more effectively can indeed lead to great benefits, including those that are measured in monetary terms, but these benefits can’t be predicted in the manner, to the degree, or with the precision that McKinsey suggests.

When I ask representatives of BI vendors what they mean by big data, two characteristics dominate their definitions:

  1. New data sources: These consist primarily of unstructured data sources, such as text-based information related to social media, and new sources of transactional data, such as from sensors.
  2. Increased data volume: Data, data everywhere, in massive quantities.

Collecting data from new sources rarely introduces data of a new nature; it just adds more of the same. For example, even if new types of sensors measure something that we’ve never measured before, a measurement is a measurement—it isn’t a new type of data that requires special handling. What about all of those new sources of unstructured data, such as that generated by social media (Twitter and its cohorts)? Don’t these unstructured sources require new means of data sensemaking? They may require new means of data collection, but rarely new means of data exploration and analysis.

Do new sources of data require new means of visualization? If so, it isn’t obvious. Consider unstructured social networking data. This information must be structured before it can be visualized, and once it’s structured, we can visualize it in familiar ways. Want to know what people are talking about on Twitter? To answer this question, you search for particular words and phrases that you’ve tied to particular topics and you count their occurrences. Once it’s structured in this way, you can visualize it simply, such as by using a bar graph with a bar for each topic sized by the number of occurrences in ranked order from high to low. If you want to know who’s talking to whom in an email system or what’s linked to what on your Web site, you glean those interactions from your email or Web server and count them. Because these interactions are structured as a network of connections (i.e., not a linear or hierarchical arrangement), you can visualize them as a network diagram: an arrangement of nodes and links. Nodes can be sized to indicate popular people or content and links (i.e., lines that connect the nodes) can vary in thickness to show the volume of interactions between particular pairs of nodes. Never used nodes and links to visualize, explore, and make sense of a network of relationships? This might be new to you, but it’s been around for many years and information visualization researchers have studied the hell out of it.
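To make the structuring step concrete, here is a minimal sketch in Python of the two counts described above: topic mentions ranked from high to low, and interactions tallied per pair and per person. The topic names, keyword lists, and function names are hypothetical and chosen only for illustration, and matching whole words against a keyword list is a deliberately crude stand-in for however you actually tie phrases to topics.

```python
import re
from collections import Counter

# Hypothetical topic-to-keyword mapping; topics and keywords are illustrative only.
TOPIC_KEYWORDS = {
    "pricing": ["price", "cost", "expensive"],
    "support": ["help", "support", "ticket"],
    "outage": ["down", "outage", "offline"],
}


def count_topics(messages, topic_keywords):
    """Count how many messages mention each topic, ranked from high to low."""
    counts = Counter()
    for message in messages:
        # Lowercase and split into words, dropping punctuation.
        words = set(re.findall(r"[a-z']+", message.lower()))
        for topic, keywords in topic_keywords.items():
            if words & set(keywords):
                counts[topic] += 1  # each message counts once per topic
    return counts.most_common()


def count_interactions(pairs):
    """Tally messages per unordered pair (link thickness) and per person (node size)."""
    link_weights = Counter(frozenset(pair) for pair in pairs)
    node_sizes = Counter(person for pair in pairs for person in pair)
    return link_weights, node_sizes


def text_bar_chart(ranked, width=30):
    """Render ranked (label, count) pairs as a plain-text bar graph."""
    top = max((count for _, count in ranked), default=0)
    lines = []
    for label, count in ranked:
        bar = "#" * (round(width * count / top) if top else 0)
        lines.append(f"{label:<10} {bar} {count}")
    return "\n".join(lines)
```

Passing the ranked list from `count_topics` to `text_bar_chart` gives the bar-per-topic view, and the two counters from `count_interactions` supply the node sizes and link thicknesses that a network diagram would need. Once the data is structured this way, the display itself is ordinary.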

What about exponentially increasing data volumes? Does this have an effect on data visualization? Not significantly. In my 30 years of experience using technology to squeeze meaning and usefulness from data, data volumes have always been big. When wasn’t there more data than we could handle? Although it is true that the volume of data continues to grow at an increasing rate, did it cross some threshold in the last few years that has made it qualitatively different from before? I don’t think so. The ability of technology to adequately store and access data has always remained just a little behind what we’d like to have in capacity and performance. A little more and a little faster have always been on our wish list. While information technology has struggled to catch up, mostly by pumping itself up with steroids, it has lost sight of the objective: to better understand the world—at least one’s little part of it (e.g., one’s business)—so we can make it better. Our current fascination with big data has us looking for better steroids to increase our brawn rather than better skills to develop our brains. In the world of analytics, brawn will only get us so far; it is better thinking that will open the door to greater insight.

Big data is built on the unquestioned premise that more is better. More of the right data can be useful, but more for the sake of more does nothing but complicate our lives. In the words of the 21st Century Information Fluency Project, we live in a time of “infowhelm.” Just because we can generate and collect more and more data doesn’t mean that we should. We certainly shouldn’t until we figure out how to make sense and use of the data we already have. This seems obvious, but almost no attention is being given to building the skills and technologies that help us use data more effectively. As Richards J. Heuer, Jr. argued in the Psychology of Intelligence Analysis (1999), the primary failures of analysis are less due to insufficient data than to flawed thinking. To succeed analytically, we must invest a great deal more of our resources in training people to think effectively and we must equip them with tools that augment cognition. Heuer spent 45 years supporting the work of the CIA. Identifying a potential terrorist plot requires that analysts sift through a lot of data (yes, big data), but more importantly, it relies on their ability to connect the dots. Contrary to Heuer’s emphasis on thinking skills, big data is merely about more, more, more, which will bury most of the organizations that embrace it deeper in shit.

Is there anything new about data today, big or otherwise, that should be leading us to visualize data differently? I was asked to think about this recently when advising a software vendor that’s trying to develop powerful visualization solutions specifically for managing big data. After racking my brain, I came up with little. Almost everything that we should be doing to support the visual exploration, analysis, and presentation of data today involves better implementations of visualizations, statistical calculations, and data interactions that we’ve known about for years. Even though these features are old news, they still aren’t readily available in most commercial software today; certainly not in ways that work well. Rather than “going to where no one has gone before,” vendors need to do the less glorious work of supporting the basics well and data analysts need to further develop their data sensemaking skills. This effort may not lend itself to an awe-inspiring marketing campaign, but it will produce satisfied customers and revenues will follow.

I’m sure that new sources of data and increasing volumes might require a few new approaches to data visualization, though I suspect that most are minor tweaks rather than significant departures from current approaches. If you can think of any big data problems that visualization should address in new ways, please share them with us. Let’s see if we can identify a few efforts that vendors should support to truly make data more useful.

Take care,

30 Comments on “Big Data, Big Deal”


By Josh Hoehne. September 19th, 2012 at 1:44 pm

Love your blog. The best explanation of the ‘big data’ movement I’ve heard. Thank you.

By Jorge Camoes. September 19th, 2012 at 4:55 pm

Steve, once upon a time, I had to make monthly sales reports. The sales database was small (around 20,000 new records every month) so I could do it with Excel 97.

Then the company decided to get a new database. Daily updates, more product details, smaller sales territories. Instead of 20,000 new records we were getting around one million. Not an easy task with Excel.

I will not bother you with the grim details of how management and IT were dealing with this challenge. It was not pretty.

The old cliché “an order of magnitude quantitative change is a qualitative change” summarizes this perfectly. We couldn’t keep making the same reports, the same presentations, the same analysis. Two examples: if you have daily sales, you can react much faster to whatever the competition is doing. But if you have much more detail, you must know what the law of large numbers means, or else you start jumping to conclusions unsupported by the data.

This “qualitative leap” must be addressed. You must train people, perhaps hire some statisticians. Make more scatterplots and fewer pie charts. Summarize the data, find complex relationships, add alerts, be prepared to react to outliers in a timely manner.

The way I see it, “big data” is not about the number of records or data sources you have to deal with. If you are not prepared to avoid or embrace that “qualitative leap” it doesn’t matter how large or small your database is, and “big data” just means “big troubles”.

Based on historical evidence, I would say that vendors are willing to sell you “a solution” but the real knowledge and the needed cultural change are never included in the package.

So, much of the discussion around big data boils down to marketing hype, and in that sense I fully agree with you. But I’m an optimist, and I believe that huge amounts of data can have a positive effect, pushing the limits of our current tools and routines, forcing us to find new and better ones. And it will impact data visualization as well. Not because we’ll find new approaches but because it will raise awareness of how to use it effectively. Perhaps I’m in Oz now, but will vendors have the nerve to sell shiny pie charts in their “big data solutions”?

By Meic Goodyear. September 20th, 2012 at 1:41 am

To Eliot’s

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

I would add

Where is the information we have lost in data?

By Brandon Jackson. September 20th, 2012 at 3:03 am

We will have evolved when we speak of big ideas instead of the size of our data.

By Nicholas Herold. September 20th, 2012 at 5:58 am

Hah! Steve, you get it exactly right! Thanks for a great piece. IMNSHO, it should be required reading for anyone who wants to know what Big Data is, and any managers who think that increasing sources and the size of the data pipeline is a way of getting closer to God, or Nirvana, or Perfection. Whatever clever methods of visualization are developed, the method has got to address both the expertise and the limitations of the end user. In the end, the biggest cost to a company is not the data itself, but figuring out how to interpret and use it.

What is needed in any presentation strategy, visual, numerical tables, or whatever, is a channel that allows the user to quickly find the important bits, while allowing the unimportant ones to slip aside.

By mike barretta. September 20th, 2012 at 7:21 am

The “big data” problem is one of efficient storage, compute, and manipulation.

Emerging platforms are taking care of the physical problems of storage and compute, but the analysis is a user interface problem.

While you might use the same visualizations as an output, creating that visualization is different when crunching through terabytes vs megabytes. New visualization tools provide things like smart sampling of the data or ways to filter and aggregate in the background (like a hadoop job) to allow a responsive user experience.

Also, the strength of “big data” methods is that the signal you are looking for can be far more faint than before. One in a billion can be captured. Consequently, new algorithmic tools need to be available and made intuitive to use.

By Matthew O’Kane. September 20th, 2012 at 8:02 am

Steve,

Although I’ve seen some very interesting new visualisation techniques involving Big Data (in particular when trying to understand very large networks of linked entities), I think you are right in calling the hype out at this moment in the ‘Insight’ area.

I would keep in mind though that there is a lot more to Big Data than visualisation and creating new insights. The main driver of benefit is when predictive analytics is improved through the use of more varied and deeper data sets. It is this area where new techniques are required because the tried and tested regressions and decision trees won’t cut it any more. Check out the great work on Apache Mahout, or the efforts to create new machine learning techniques in R that can fully utilise the vast amount of information we now have available.

By Chael. September 20th, 2012 at 11:58 am

The two terms that are starting to rankle many in the industry are “big data” and “data scientist” — but marketers and LinkedIn junkies love them. I think the characterization that big data means new types of data is accurate, and a few examples that I have seen recently include jet engine sensor data, genome sequences, and of course, web logs.

Maybe another way to look at it is that it used to be that the most atomic level of data in a retail data warehouse was a point of sale transaction that tied a single SKU to a place in time. That was a fairly traceable artifact of data. But now, if you wish to scan stack trace errors across hundreds of web server nodes, you can’t simply package that into a relational database, and you can’t package the data into a typical graph. I think that there are lots of interesting challenges in store for the data visualization community when the data at the lowest level is more insignificant than the SKU example, and more creativity is needed to address how to speed the comprehension of the data by the end users.

Thanks for constantly raising the bar, Stephen.

By Or Shoham. September 23rd, 2012 at 8:45 am

Interesting read, and interesting comments as well.

My take, for the moment, is that Big Data is something that is relevant for the few, rather than the many. I could see plenty of use for both better handling of large data volumes, and perhaps more novel analytical options, for things like gauging the effectiveness of a Google Ads campaign. You could easily wind up with a staggering amount of views/clicks/buys, each of which can be associated with a large dataset (geo-location, browser, source page, time spent on page, pages viewed, etc). Analyzing these with traditional BI tools would be hopeless, which suggests you’d need different techniques or a more data-mining-oriented approach. While these would probably be hopelessly complex to the “average” user, there is almost certainly value hiding in the details for a trained analyst.

In cases such as this, there is probably a place for new visualizations, new techniques, or modifications to existing visualizations and techniques. “Common path” graphs, for example, would be an interesting way to visualize the Google Ads example above – what is the most common path taken, which informative values drive users toward that path, which drive them away, and easy navigation to focus either on these subgroups, or on the next most common path, or on the largest deviation factors during a specific step of the path. Likewise, goal-oriented analysis starting from the desirable outcome (a completed sale, for example) and tracing back could be very interesting. None of the BI tools I’m familiar with are even close to offering anything resembling this sort of functionality – they’re still churning out bar and pie charts.

Having said all that, I am reasonably sure that for the vast majority of the BI market, Big Data is nothing more than a market buzzword that has little to no impact on their business needs. As long as the majority of available data remains transactional (such as a twitter post, or a search engine term), Stephen is right on the money by saying nothing has really changed except the size.

By ngarbis. September 24th, 2012 at 2:10 pm

Hi Stephen –

Enjoy your blog (this one included), but I think you are missing the point by focusing on the visualization aspect alone. Yes, there is a lot of data (and there has been since before the term BIG DATA came around). There’s clearly more data available now, and I will agree that that doesn’t, on its own, mean that value will fall from the sky. It does represent some increased potential value.

The real story in my opinion is that the tools available for the analysis today are very different — much more powerful — than those of 5 and 10 years ago. This means that, yes, we can do something with the larger volumes and newer sources of data than we were able to do previously. Combine these tools with the growing capabilities of individuals and organizations to do deep analytics and there you will find new streams of value being created — whether they be greater efficiency through richer predictive models on parts failures, or a better marketing campaign.

As you say, the data viz tools and approaches and principles may be largely unchanged, but that doesn’t mean there isn’t something transformative underway in the larger analytics space.

I don’t subscribe to the hype here, but I see a transformation in front of my eyes.

By Andrew. September 25th, 2012 at 9:22 am

“Just because we can generate and collect more and more data doesn’t mean that we should.”

Liked the whole article, but liked that part the best. I’ve long felt that the “big data” subject has a smell of people indiscriminately collecting data first and trying to understand them later.

By Chuck Hollis. September 29th, 2012 at 2:06 pm

Hi Stephen

I read your comments with a wry smile on my face.

As a member of the vendor community (EMC, in particular) I frequently meet people who — somewhat justifiably — claim “well, we’ve seen this all before, what could possibly be new or interesting?” and accuse vendors of being somewhat over-exuberant in promoting a new idea.

Please don’t get me wrong, a healthy skepticism is a good thing, especially when it comes to technology vendors! Having been around the technology biz for over three decades, I have seen your line of thinking expressed before around minicomputers, desktop computing, the Internet, GUIs, the iPad, cloud, and so on. Sooner or later, people do appreciate that something very new and different is at hand, and we all move forward.

I was also amused that you referred back to BI vendors and their community. Frankly speaking, BI has about as much to do with big data as dial-up modems have to do with today’s internet. Indeed, most of the skepticism around anything to do with big data comes from the traditional BI and data management community.

I’ve been in more than a few meetings where someone like yourself will challenge people like me to define what could possibly be different or relevant about big data analytics compared with the more traditional BI that came before it.

Here’s what I point to:

— the ability to correlate and extract value from wildly diverse data sources, e.g. text, video, etc.
— a bias towards experimenting with data around new questions vs reporting on the past
— a closed-loop process to build better predictive models around key questions

Most traditional BI/DW environments I encounter have a limited number of data sources, usually structured and internally generated. They tend to do reporting on well-understood business processes vs. agilely tackling interesting new questions. And I don’t often see a heavy investment in the sophisticated mathematical models that are part and parcel of data science.

Is there a continuum where an advanced BI practitioner could be considered a data scientist? Of course — but that seems to be the exception, rather than the rule.

Is there a lot of hype floating around the industry? Of course — that’s what always happens when there’s something new and exciting at hand. Does the existence of hype eliminate the possibility that there might be some new and powerful concepts at hand? Of course not.

I would encourage you and your brethren to look towards the opportunity at hand — as it is quite transformational when fully appreciated. If you’re interested in a view of how one IT vendor sees the opportunity, I’ve written a synopsis here: http://chucksblog.emc.com/chucks_blog/2012/09/emc-and-big-data-an-overview.html

Best regards

— Chuck Hollis

By Visualign. September 29th, 2012 at 3:15 pm

The main message of this article – that most methods used in analysis and visualization of Big Data have been around for a while – misses the bigger point. Which is that applying those methods to larger and new types of data enables different insights and new business models.

Let’s consider just two examples. Take Twitter sentiment analysis. I remember one vendor showing an example of its Big Data capability to forecast 2nd week movie sales based on sentiment analysis for the movie at the end of its 1st week of showing. Yes, the visualization of frequencies and correlations was fairly standard, but the ability to do this over a vast amount of unstructured data is new. While the methods were available, you could not have done this (cost-effectively) 10 years ago. At least some insights require just more “brawn”, not “better thinking”.

Or take Google Maps. Of course, the visualization techniques to render maps, overlay terrain lines, satellite images or even street view pictures have been around for a long time. But the ubiquitous and free availability of such data in great detail enables new applications and business models (real estate, tourism, weather, transportation, etc.) When I look at a tool like Google Earth, I think “interest in big data” is NOT “a direct result of vendor marketing”. Instead, a platform has been created spawning use cases the platform creator may never have thought about.

More of the same is not always better, agreed. But when you look at biological evolution, more and better sensors have often enjoyed selective advantages and hence evolved robustly. A bird of prey can spot prey often due to superior eye-sight. Even though the sensor signals are just “more of the same”, they provide advantages such as better night-vision or better resolution without which the prey would not have been detected.

Or look at multi-modality: Most animals have multiple senses which endow them with a certain amount of redundancy when trying to safely navigate our environments. At the neuronal level, adding another modality is just “more of the same”, but it typically leads to selective advantages and qualitatively different adaptations. To remain in the above analogy, an owl may not have better eyes than an eagle, but it combines them with far better hearing. Or a bat, which likewise hunts at night not because of its better eyes, but because of its own sonar.

“This information must be structured before it can be visualized, and once it’s structured, we can visualize it in familiar ways.” This is a bit like saying that once new information is digitized, it’s just 0’s and 1’s, which is very familiar. True, but beside the point. The point is that new and/or more detailed data can lead to better understanding and often prediction of the environment, with direct benefits for the owner of that data. Just like predatory animals have evolved to have multiple and refined senses, future applications will evolve to take advantage of more modalities and finer detail. That includes senses beyond what’s biologically present, covering larger scale and resolution in time and space and faster reaction and learning times. The fact that at some level it all comes down to processing “more of the same” doesn’t make this evolution any less fascinating.

By Stephen Few. September 29th, 2012 at 3:19 pm

Chuck,

Does that wry smile of yours have any relevant expertise behind it? I see that you’re a VP in EMC’s marketing department. You seem to have mistaken me for someone who hasn’t been around for as long as you and therefore lacks your sense of historical perspective. We’re not talking about the usefulness of “desktop computing, the Internet, GUIs”, etc.; we’re talking about Big Data. I don’t make a habit of lamenting useful technologies. I’ve been working in decision support (business intelligence, analytics, etc.) for over 30 years. During that time, I’ve almost always been directly involved in the work of analytics, not just marketing it, which is quite different and gives me a perspective that is certainly different from and perhaps even more informed than yours. Before taking the time to respond to your comments, I’d like to determine if you have any real experience that qualifies you as an authority. Are you a data analyst? Have you ever worked as a data analyst? Have you ever actually done any of the work that you’re claiming Big Data enables?

If you take the time to become familiar with my work, you’ll see that I have no patience for people who claim expertise in fields that they don’t understand. Unlike most folks in technology marketing departments, perhaps you’re one of those rare exceptions who actually have relevant experience and expertise related to the products and/or services that they promote. If so, please introduce yourself and your credentials, and I’ll gladly enter into a discussion with you. Otherwise, I might be tempted to write a new blog post that reviews some of the groundless statements that you made in the recent blog post about Big Data that you cited above.

By the way, I know a bit about EMC. Early this year I spent three days teaching BI professionals at your headquarters in Massachusetts how to more effectively explore, analyze, and present data. They were a great group of folks. They didn’t give me the impression, however, that EMC is any better at this than the typical organization of its size.

And, just in case you have only read my blog post about Big Data and not the more comprehensive newsletter article that I published yesterday, you might find that interesting and useful. You can download a copy from the Library page of this site.

Steve

P.S. I’m resisting the urge to smile wryly.

By Stephen Few. September 29th, 2012 at 4:27 pm

Visualign,

The main message of my article is not that “most methods used in analysis and visualization of Big Data have been around for a while.” The main message is that we will not develop the insights that we need by focusing on Big Data. We will develop them primarily by improving our data sensemaking skills.

I appreciate your thoughts on the matter and suspect that we agree much more than we disagree on the issues of data-based insights. Using your biological analogies as a point of departure, I’m not arguing that more data and new sources of data won’t enable useful insights. Rather, I’m arguing that we can gain more and better insights more efficiently if we approach data sensemaking more intelligently. Chasing every possibility of so-called Big Data will of course yield some results, much as evolution yields results gradually and by accident (i.e., through mutation). Approaching the opportunity more intelligently, however, by developing the required skills and focusing on useful data to solve real problems that matter will yield greater rewards more efficiently.

By Chuck Hollis. September 29th, 2012 at 7:16 pm

Stephen:

Your defensive mechanisms are functioning quite well, so congratulations. If you don’t like the message, attack the messenger. Been there, seen that — but no worries, I never take it personally!

The underlying premise though (independent of your dismissal of me) remains the same: something very big is going on, people are starting to figure it out, and the new patterns are very different than the old ones. There are enough people who have grasped the key concepts, how they’re different, and are starting to put them into practice. Many have not — that’s how change always happens.

I would not rest too heavily on your historical expertise, though. The studies have shown that the newer breed of data science professionals are very distinct (education, mindset, intellectual orientation, etc.) from traditional data analysts. That research has been correlated with my own experiences and interactions. I would be glad to send you copies if you’re interested.

One thing we would both probably violently agree on: technology alone is not the answer here. The real challenge (as I see it) is broader organizational proficiency around analytics — learning to understand the meaning of data in new ways, and driving coordinated responses to the insights revealed. The focus becomes around leadership: their roles and supporting their aspirations.

As you point out, many organizations (including EMC) are not widely proficient yet. That being said, the patterns are emerging around exactly how organizations go about gaining that proficiency, which I believe is more relevant than any particular technology, or — for that matter — arguing around a buzzword such as “big data”. For example, you might be surprised at some of the things we’ve learned to do within our own business in just the last year or so.

I’m looking forward to spending the time to read more of your material, though …

Best regards

— Chuck

By Stephen Few. September 29th, 2012 at 8:30 pm

Chuck,

There was not a shred of defensiveness in my response. I was definitely on the offensive. You haven’t responded to my questions about your experience in the fields of analytics, decision support, or anything related. I’ll assume that you have none. If this is the case, then I am indeed dismissive of your opinions on the topic. Expertise does not exist without experience. People who claim expertise that they lack deserve to be ignored.

You shouldn’t assume that my 30 years of experience indicates stagnation. It doesn’t. You are definitely not familiar with my work. I teach that newer breed of data analyst of which you speak. The most talented analysts of today, whether they call themselves data scientists or not, are not much different from those of 20 years ago. The skill sets haven’t changed much. The technologies that assist them have improved only a little. As more of us focus on the required skills and demand tools that actually work, we’ll learn to use data more effectively. In the meantime, hollow promises from vendors are nothing but a distraction.

By Emmanuel Letouzé. September 30th, 2012 at 6:43 am

Dear Steven,

Indeed, healthy skepticism is always welcome. I do think that big data is revolutionizing our lives, and will continue to, the way other revolutions have (I am sure you know it has been dubbed ‘the new industrial revolution’, data ‘the new oil that needs to be refined’, etc.), but this is not the place to engage in a lengthy and complex debate over whether this is true or merely hype, and, regardless, keeping a critical perspective is always a good idea. And I do agree with some of your points.

However, allow me to face the risk of having the relevance of my background questioned (cf. your response(s) to Chuck’s posts above, which makes me think that your expertise as a data person is not matched by a similar mastery in the field of blogging) to make two specific comments:

First, no one serious working on or around big data claims that “more is better” (your words: “Big data is built on the unquestioned premise that more is better.”). It’s a bit more complicated and nuanced than that; everybody recognizes that, as I suspect you know very well.

Second, the McKinsey study is far from being authoritative, and there have been hundreds of examples of what big data can do outside the realm of business (for example, following tornadoes using Twitter data, which is pretty neat). I think your exclusive focus on business and its metrics makes it difficult for you to fully recognize the potential of big data to change the world as we know it. As the work of Gary King at Harvard is showing, along with that of other leading academics at MIT (Brynjolfsson, Pentland) and Berkeley (Weigend, Varian), for example, it is in my modest opinion first and foremost in the social sciences and policymaking that big data, which is primarily a qualitative revolution, will have the greatest impact in the next decade.

What we all need, collectively, is to figure out how we can make the best of big data while reducing the risks it carries (privacy, misuse).

Emmanuel

By Stephen Few. September 30th, 2012 at 11:12 am

Emmanuel,

Statements such as Big Data is “the new industrial revolution” and “the new oil that needs to be refined” are wonderful examples of marketing speak. Statements such as these are used by marketing departments to fuel technology sales. No one wants to be left out of the latest revolution. The question remains, what makes the data of today qualitatively different from the data of yesterday? If qualitative differences actually exist, in what way do they require changes in the way that we make sense of data and benefit from its use? Stating, as you have, that Big Data “is primarily a qualitative revolution” does not make it so. Why is it that so many people who claim to understand analytics fail to exhibit analytical ability in their arguments? If it is a qualitative revolution, explain how this is so and provide evidence to validate your claim.

You said that Big Data is “a bit more complicated and nuanced” than I’ve suggested. Actually, if you’ve reviewed the literature, you know as I do that the term is extremely vague. Vendors love vague descriptions of the products and services that they provide. In the case of Big Data, they also love the fact that it is not clearly distinguishable from the past. One advantage to vendors of this strategy is their ability to claim anything good that happens as the consequence of Big Data. Perhaps these useful discoveries are merely a consequence of good data analysis, not anything new called Big Data that demands new products. You said that there are “hundreds of examples of what big data can do”, to which I respond that in fact there are indeed many examples of what data can do when it is understood and then used in meaningful ways. This is not a departure from the past. We have always attempted and at times managed to use data in this manner. The one example that you gave of using Twitter data to track tornadoes may indeed illustrate a benefit of data from a source that has only existed recently, since the advent of Twitter. Tweets are a new source of communications between people, similar to emails, instant messages, and texts. Using them to do something useful is great, but it is not a qualitative leap. The methods that we use to make sense of and use tweets as sources of information are not new. The advent of new forms of social media in the last few years has affected the way that we communicate with one another, but any data that can be gleaned from these communications to increase our understanding and consequently make better decisions is just data. What makes it useful is what we manage to do with it. Essentially, Big Data is more data, including data from a few new sources, some of which will help us and most of which will not. Big Data is not a thing; it is merely a name someone coined for our current position along a continuum of data and its potential use.

The point of my argument is that we will only benefit from data today if we do what we have needed to do all along: develop better skills of data sensemaking and better technologies to augment those skills. Little of what is being marketed today as Big Data is designed to address this essential need. Investing in Big Data technologies will only be useful if (1) they are good technologies (most are not), and (2) we focus first and mostly on developing the skills of data sensemaking. The solution resides in our brains. No investment in technology will relieve us of this fundamental fact. We are looking outward for a solution that only exists within.

You suggested that my concern with Chuck Hollis’ background — whether or not he had relevant experience — indicated that my “expertise as a data person is not matched by a similar mastery in the field of blogging.” I found that comment puzzling. Are you suggesting that it is somehow naïve in the blogosphere to question someone’s qualifications? Is that really a rule in the Blogger’s Handbook? If so, blogs cannot function as a useful forum for intelligent discussion. Questioning the credibility of the source is a fundamental principle of critical thinking. With this in mind, Emmanuel, I would like to know where you fit into the scheme of things. Do you work for a software vendor? Are you an industry analyst? Do you have any experience in analytics? Are you a student working on a dissertation on Big Data? You know who I am. Please do me the courtesy of introducing yourself.

By Visualign. September 30th, 2012 at 11:45 am

Stephen,

Interesting discussion. I share the feeling that we agree much more than we disagree on the issues of data-based insights. I see your point that approaching data sense making more intelligently has greater promise than just focusing on raw processing capabilities. That said, the two feed off each other and give rise to the joint big data (technology) plus data scientist (sense making) trend. We need both ends of this spectrum to advance. We have seen the same with computer hardware (Moore’s law) and software advances (like touch interfaces or map-reduce distributed processing), which reinforce each other.

Bill Gates said that people tend to over-estimate the short-term and under-estimate the long-term possibilities of exponential (Moore’s law) technology growth. This applies to big data just as well. Take the human genome project. When the sequencing of the human genome was completed roughly ten years ago, it didn’t immediately answer many questions around genetic diseases etc. But the exponential improvement in the capacity, performance and cost of DNA sequencing has brought us to a point where we are now starting to see businesses like 23andme.com screening individual DNA (from saliva samples) for hundreds of genetic diseases. These are stepping stones toward personal medicine, just as we got personal computing.

It’s never just about the technology, but always what to do with it, how and why. I also see this integrative, holistic view in Chuck Hollis’ comments, stressing not just vendor technology, but the human (data scientist education) and institutional elements (organizational proficiency). His three items of what’s new revolve around the experimental interaction of analysts with diverse and large amounts of data. The element of exploration and discovery seems key. You are right that evolution by ‘accidental’ mutation is slow. But to empower thousands of people to experiment with and learn from large data sets seems like a great environment to breed new ideas. As ngarbis said above: “Combine these tools with the growing capabilities of individuals and organizations to do deep analytics and there you will find new streams of value being created”. Or look at big data competitions such as those on Kaggle.com. Those efforts should lead to better understanding and refinements of statistical algorithms as well as potentially new insights on the studied class of sample problems – in other words more intelligent data sense making capabilities. This would not happen without the platforms and the vendor community. That it does makes me believe that Big Data is indeed a Big Deal.

By Stephen Few. September 30th, 2012 at 12:27 pm

To Emmanuel and others who might not be familiar with the larger body of my work:

If you’ve only read this blog post about Big Data and not the larger article that I’ve written on the topic or my other work, you might have the impression that I lack faith in the usefulness of data. This is far from the truth. My entire career revolves around the belief that better uses of data can help us create a better world. Everything that I do in my work promotes this. It is because I believe in the promise of better decisions based on better understanding informed by data that I caution readers about Big Data and other initiatives that have been hijacked by technology vendors out of self-interest. If something useful exists in the realm that goes by the name Big Data, then let’s define the term in a way that is clear and meaningful (vendors won’t do this for us) and let’s help the world focus on what’s really needed to glean these benefits. As always, vendors are confusing things. They thrive on this confusion. If Big Data is meaningful and useful, help me cut through the confusion.

Emmanuel—Like you, I believe that the greatest benefits of data, whether it be Big in a way that is qualitatively different or just an extension of the past, fall outside of the realm of business. If we can figure out how to do this, we can create a better world. We can indeed use information better than we have in the past. I believe that this will be done by focusing on human skills of data sensemaking assisted by good technology; skills that we’ve known about for years. It certainly won’t be done by focusing on the latest products from technology vendors. They don’t get it. We need to demand what’s really needed. We must point the way.

By Stephen Few. September 30th, 2012 at 1:51 pm

Visualign,

I appreciate your statement, “But to empower thousands of people to experiment with and learn from large data sets seems like a great environment to breed new ideas.” Empowering people—thousands, perhaps millions—to glean value from data is the focus of my work. The skills that will empower them, however, are essentially the same whether the data sets are large or small. The vendors that are pushing Big Data are not teaching these skills. In fact, with few exceptions, they don’t understand these skills. By pinning people’s hopes on so-called Big Data, rather than helping them to develop the skills that are needed to make sense of data, attention is being diverted from what’s needed. This is what concerns me. This is what’s happening in response to the Big Data marketing campaign. It is doing little to drive the development of essential skills or better tools. Tackling Big Data without first becoming empowered as data sensemakers is like opening Pandora’s Box without first developing the power to contain what’s within. Opening that box will indeed be a “big deal,” but perhaps not in the way that you imagine and hope.

By Scott Eaton. October 1st, 2012 at 11:34 am

Just wanted to let you know this article was partially reproduced at http://www.dashboardinsight.com/news/news-articles/big_data_big_deal.aspx.

By Tim2. October 1st, 2012 at 4:47 pm

Dear Chuck

You say:
“I would not rest too heavily on your historical expertise, though. The studies have shown that the newer breed of data science professionals are very distinct (education, mindset, intellectual orientation, etc.) from traditional data analysts.”

Can you tell me what your definitions of “data analyst” and “data scientist” are?

By Emmanuel Letouzé. October 2nd, 2012 at 2:13 pm

Hi Stephen

(Sorry for misnaming you in my 1st post).

Actually, I do agree with a lot of the points you make, especially this one, 100%, quoting you: “I believe that the greatest benefits of data, whether it be Big in a way that is qualitatively different or just an extension of the past, fall outside of the realm of business. If we can figure out how to do this, we can create a better world. I believe that this will be done by focusing on human skills of data sensemaking assisted by good technology; skills that we’ve known about for years. It certainly won’t be done by focusing on the latest products from technology vendors. They don’t get it. We need to demand what’s really needed. We must point the way.”

What I still believe, though, is that big data has the potential to ‘revolutionize’ research and policymaking (which are my fields much more than business; I know nothing about business) if the right steps are taken in the coming years, bearing in mind all the challenges in the way. These are all loose and general statements, but these are complex matters that take more than a post to be discussed properly.

Since you are asking, my background is in development economics and demography. I am not a computer science person; I just happen to have grown increasingly interested in the use of technology, and especially the impact of big data on development, in recent years, and I seem to be finding myself drawn more and more into big data (I am now working on a paper on big data and conflict prevention). Until recently I worked as a development economist for UN Global Pulse in the Secretary-General’s Executive Office, which works directly on big data. Among other things, I wrote a paper on “Big Data for Development” where I/we tried to go beyond the hype and generalities to delineate the field, highlight ongoing and potential applications, point to challenges and risks, and analyze what it would take to fulfill the potential we/I see in applying big data to (economic) development problems. If you have an interest, this is the link to the paper in question: http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-GlobalPulseMay2012.pdf.

And yes, I am also a (relatively old!) graduate student at UC Berkeley working on a dissertation that will involve big data (as applied to economic development), and I work as a consultant for various organizations. This is my LinkedIn profile: http://www.linkedin.com/pub/emmanuel-letouzé/2/92a/847

My comment about your skills as a blogger referred to what I found to be an excessively aggressive response to a comment made by another person, in which you immediately questioned that person’s credentials. I am not a blogger, but I think in general the best discussions are those that remain cordial and candid. This is simply what I meant.

Thanks,

Emmanuel

By visualign. October 4th, 2012 at 1:56 pm

Some recent content pertinent to this thread from HBR at

http://blogs.hbr.org/fox/2012/10/why-data-will-never-replace-thinking.html

It also cites Nate Silver’s new book “The Signal and the Noise” (which I’m only halfway done reading).

Much of the above supports Stephen’s argument. As Nate puts it: “Before we demand more of our data, we need to demand more of ourselves.”

By Andrew. October 5th, 2012 at 8:25 am

@visualign:

Great article. My guess is those who believe that data will replace thinking have never really engaged in much thinking to begin with.

I might have to check out Nate Silver’s book.

By Stephen Few. October 5th, 2012 at 8:49 am

Emmanuel,

You strike me as the kind of person who could benefit from more data. You’re using data for worthwhile purposes and you’re developing the skills that are required to make sense of it.

The aggression that you observed in my response to Chuck Hollis has been forged through many, many years of dealing with pseudo-experts from vendor marketing departments whose sole role is to persuade organizations to open their wallets. I’ve seen the harm that they cause. I routinely work with the people who must suffer day after day under the weight and dysfunctionality of poorly designed software. When you care, as I do, about the better world that we can create through data-based understanding, you are sickened by people who milk this desire for their own benefit, often in ways that undermine our efforts.

By Asda. November 12th, 2012 at 3:09 am

With that post on October 5th, 2012 at 8:49, Stephen hit the nail right on the head. It is what he has been talking about all along:
It’s really good to have big data;
It’s even better to have powerful data-handling capabilities;
You need better, all-round training to make better use of all the big-nonsense data.

By Richardt. November 15th, 2012 at 10:37 am

An article I wish I could have written. Fortunately you did, so more people read it. I have heard senior execs in operational companies mention “big data”. Then you know the BI vendors succeeded.
I look forward to following your blog more closely.