Do We Really Need More Data?

The notion that “we need more data” seems to have always served as a fundamental assumption and driver of the data warehousing and business intelligence industries. It is true that a missing piece of information can at times make the difference between a good and a bad decision, but there is another truth that we must take more seriously today: most poor decisions are caused by lack of understanding, not lack of data. The way that data warehousing and business intelligence resources are typically allocated fails to reflect this fact. The emphasis of these efforts must shift from “more and faster” to “smarter and more effective.” Although current efforts to build bigger and faster data repositories and better production reporting systems should continue, they should take a back seat to efforts to increase the data sense-making skills of workers and to improve the tools that support these skills.

Even in the sensitive arena of intelligence analysis, where decisions can preserve or end lives and information is often spotty, it is much more important to teach analysts effective skills and give them the best sense-making tools than it is to give them more data. Former CIA analyst Richards J. Heuer, Jr., argues the following in his book Psychology of Intelligence Analysis (1999):

The difficulties associated with intelligence analysis are often attributed to the inadequacy of available information. Thus the US Intelligence Community invests heavily in improved intelligence collection systems while managers of analysis lament the comparatively small sums devoted to enhancing analytical resources, improving analytical methods, or gaining better understanding of the cognitive processes involved in making analytical judgments. (p. 51)

This lack of appropriate funding exists no less, and probably a great deal more, in the corporate world. Heuer cites research findings that additional information rarely improves the accuracy of analysts’ judgments. What really matters is the quality of the mental models that analysts use: the conceptual frameworks that we bring to the process of data sense-making. Additional information only improves the accuracy of analytical judgments when it helps the analyst correct and improve his or her mental model. Heuer writes:

The accuracy of an analyst’s judgment depends upon both the accuracy of our mental model…and the accuracy of the values attributed to key variables in the model…Additional detail on variables already in the analyst’s mental model and information on other variables that do not in fact have a significant influence on our judgment…have negligible impact on accuracy, but form the bulk of the raw material analysts work with. (p. 59)

Unfortunately, even the most expert among us rarely understand their own mental models.

Experts overestimate the importance of factors that have only a minor impact on their judgments and underestimate the extent to which their decisions are based on a few variables. In short, people’s mental models are simpler than they think, and the analyst is typically unaware not only of which variables should have the greatest influence, but also which variables actually are having the greatest influence. (p. 56)

Researchers, especially those who work in the cognitive sciences, have learned a great deal about the way people process information and make decisions, including the flaws in the process that often trip us up. Proper training based on these insights is needed to make us better analysts; good tools are needed to help us work around analytical limitations that are built right into our brains. It is toward these ends that the bulk of our data warehousing and business intelligence investments should be directed. Is this where you’re focusing your efforts? Is this even on your radar?

Take care,

8 Comments on “Do We Really Need More Data?”


By Barrett. March 30th, 2009 at 1:57 pm

I would be curious to hear your thoughts on an approach like this to data interpretation, http://vimeo.com/3136826. Since we are visual creatures anyway, it seems natural to make our tools visual so we can interpret data more intelligently and effectively.

By Martin Andrews. March 30th, 2009 at 2:14 pm

I could not agree more, with particular emphasis on the need for training. I am lucky to be at a company where data warehouse development has focused on what is most important and not on how we can stuff in as much data as possible.

Our warehouse has reached a respectable level of maturity, but is still very much under development. Interestingly, there is now a delineation emerging between those who understand how to use the data, and those who do not. We have analysis and “business intelligence” tools, but there again, there are those who know how to wield them and those who can but swing the tools around haphazardly hoping for a result that supports their theory.

BI tools are much overrated. We paid a high price for ours–and I’m not proud of that as it is little more than a heavyweight query GUI, providing nothing approaching “intelligence” in and of itself. The success–or failure–of BI tools all comes back to who is at the controls.

By Stephen Few. March 30th, 2009 at 3:29 pm

Barrett,

Your example, which explores various ways that categorical and quantitative information about products could be graphically encoded to improve the shopping experience, etc., is certainly a legitimate use of data visualization. You must be careful, however, to understand and obey the rules of visual perception when you design such encodings. Too many simultaneous encodings, depending on the ones that you choose, can result in visual clutter that would undermine the shopping experience. Some encodings can be perceived at a glance preattentively, but not all those illustrated in your video can. Also, I believe that at least one of your encodings–relative weight based on the degree to which the surface is bent under the books–is difficult to perceive and use for comparisons. If you haven’t already read them, I highly recommend Colin Ware’s two books “Visual Thinking for Design” and “Information Visualization: Perception for Design.” Ware does a great job of describing the various visual attributes that can be used for presenting data, how to use them, and which work best for various purposes.

By Barrett. March 31st, 2009 at 5:39 pm

Excellent, thank you for the recommendations. I’ll be ordering his books.

By James Taylor. April 4th, 2009 at 1:25 pm

Stephen
I see this problem regularly, albeit from a slightly different perspective. I often come across organizations that insist they need to gather more data, clean it more thoroughly or integrate it more tightly before they can do anything else. Yet they have not thought about the decisions they are trying to make with that data. When they do, they sometimes realize that they have all the data they need for that decision or that cleaning the data better won’t improve the decision making. Often they discover that the data they need to improve the decision is not the data they were going to collect or clean or integrate. Sometimes what is needed to make better decisions is not even data, but a better understanding of regulations or policies.
This “data-first” attitude does not help organizations. Unless they understand the decisions they are making, how they are making them and, as you say, what their mental models/unspoken rules might be, they will never improve performance.
Thanks as always for an interesting post.
JT

By Tony. April 4th, 2009 at 10:16 pm

Why don’t we admit good analysis is difficult for most of us? Good decision making is difficult for most of us.

No matter what data we come up with or what tools we devise, good analysis and good decision making will always be difficult for most of us.

By Michael E. Driscoll. April 5th, 2009 at 2:17 am

I would agree that people vastly overestimate the importance of data. Peter Norvig and colleagues at Google recently published an article entitled “The Unreasonable Effectiveness of Data” (http://bit.ly/vkJw). This may be true when measuring spoken or written language, where the data (text) is a high fidelity representation of the subject (language).

But most business processes are far more opaque. The failure of financial models in the mortgage and insurance sectors evinces how more data is not an antidote to flawed models, and may even be their catalyst.

In my experience, the lack of data has never been the bottleneck of insight in any organization — academic, governmental, or corporate.

Data is a complement to, not a substitute for, developing mental models of complex systems. If used blindly, it can yield unchecked conclusions and disastrous decisions. But if used intelligently, it can be the source of competitive advantage for firms.

By Abhishek Toraskar. May 15th, 2009 at 12:36 am

Exactly my thoughts. I wrote an article called “More the data, less the business intelligence”.

It is a simple fact of life that for any effective decision making, you really need only a limited number of data points. You gather these as accurately and effectively as possible, but everything else, YOU BASICALLY IGNORE.

However, organizations tend to ignore this completely and tend to want everything. Probably, it’s just another excuse for the managers to say, “Look, there is enough data available in the system for us to make meaningful decisions.” Pity.

http://abhishrek.blogspot.com/2009/05/more-data-less-business-intelligence.html