Modern science relies heavily on an approach to the assessment of uncertainty that is too narrow. Scientists rely on statistical measures of significance to establish the merits of their findings, often without fully understanding the limitations of those statistics and the original intentions for their use. P-values and even confidence intervals are cited as stamps of approval for studies that are meaningless and of no real value. Researchers strive to reach significance thresholds as if that were the goal, rather than the addition of useful knowledge. In his book Willful Ignorance: The Mismeasure of Uncertainty, Herbert I. Weisberg, PhD, describes this impediment to science and suggests solutions.
This book is for researchers who are dissatisfied with the way that probability theory is being applied to science, especially those who work in the social sciences. Weisberg describes the situation as follows:
To achieve an illusory pseudo-certainty, we dutifully perform the ritual of computing a significance level or confidence interval, having forgotten the original purposes and assumptions underlying such techniques. This “technology” for interpreting evidence and generating conclusions has come to replace expert judgment to a large extent. Scientists no longer trust their own intuition and judgment enough to risk modest failure in the quest for great success. As a result, we are raising a generation of young researchers who are highly adept technically but have, in many cases, forgotten how to think for themselves.
In science, we strive for greater certainty. Probability is a measure of certainty. But what do we mean by certainty? What we experience as uncertainty arises from two distinct sources: doubt and ambiguity. “Probability in our modern mathematical sense is concerned exclusively with the doubt component of uncertainty.” We measure it quantitatively along a scale from 0 (certain that something will not occur) to 1 (certain that it will). Statistical measures of probability do not address ambiguity. “Ambiguity pertains generally to the clarity with which the situation of interest is being conceptualized.” Ambiguity—a state of confusion, of simply not knowing—does not lend itself as well as doubt to quantitative measure. It is essentially qualitative. When we design scientific studies, we usually strive to decrease ambiguity through various controls (selecting a homogeneous group, randomizing samples, limiting the number of variables, etc.), but this form of reductionism distances the objects of study from the real world in which they operate. Efforts to decrease ambiguity require judgments, which require expertise regarding the object of study that scientists often lack.
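To make the distinction concrete, here is a minimal sketch (not from the book) of what a purely quantitative measure of doubt looks like: an exact two-sided binomial p-value, computed from scratch in Python. Every assumption baked into the model—a fair coin, independent flips—sits outside the number itself; that is the ambiguity the calculation never touches.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value for observing k successes in n trials,
    assuming each trial succeeds independently with probability p."""
    # Probability of every possible outcome under the assumed model.
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # Two-sided p-value: total probability of all outcomes no more
    # likely than the one actually observed.
    return sum(q for q in probs if q <= probs[k] + 1e-12)

# 60 heads in 100 flips of a coin assumed to be fair:
p_value = binom_two_sided_p(60, 100)
print(p_value)  # a number between 0 and 1 quantifying doubt
```

The result (roughly 0.057 here) quantifies doubt only *relative to the assumed model*. Whether a fair, independent coin is the right description of the situation—the ambiguity component—never enters the computation, which is precisely Weisberg's point.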
A chasm exists in modern science between researchers, who focus on quantitative measures of doubt, and practitioners, who rely on qualitative judgments to do their work. This is clearly seen in the world of medicine, with research scientists on one hand and clinicians on the other. “We have become so reliant on our probability-based technology that we have failed to develop methods for validation that can inform us about what really works and, equally important, why.” Uncertainty reduction in science requires a collaboration between these artificially disconnected perspectives.
Our current methodological orthodoxy plays a major role in deepening the division between scientific researchers and clinical practitioners. Prior to the Industrial Age, research and practice were more closely tied together. Scientific investigation was generally motivated more directly by practical problems and conducted by individuals involved in solving them. As scientific research became more specialized and professionalized, the perspectives of researchers and clinicians began to diverge. In particular, their respective relationships to data and knowledge have become quite different.
As I’ve argued in various critiques of research studies and in discussions with researchers, this chasm between researchers and expert practitioners is especially wide in the field of information visualization and seems to be getting wider.
To make his case, Weisberg takes his readers through the development of probability theory from its beginnings. He does this in great detail, so be forewarned that the book assumes an interest in the history of probability. In fact, this history is quite interesting, but it does make up the bulk of the book. It is necessary, however, to help the reader understand the somewhat arbitrary way in which statistical probability was conceptualized in the context of games of chance, as well as the limitations of that particular framing. Within this conceptual perspective, specific statistics such as correlation coefficients and P-values were developed for specific purposes that should be understood.
In the conduct of scientific research, we have the choice of a half-empty or half-full perspective. We must judge whether we really do understand what is going on to some useful extent, or must defer to quantitative empirical evidence. Statistical reasoning seems completely objective, but can blind us to nuances and subtleties. In the past, the problem was to teach people, especially scientists and clinicians, to apply critical skepticism to their intuitions and judgments. Thinking statistically has been an essential corrective to widespread naiveté and quackery. However, in many fields of endeavor, the volumes of potentially relevant data are growing exponentially…Unfortunately, the capacities for critical judgment and deep insight we need may be starting to atrophy, just as opportunities to apply them more productively are increasing.
Don’t assume that Weisberg wants to dismantle the mechanisms of modern science. Instead, he wants to augment them to advance knowledge more effectively.
Is there a way to avoid the regression of science? The answer is surprisingly simple, in principle. We must recognize that probability theory alone is insufficient to establish scientific validity. There is only one foolproof way to learn whether an observed finding, however statistically significant it may appear, might actually hold up in practice. We must dust off the time-honored principle of replication as the touchstone of validity…Only when the system demands and rewards independent replications of study findings can and should public confidence in the integrity of the scientific enterprise be restored.
In addition to study replication, Weisberg also strongly advocates a merging of the perspectives and skills of researchers and practitioners.
Theoretical knowledge and insight can often be helpful in focusing attention on a promising subset of variables. Understanding causal processes will often improve the chances of success, and of identifying factors that are interpretable by clinicians. Clinical insight applied to individual cases will depend on understanding causal mechanisms, not just blind acceptance of black-box statistical models.
Weisberg goes on to suggest ways in which current computer technologies and rapidly expanding data collections create new opportunities for the conduct of science, in many respects similar to Ben Shneiderman’s vision of Science 2.0. Opportunities abound, but they will remain untapped if we fail to correct glaring flaws in our current approach to scientific research. Weisberg knows that this won’t be easy, but he exhibits a balance between concern for systemic dysfunction and optimism for progress. Even more, he offers specific suggestions for setting this progress in motion.
This is a marvelous book—well-written and the product of exceptional thinking. If the role of statistics in research does not interest or concern you, don’t buy this book, for you won’t stick with it. If you share my concerns, however, that science must be renovated and augmented to address the challenges of today and that our understanding and use of probability theory is central to this effort, this book is worth your time.