To Err is Academic
Errors in scientific research are all too common, and the problem has been getting worse. We’ve been led to believe that the methods of science are self-correcting, which is true, but only if they’re understood and followed, which is seldom the case. Ignorance of robust scientific methodology varies among disciplines, but judging by the errors I’ve encountered, it’s hard to imagine that any discipline fares worse than information visualization.
An alarming article, “Trouble at the Lab,” in the October 19, 2013 edition of The Economist provides keen insight into the breadth, depth, and causes of this problem in academic research as a whole.
Academic scientists readily acknowledge that they often get things wrong. But they also hold fast to the idea that these errors get corrected over time as other scientists try to take the work further. Evidence that many more dodgy results are published than are subsequently corrected or withdrawn calls that much-vaunted capacity for self-correction into question. There are errors in a lot more of the scientific papers being published, written about and acted on than anyone would normally suppose, or like to think.
Various factors contribute to the problem. Statistical mistakes are widespread. The peer reviewers who evaluate papers before journals commit to publishing them are much worse at spotting mistakes than they or others appreciate. Professional pressure, competition and ambition push scientists to publish more quickly than would be wise. A career structure which lays great stress on publishing copious papers exacerbates all these problems. “There is no cost to getting things wrong,” says Brian Nosek, a psychologist at the University of Virginia who has taken an interest in his discipline’s persistent errors. “The cost is not getting them published.”
Graduate students are strongly encouraged by professors to get published, in part because the professor’s name will appear on the published study, even if they’ve contributed little, and professors don’t remain employed without long and continually growing lists of publications. In the field of information visualization, most of the students who do these studies have never been trained in research methodology, and it appears that most of their professors have skipped this training as well. It might surprise you to hear that most of these students and many of the professors also lack training in the fundamental principles and practices of information visualization, which leads to naïve mistakes. This is because most information visualization programs reside in computer science departments, and most of what’s done in computer science regarding information visualization, however useful, does not qualify as scientific research and does not involve scientific methods. There are exceptions, of course, but overall the current state of information visualization research is dismal.
The peer review system is not working. Most reviewers aren’t qualified to spot the flaws that typically plague information visualization research papers. Those who are qualified are often unwilling to expose errors because they want to be liked, and definitely don’t want to set themselves up as a target for a tit-for-tat response against their own work. On several occasions when I’ve written negative reviews of published papers, friends of mine in the academic community have written to thank me privately, but have never been willing to air their concerns publicly—not once. Without a culture of constructive critique, bad research will continue to dominate our field.
Papers with fundamental flaws often live on. Some may develop a bad reputation among those in the know, who will warn colleagues. But to outsiders they will appear part of the scientific canon.
Some of the worst information visualization papers published in the last few years have become some of the most cited. If you say something (or cite something) often enough, it becomes truth. We’ve all heard how people only use 10% of their brains. This is common knowledge, but it is pure drivel. Once the media latched onto this absurd notion, the voices of concerned neuroscientists couldn’t cut through the confusion.
How do we fix this? Here are a few suggestions:
- Researchers must be trained in scientific research methods. This goes for their professors as well. Central to scientific method is a diligent attempt to disprove one’s hypotheses. Skepticism of this type is rarely practiced in information visualization research.
- Researchers must be trained in statistics. Learning to get their software to spit out a p-value is not enough. Learning what a p-value means and when it should be used is more important than learning to produce one (see the sketch after this list).
- Rigid standards must be established and enforced for publication. The respected scientific journal Nature has recently established an 18-point guideline for authors. Most of the guidelines that exist for information visualization papers are meager and in many cases counter-productive. For example, giving high scores for innovation encourages researchers to prioritize novelty over usefulness and effectiveness.
- Peer reviewers must be carefully vetted to confirm that they possess the required expertise.
- Rigid guidelines must be established for the peer review process.
- Peer review should not be done anonymously. I no longer review papers for most publications because they require reviewers to remain anonymous, which I refuse to do. No one whose judgment affects the work of others should be allowed to remain anonymous. Also, anyone who accepts poorly done research for publication should be held responsible for that flawed judgment.
- Researchers should be encouraged to publish their work even when it fails to establish what they expected. The only failure in research is research done poorly. Findings that conflict with expectations are still valuable findings. Even poorly done research is valuable if the authors admit their mistakes and learn from them.
- Researchers should be encouraged to replicate the studies of others. Even in the “hard sciences,” most published research cannot be successfully replicated. One of the primary self-correcting practices of science is replication. How many information visualization papers that attempt to replicate research done by others have you seen? I’ve seen none.
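To make the point about p-values concrete, here is a minimal sketch in Python. It is not drawn from any study discussed here; the group means, standard deviation, and sample sizes are invented purely for illustration. It shows why producing a p-value is not the same as understanding one: with a large enough sample, a trivially small difference earns a “statistically significant” p-value even though the effect remains negligible.

```python
# Minimal illustration (hypothetical numbers): a p-value measures how
# surprising the data would be if there were no true difference; it says
# nothing about how large or important the difference is.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def compare(n):
    # Two groups whose true means differ by a trivial 0.05 units.
    a = rng.normal(loc=0.00, scale=1.0, size=n)
    b = rng.normal(loc=0.05, scale=1.0, size=n)
    res = stats.ttest_ind(a, b)
    # Cohen's d: the observed difference relative to the pooled standard deviation.
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return res.pvalue, d

for n in (20, 20000):
    p, d = compare(n)
    print(f"n={n:>6}  p={p:.2e}  effect size d={d:.3f}")

# With a small sample the trivial difference typically goes undetected;
# with n=20,000 the p-value becomes tiny even though the effect size
# stays negligible.
```

A researcher trained only to produce the number would report a bare significant/not-significant verdict in both cases; one trained to interpret it would notice that the effect is negligible either way, which is why effect sizes belong alongside p-values.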
I’m sure that other suggestions belong on this list, but these are the ones that come to mind immediately. Many leaders in the information visualization community have for years discussed the question, “Is data visualization science?” My position is that it could be and it should be, but it won’t be until we begin to enforce scientific standards. It isn’t easy to whip a sloppy, fledgling discipline into shape, and you won’t win a popularity contest by trying, but the potential of information visualization is too great to waste.
Take care,
12 Comments on “To Err is Academic”
I’m curious what “Some of the worst information visualization papers published” are?
Nick,
Some of the most cited papers of the last few years include “Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts” by Scott Bateman, et al., “Benefitting InfoVis with Visual Difficulties” by Jessica Hullman, et al., and “What Makes a Visualization Memorable?” by Michelle Borkin, et al. Despite their popularity, these research studies were poorly designed, resulting in erroneous findings. You can find my reviews of these studies on this site. I have no doubt that many other papers are worse, but they have not garnered as much undeserved and potentially harmful attention.
Very good points, but doesn’t the whole problem lie in the peer review structure, which not only has the faults you mention but is often used to control the discourse and enhance the career of the reviewer? I know a number of prominent UK physical scientists who, having had original work first delayed and then rejected by reviewers who used the delay to steal and publish the work as their own, no longer submit papers to US journals.
Better to self-publish online, with the original data, and allow any qualified reviewer (presumably self-defined) to contribute to the debate. This would muck up some approaches to academic career pathways, but would lead to much better science, I believe.
Meic,
No, the whole problem does not reside in the peer review structure. This is but one of several problems. It is definitely true, however, that many journals that academics are forced to work with “muck up” the works considerably. But self-publishing, which is how I choose to present my own work, won’t fix the system. Many academic publishers have no interest in the research and its quality, but are revenue driven and locked into archaic publishing models that treat authors like indentured servants. The problem must be attacked on several fronts.
I agree with the items you’ve listed above. However, I’d add that consumers must also become researchers themselves. For example, even after a study is published, we should expect the news organizations that report on the study to offer more criticism than just “correlation doesn’t mean causation.” The numerical part of statistics may be hard and advanced for some, but if newspapers can explain complex economic issues, I have faith they can present effect sizes in an uncomplicated way. At the same time, we can’t rely on the media to do this critical analysis for us. We must also arm ourselves with enough knowledge to read and evaluate academic journal articles (and the journals themselves, for that matter) on our own.
There appears to be a growing backlash against “pay journals” that masquerade as havens of open-access but are obvious fronts for publishing mills. It’s a good start. In addition, a friend of mine also sent me this article this morning which appears relevant:
http://www.nature.com/news/policy-twenty-tips-for-interpreting-scientific-claims-1.14183
Looks like the big journals are catching on to the problem.
Whilst the current culture of science may need to change and re-evaluate the fundamentals of the way scientific enquiry is conducted, isn’t the problem of the next generation of scientists and researchers being overlooked? Science isn’t a complete mystery, and most of these failings have been known for centuries. Don’t we also need to change the way science and its fundamental art is taught? The data on scientific failure is already out there.
Sean,
In my experience, science is sometimes taught well and sometimes taught poorly, depending mostly on the skills of the teacher. In your opinion, is there a fundamental problem with the way that science is usually taught today? If so, what is that problem? Also, what is science’s “fundamental art?”
The problems with peer review are known. A large chunk of the issue comes down to faculty being forced to produce pointless research that no one has the time to review properly (they’re too busy doing pointless research). In the old days people researched (mostly) pointless things in a quest for truth. Now it’s to demonstrate their productivity. Academics might sometimes be closed- or narrow-minded, but they’re generally not daft; they don’t miss the mistakes because they’re incapable, they miss them because they haven’t made the time (for whatever reason). Rigour is notably higher in more reputable journals (mistakes are made but are spotted and corrected).
Information visualisation is really only tickling the toes of the academic community though, despite advancements in related fields such as human/computer interface design, ergonomics and all that. For many, info visualisation has the same validity as ‘interior decoration’ — makes things look pretty, which is nice for some, but lacks the hallmarks of an academically worthy subject.
I’ve still never seen anything that shows that an improved visualisation results in better outcomes. E.g., scientific papers with better graphics are cited more, businesses with better dashboard analytics outperform their competitors, teachers with the winning student engagement dashboard get greater added value… From my perspective, what you count is much more important than how you present it (within reason), but information visualisation is important for extracting the most out of poorer data.
Ad point 8: there are many papers that try to replicate / redo research done by others.
Some of them:
– “Bertin was Right: An Empirical Evaluation of Indexing to Compare Multivariate Time-Series Data Using Line Plots” (by Wolfgang Aigner et al., CGF 2010)
http://www.cvast.tuwien.ac.at/node/30
– “An Empirical Model of Slope Ratio Comparisons” (by Justin Talbot et al., InfoVis 2012)
http://vis.stanford.edu/papers/slope-ratio-comparison
Ad point 7: there are more and more papers reporting negative results, such as:
“Assessing the Effect of Visualizations on Bayesian Reasoning Through Crowdsourcing” (by Luana Micallef et al., InfoVis 2012)
http://hal.inria.fr/docs/00/71/75/03/PDF/infovis12_LM.pdf
Bilal,
These three papers are encouraging exceptions to the norm. Notice, however, that the two papers that sought to replicate research responded to work by Bertin and Cleveland, which was done many years ago. While this is of great value, we should also be attempting to replicate recent work that should be challenged while it’s fresh, before it takes on the authority of gospel through hundreds of citations. Regarding the one study that reported negative results, which I applaud, do you have reason to believe that this represents a trend?
Stephen,
there is also some work on examining popular yet recent techniques, identifying weaknesses and cases where they fail, and proposing solutions:
– Common Angle Plots as Perception-True Visualizations of Categorical Associations (Hofmann & Vendettuoli, InfoVis 2013)
http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp?arnumber=6634157
Here, Heike and Marie identify a perceptual problem with Parallel Sets (Bendix, Kosara, Hauser, InfoVis 2005) and propose a different layout of the stripes.
They support their results with a comparative evaluation.
– “Selecting the Aspect Ratio of a Scatter Plot Based on Its Delaunay Triangulation” (by Fink et al., InfoVis 2013)
http://www1.pub.informatik.uni-wuerzburg.de/pub/fink/paper/fhsw-sarsp-InfoVis13.pdf
Here the authors identify cases where a previous technique for the same problem does not work (Talbot et al, InfoVis 2011), and propose an alternative method to compute the aspect ratio.
If you scroll to the supplementary materials, you will find evaluation results comparing this method with Talbot’s 2011 method, even reusing the same data set.
They also compare with an older method based on Cleveland’s banking to 45°.
I definitely agree that we need to replicate and revisit more and more experimental results of recent (and old) work before weak results become gospel. My point is that it is not all bad, as some readers might conclude from reading your post.
I also agree to a fair extent that there is a bias against negative results (if your technique or experiment failed, nobody wants to read your paper).
Yet, compared with the early years, many authors now clearly acknowledge the limitations of their techniques, or report evaluation results showing that other techniques do better in one or more of the tasks they evaluated.
Bilal,
It is definitely not all bad. As I stated in my original blog post, there are exceptions. If this were not true, I wouldn’t stay in touch with the research, nor would I bother to critique the work. The problem is that too little of the work is well done and worthwhile. This can change, however, and I’ve suggested ways to help this happen. The infovis research community must raise its standards. The best work should be featured and the worst should be denied access to conferences and journals. As it is, some of the worst is being featured; sometimes in the form of achievement awards. Rather than being concerned that people will read my critique and mistakenly conclude that infovis research and development is all bad, consider the confusion that is created when a conference such as VisWeek features mostly mediocre work, with a few good papers, and a few horrible papers, without discrimination. What’s being done to correct this? My answer to this question is “Too little.”