Data Scientists Do Scale

I recently read an article by Stuart Frankel that appeared on the Harvard Business Review website titled “Data Scientists Don’t Scale” (May 2, 2015). It began by pointing out that the recent “reverence” for Big Data and Data Science “has disillusioned many of us” and is “about to get a reality check.” I share this perspective, but not for the author’s reasons, and I definitely don’t agree with the author’s solution. Frankel is the CEO of Narrative Science, “a company working on advanced natural language generation for the enterprise.” Given his role, you won’t be surprised to learn that he promotes the use of natural language processing (NLP) and artificial intelligence (AI) as scalable alternatives to the skilled data analysts that are needed. (Why does the Harvard Business Review provide a platform for biased, self-serving advertising from vendors that poses as informative articles?) Frankel argues that, unlike computers, data scientists don’t scale, but his premise is flawed. Not only do humans scale, they do so in much the same way as computers.

When you need more computing power, you have three potential choices:

  1. Replace the computer that you have with one that’s more powerful
  2. Add more computers
  3. Upgrade the computer that you have to make it more powerful

When you need more human power, what are your choices?

  1. Replace the employee that you have with one who’s more productive
  2. Add more people
  3. Help your employee upgrade his skills to make him more productive

We humans are scalable. In fact, although we are scalable in somewhat different ways than computers, humans are more scalable than computers in some fundamental and important ways.

Many organizations, perhaps most, find it easier to invest in technologies than to invest in people. Investing in technologies is only a good investment, however, when the technologies are good and they can do the job better and less expensively than people. Data sensemaking (analytics, business intelligence, data science, statistics, etc.) is not one of those cases. This is because of the fundamental human ability that makes us much more scalable than computers: we can think. When the cognitive revolution began around 70,000 years ago with the extension of language into the realm of abstract thinking, Homo sapiens became the most scalable creature on the planet. If computers could think and feel, they would envy us. Despite our many flaws, which frequently get us into trouble and might eventually lead to our demise, we are gifted in ways and to a degree that our inventions cannot duplicate, not even close. This is easy to forget at a time when the flawed technologies that we’ve created are revered without question. Those of us who know technologies well understand their limitations and therefore seldom succumb to this absurdity. Technologists who promote this reverence usually have something to gain.

If we fail to make wise use of data to create a better future, it will not be the fault of our technologies. We’ll have no one to blame but ourselves. If we allow technologies to do our thinking for us (i.e., execute programs as an imitation of thinking), we’ll lose the ability to think for ourselves. If this happens, not much else will matter.

Take care,


4 Comments on “Data Scientists Do Scale”

By Dale Lehman. June 19th, 2015 at 4:33 am

I agree with all your sentiments here, but have two points to make. First, in answer to your question about why HBR provides a platform for vendors’ biased and self-serving views – that is because HBR itself is a platform for Harvard faculty’s biased and self-serving views. They are just doing what they always do.

I think your parallel options above rule out what Frankel’s option is. He would have a fourth option to upgrading human power – to replace it with computer power. While your cautions about doing this are well-taken, I believe Frankel has a point. There is a distribution of human talent and it may well be better to replace the bottom of that distribution with computing power (which is more uniform than human ability). Take an extreme case: physicians. Physicians are not all equally capable and an “intelligent” system may well be better than the bottom of the physician quality distribution – and more efficient than trying to upgrade the skills of those people. Of course, the top of the physician distribution cannot be replaced (and, in fact, will be more valuable since someone – a human – will be required to help design the intelligent systems that will replace the people). I think this is what has been happening over the past two decades and a primary reason why the middle class is being hollowed out. Computers can take the place of people who were not very good at their jobs. They cannot replace the talented humans whose skills are needed as you describe above. But my point is that not all humans are equally capable, and it is this fact that allows the substitution of computing power for human power to make sense up to a point.

By Stephen Few. June 19th, 2015 at 9:28 am


Thanks for the thoughtful response. I’ll try to be equally thoughtful in my response to you. Regarding your fourth option for “scaling” human productivity, replacing humans with computers is at times an option, but it is not a scaling option, which is why I didn’t include it on the list. Similarly, replacing computers with humans to boost productivity or quality is sometimes an option, but not a scaling option.

We create tools — technologies — for two potential reasons: 1) to augment human ability, and 2) to replace human ability. A lathe augments the ability of a skilled carpenter to shape wood. A robot that shapes wood replaces the carpenter. We use technologies when doing so provides a benefit. If you want a chair, you may choose to purchase one that was made primarily by machines rather than a carpenter because the savings in cost outweigh the benefits to you of the superior craftsmanship.

Work that requires thinking, a human ability that computers lack, can often be augmented by computers in beneficial ways. When data analysis software enables us to visualize data on the screen, something that most of us cannot do in our heads, it is augmenting our ability to think visually. When data analysis software places a great deal of data on the screen at once for purposes of comparison or pattern detection, it is augmenting our working memory, which can only hold three or four chunks of visual information at once. On the other hand, when data analysis software performs mathematical calculations, we are allowing it to do something that a computer does far better than we do: fast calculations. This is a replacement of human activity that we willingly embrace because the computer’s ability to perform this task is superior to ours.

We must understand what humans do well and what computers do well to know where to draw the lines, which is something that many technology vendors don’t understand and are not inclined to understand because they merely want to sell their technologies. It is important that we use technologies wisely, always considering the costs and benefits, not just for the immediate task but for the long-term effects as well. Early in my career I assisted in implementing an automated voice-response system that replaced human phone operators. It seemed obvious at the time that the technology would provide great cost reductions. I lacked the foresight, however, to anticipate the way that customers would be affected by having a mindless machine direct their calls. Countless times over the years I’ve regretted this as I’ve been routed by stupid machines into a black hole. I believe that Frankel and others like him who create and sell technologies are not drawing the line between the abilities of humans and computers effectively, nor are they thinking about the long-term implications of their technologies.

I agree with you that there is a distribution of human talent and that technologies have been increasingly used in recent years to replace people at the lower end of this distribution. I believe, however, that this is often a harmful, shortsighted decision. This is definitely the case when we use technologies to do tasks that require human abilities. When faced with these situations, helping humans scale their abilities by developing the necessary skills is usually a better solution. Using computers as an inferior substitute for human reasoning and judgment, which is what we do when we use them to do data analysis, is not nearly as productive as it would be to help humans develop data analysis skills.

Your physician example concerns me. The work of a doctor, similar to that of a data analyst, requires human reasoning and judgment. I fear a future that has replaced many doctors with machines. In that future, only the wealthy would benefit from the superior work of human doctors. The rest of us would stand in long lines waiting to be treated by machines. It is true, however, that computers have a great role to play in medicine. For example, expert systems can augment human reasoning and judgment by referencing massive diagnostic databases, providing doctors with information, including suggestions, to consider. These systems are not qualified, however, to replace doctors, who bring much to the process of healing that machines cannot provide. Rather than replacing doctors at the low end of the skill distribution with machines, we should scale humans for the task by replacing those who aren’t cut out for medicine with better candidates and by investing in the training of doctors whose skills can be improved.

You wrote, “Computers can take the place of people who were not very good at their jobs.” This is only true if the job is one that computers can do as well as or better than humans, or if computers do it less well but we are willing to sacrifice quality for reduced costs, as we do when we buy that chair that was manufactured by a machine. If the job can only be adequately done by humans, we should be investing in humans to do the work. Technologies have become our default solution, which has led to many poor results that go unnoticed. When misapplied, technologies can do harm. They can produce a future in which our lives are less fulfilling, less happy, and less healthy than they are today.

By Mike C. June 19th, 2015 at 2:49 pm

I think “scale” is a loaded word as it is used in this context. When the author says “scale,” he doesn’t mean that you can’t add more; he likely means that he can’t add more at incrementally lower costs. Technology people are used to a Moore’s Law view of the world where everything rapidly becomes cheaper. Data analysis doesn’t “play by the rules” of Moore’s Law (there are not twice as many capable data analysts being created every 18 months), and so they view this skill as something that doesn’t scale and therefore needs to be automated.

But as those of us who have been around for any length of time in this industry will attest, this is a skill that defies automation as it is traditionally considered. I have far more powerful tools on my desktop than I have ever had before, but the core of what I do has not been automated at all. Answering questions such as:

1) What is in this data that I care about?
2) What is this data telling me about my organization?
3) What do I need to do differently because of this data?

is no easier or more difficult than it ever was. It requires a combination of industry/business knowledge and data analysis techniques to even know how to begin to address the issue.

An interesting parallel to automating data analysis is the attempts that were made to automate the software development industry itself. Way back in the 80s there was a sense that programmers (mostly COBOL) “didn’t scale”. Businesses were frustrated by their inability to add new products, enter new markets (etc.) because the systems needed to do those things required developers and there were not enough of them. So to solve the problem the leading-edge companies of the day proposed tools that would code for you (known by various names but usually CASE tools). You simply input some rules into boxes and automation would create your system for you in a fraction of the time. Dozens of vendors jumped into the market and millions of dollars were spent by companies on the silver bullet solution of the day. Nearly every implementation of the technology (at least that I was aware of) was short-lived and either failed completely or, at best, never lived up to the hype. The industry pretty much collapsed under its own overselling. (It was replaced by the next hyped solution, offshore outsourcing, but that’s another story.) Despite this, software development today is better, easier, and more productive than it has ever been. Why? Not because someone successfully automated coding (which requires very high level thinking), but rather because the tools, languages, and systems around coding have become incrementally better in the last 25 years. This is what data analysts need as well.

I love better tools. I love faster processing. Those in the technology industry would be far better off spending their time helping the data analysis community with these improvements than trying to automate away the role. The answer to the scaling problem is not to automate the task, but rather to increase the number of those capable of doing it. Unfortunately that approach will not attract much VC money. :-)

By Stephen Few. June 19th, 2015 at 3:08 pm

Well said, Mike. I remember CASE tools as well. That’s an excellent illustration of the problem. It pays to remember the past.
