Display New Daily Cases of COVID-19 with Care

Statistics are playing a major role during the COVID-19 pandemic. The ways that we collect, analyze, and report them greatly influence the degree to which they inform a meaningful response. An article in the Investor’s Business Daily titled “Dow Jones Futures Jump As Virus Cases Slow; Why This Stock Market Rally Is More Dangerous Than The Coronavirus Market Crash” (April 6, 2020, by Ed Carson) brought this concern to mind when I read the following table of numbers and the accompanying commentary:

U.S. coronavirus cases jumped 25,316 on Sunday [April 5th] to 336,673, with new cases declining from Saturday’s record 34,196. It was the first drop since March 21.

The purpose of the Investor’s Business Daily article was to examine how the pandemic was affecting the stock market. On Monday, April 6, 2020, the day after the decline in reported new COVID-19 cases, the stock market surged (the Dow Jones gained 1,627.46 points, or 7.73%), perhaps in response to hope that the pandemic was easing. This raises a question: can we trust this apparent decline as a sign that the pandemic has turned the corner in the United States? I wish we could, but we dare not, for several reasons. The purpose of this blog post is not to critique the news article, and certainly not to point out the inappropriateness of this data’s effects on the stock market, but merely to argue that we should not read too much into the daily ups and downs of newly reported COVID-19 case counts.

How accurate are daily new case counts when they are tied to the date on which those counts were recorded? Not at all accurate, and of limited relevance. I’ll explain, but first let me show you the data displayed graphically. Because the article did not identify its data source, I chose to base the graph below on official CDC data, so the numbers are a little different. I also chose to begin the period with March 1st rather than March 2nd, which seems more natural.

What feature most catches your eye? For most of us, I suspect, it is the steep increase in new cases on April 3rd, followed by a seemingly significant decline on April 4th and 5th.

A seemingly significant rise or fall in new cases on any single day, however, is not a clear sign that something significant has occurred. Most day-to-day volatility in reported new case counts is noise: it is influenced by several factors other than the number of new infections that actually developed. The actual number of new infections differs greatly from the number that were reported, and the date on which an infection began differs significantly from the date on which it was reported. We currently have no means to count the number of infections that occurred, and even if we tested everyone for the virus’s antibodies at some point, we would still have no way of knowing the date on which those infections began. The count of reported new COVID-19 cases is only a proxy for the measure that concerns us.

Given that reported new cases are probably the best proxy currently available to us, we could remove much of the noise introduced by the specific day on which each case happens to be recorded by expressing new case counts as a moving average. A moving average would provide us with a better overview of the pandemic’s trajectory. Here’s the same data as above, this time expressed as a 5-day moving average. With a 5-day moving average, the new case count for any particular day is averaged along with the counts for the four preceding days (i.e., five days’ worth of new case counts are averaged together), which smooths away most of the daily volatility.
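The trailing 5-day moving average described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the graph: the function name, the choice to return None for the first four days (rather than averaging over fewer days), and the sample counts are all hypothetical and are not the CDC figures.

```python
def trailing_moving_average(counts, window=5):
    """Average each day's count with the counts for the (window - 1) preceding days.

    Days without a full window of history get None instead of a partial average.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(counts)):
        if i + 1 < window:
            averages.append(None)  # not enough preceding days yet
        else:
            # Sum this day plus the (window - 1) days before it.
            averages.append(sum(counts[i + 1 - window:i + 1]) / window)
    return averages


# Hypothetical daily new-case counts, for illustration only (not CDC data).
daily_new_cases = [100, 140, 90, 160, 130, 210, 120]
print(trailing_moving_average(daily_new_cases))
# [None, None, None, None, 124.0, 146.0, 142.0]
```

The jump from 130 to 210 and the drop back to 120 in the raw counts shrink to a move from 124.0 to 146.0 and back to 142.0 in the averaged series, which is the smoothing effect the graph shows.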

While it still looks as if the new case count is beginning to increase at a lesser rate near the end of this period, this trend no longer appears as dramatic.

Daily volatility in reported new case counts is caused by many factors. We know that the number of new cases reported on any particular day does not accurately reflect the number of new infections. It’s likely that most people who have been infected have never been tested. Two prominent reasons for this are 1) most cases are mild to moderate and therefore never involve medical intervention, and 2) many people who would like to be tested cannot be, because tests are still not readily available. Of those who are tested and found to have the virus, not all cases are recorded or, if recorded, forwarded to an official national database. And finally, for the new cases that are recorded and do make it into an official national database, the dates on which they are recorded are not the dates on which the infections actually occurred. Several factors determine the specific day on which cases are recorded, including the following:

  1. When the patient chooses or is able to visit a medical facility.
  2. The availability of medical staff to collect the sample. Staff might not be available on particular days.
  3. The availability of lab staff to perform the test. The sample might sit in a queue for days.
  4. The speed at which the test can be completed. Some tests can be completed in a single day and some take several days.
  5. When medical staff has the time to record the case.
  6. When medical staff gets around to forwarding the new case record to an official national database.

There’s a lot that must come together for a new case to be counted and to be counted on a particular day. As the pandemic continues, this challenge will likely increase because, as medical professionals become increasingly overtaxed, both delays in testing and errors in reporting the results will no doubt increase to a corresponding degree.
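To see how reporting mechanics alone can create day-to-day swings, here is a small simulation sketch. It is a hypothetical model, not real reporting data: it assumes exactly 1,000 new infections every day, a random recording delay of 0 to 10 days per case, and no recording at all on days 5 and 6 of each week (a stand-in for days when staff are unavailable), with those cases slipping to the next staffed day. The function names and all parameters are illustrative assumptions.

```python
import random


def report_day(infection_day, rng, max_delay=10):
    """Hypothetical reporting model: a case is recorded 0 to max_delay days
    after it begins, and no cases are recorded on days 5 and 6 of each week
    (standing in for days when staff are unavailable), so those cases slip
    to the next staffed day."""
    day = infection_day + rng.randint(0, max_delay)  # randint includes both ends
    while day % 7 in (5, 6):
        day += 1
    return day


def simulate_reported_counts(days=70, new_infections_per_day=1000,
                             max_delay=10, seed=42):
    """Return (day, reported_count) pairs for days whose counts are complete.

    New infections are held perfectly flat, so every day-to-day swing in the
    reported counts comes from the reporting process alone.
    """
    rng = random.Random(seed)
    # A case can land at most max_delay days late plus 2 skipped days.
    reported = [0] * (days + max_delay + 2)
    for infection_day in range(days):
        for _ in range(new_infections_per_day):
            reported[report_day(infection_day, rng, max_delay)] += 1
    # Skip the start-up days, which lack cases that began before day 0, and
    # stop before the tail, which lacks cases that would begin after the end.
    first_complete = max_delay + 2
    return [(day, reported[day]) for day in range(first_complete, days)]


counts = simulate_reported_counts()
values = [count for _, count in counts]
print("fewest cases reported on one day:", min(values))  # 0: days 5 and 6 record nothing
print("most cases reported on one day:", max(values))
```

Even though the underlying infections never change, in this model each day 0 absorbs roughly three days’ worth of reports, ordinary weekdays hover near 1,000, and the two unstaffed days show nothing. A reader watching these raw counts one day at a time could easily mistake the pattern for surges and declines.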

Now, back to my warning that we shouldn’t read too much into daily case counts as events are unfolding. Here are the same daily values as before, with one additional day, April 6th, included at the end.

Now what catches your eye? It’s different, isn’t it? As it turns out, by waiting one day we can see that reported new cases did not peak on April 3rd and then clearly turn around. New cases are still on the rise. Here’s the same data expressed as a 5-day moving average:

The trajectory is still heading upwards at the end of this period. We can all hope that expert projections of the curve flattening out in the next few days will come to pass, but we should not draw that conclusion from the newly reported case count for any particular day. The statistical models that we’re using are just educated guesses based on approximate data. The true trajectory of this pandemic will only be known in retrospect, if ever, not in advance. Patience in interpreting the data will be rewarded with greater understanding, and ultimately, that will serve our needs better than hasty conclusions.

4 Comments on “Display New Daily Cases of COVID-19 with Care”


By Nilay. April 10th, 2020 at 5:41 pm

Very insightful. What are your thoughts on the predictive models that are circulating? Should we add more context about the peaks, maybe showing what percentage of people must stay home to achieve certain peaks and how that would change if more people decided to go out and meet others?

By Stephen Few. April 11th, 2020 at 8:05 am

Nilay,

I’ve paid little attention to the COVID-19 predictive models that have received media attention. It would be difficult to evaluate the models because they aren’t made available for public scrutiny. Typically, only the predictions are shared, not the models themselves. There is no way to evaluate a model that lives in a black box.

Context is always important when making sense of and communicating data. It plays a significant role in a predictive model, in part because it determines the variables that the model must take into account.

By Edouard. August 18th, 2020 at 1:20 am

Hi Stephen,

Thank you for this post. How did you choose your moving average? Why 5 days? Why not 3 or 7? Would you generally pick 5 as a good standard moving average, or would you change it based on the time interval you are looking at (e.g., for months, would you still use a 5-month moving average, or would 3 make more sense? I am thinking of the 3 months in a quarter). Do you have any guidance you could share on which moving average to use to smooth out variations?

Thank you for your insights.

By Stephen Few. August 18th, 2020 at 8:08 am

Hi Edouard,

Whenever I choose the number of periods that are included in a moving average, I strive for a compromise between too few, which will fail to eliminate noisy volatility, and too many, which will eliminate too much meaningful detail. There is no magical number. With daily data, either a 5-day or 7-day moving average is typical. A 7-day moving average is especially useful when you want to eliminate meaningless volatility across the days of an entire week. In this particular case, I chose a 5-day moving average merely because many responsible providers of COVID-19 data were using it at the time. If I wrote this blog post today, I would probably choose a 7-day moving average, because that has now become more typical. Either works fine.

Similarly, with monthly data, if you want to eliminate volatility across the months of a quarter, a 3-month moving average would work fine. If you want to remove volatility across the months of a year, however, a 12-month moving average might be appropriate, assuming that it wouldn’t eliminate useful detail.
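The point about a 7-day window removing volatility across the days of a week can be checked with a small Python sketch. The weekly pattern below is hypothetical (three identical weeks with high reporting early in the week and low reporting on the last two days); real reporting cycles are far less regular. Because every 7-day window contains each day of the week exactly once, the 7-day average comes out perfectly flat here, while the 5-day average still swings with the weekly cycle.

```python
def trailing_moving_average(counts, window):
    """Average each day with the (window - 1) preceding days; None until a full window exists."""
    return [
        None if i + 1 < window else sum(counts[i + 1 - window:i + 1]) / window
        for i in range(len(counts))
    ]


# Hypothetical: three identical weeks whose only variation is a day-of-week
# reporting pattern (high early in the week, low on the last two days).
weekly_pattern = [1300, 1200, 1100, 1150, 1050, 700, 500]  # sums to 7000
daily = weekly_pattern * 3

five_day = [v for v in trailing_moving_average(daily, 5) if v is not None]
seven_day = [v for v in trailing_moving_average(daily, 7) if v is not None]

print("5-day range:", min(five_day), "to", max(five_day))    # 900.0 to 1160.0
print("7-day range:", min(seven_day), "to", max(seven_day))  # 1000.0 to 1000.0
```

The same function applies to monthly values: a window of 3 averages across the months of a quarter, and a window of 12 would likewise flatten a pattern that repeats every year.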
