Badly presented statistics cost lives. The stakes have never been higher: over the past year, the careless misrepresentation of data has sown uncertainty and conspiracy, and has contributed to the deaths of loved ones. And so, in our quest to gain true insight into the state of the pandemic, we must be prepared to battle this statistical ineptitude, along with the other gnarly monsters lurking out there, ready to pounce and spread misinformation and panic. With an abundance of graphs in the news and on social media, you may have seen the graph in Case 1 before. It shows the number of confirmed cases in Great Britain, the USA, and Japan since the start of the pandemic. The journalistic intent is often simple: to compare how well different countries' policies have worked.
Case 1

Simple enough, right? Japan was doing badly at first, but was quickly overtaken by GB and the USA. Now, not much separates the three countries. It is pretty close, all things considered, so Japan’s tighter controls and more effective contact tracing obviously weren’t worth the effort. Also, in decidedly good news, cases are starting to level off. The real danger was before April, so we can probably relax our restrictions now. That is a perfectly reasonable conclusion to draw from a cursory glance at the graph. It is also a complete misrepresentation of the trends the graph is trying to display. Those with keen eyes may have noticed that the graph uses a logarithmic scale. Instead of increasing by a fixed amount at each gridline (like 1, 2, 3, 4), the values are multiplied by a fixed factor (so 1, 10, 100, 1,000). Our brains suck at understanding what that means. So here is what the last twelve months look like on a linear scale.
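To see why a logarithmic axis flatters big numbers, here is a minimal Python sketch. The case counts are made up purely for illustration and are not taken from the graphs, and `axis_gaps` is a hypothetical helper. On a base-10 log axis, a value's height is its log10, so every tenfold jump takes up the same vertical distance, whether it is 10 to 100 or 100,000 to 1,000,000.

```python
import math

def axis_gaps(low: int, high: int) -> tuple[int, float]:
    """Return the distance between two values on a linear axis
    and on a base-10 logarithmic axis."""
    linear_gap = high - low
    # On a log axis, the height of a value is log10(value).
    log_gap = math.log10(high) - math.log10(low)
    return linear_gap, log_gap

# Hypothetical case counts, chosen only to illustrate the scale.
for low, high in [(10, 100), (100_000, 1_000_000)]:
    linear_gap, log_gap = axis_gaps(low, high)
    print(f"{low:,} -> {high:,}: linear gap {linear_gap:,}, log gap {log_gap:.1f}")
```

Both rises take up exactly one step on the log axis, even though the second is 10,000 times larger in absolute terms.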
Case 2
This makes the relative difference between the three countries crystal clear. Japan isn’t doing nearly as badly as the US; it barely registers on the graph. But there is another problem with this data. The goal is to compare how successful countries have been in dealing with the pandemic, but the graph isn’t showing that. What we’re interested in is how prevalent the virus is in the population: the proportion of the population that is infected, not the total number of cases. A country of a billion people could have the most restrictive measures in the world, with only 0.05% of its population infected, and still be off the chart. Infect every citizen in a country of 100,000 and it would be a catastrophe, yet it would be practically invisible on these axes.
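The same point in numbers: a minimal Python sketch using the two hypothetical countries from the paragraph above. Dividing total cases by population (via a helper called `prevalence_percent`, a name invented here for illustration) reverses the picture that raw totals give.

```python
def prevalence_percent(cases: int, population: int) -> float:
    """Share of a population that has been infected, as a percentage."""
    return 100 * cases / population

# The two hypothetical countries described above.
big_population = 1_000_000_000
big_cases = big_population * 5 // 10_000  # 0.05% of a billion = 500,000

small_population = 100_000
small_cases = small_population  # every citizen infected

for name, cases, pop in [("Large country", big_cases, big_population),
                         ("Small country", small_cases, small_population)]:
    share = prevalence_percent(cases, pop)
    print(f"{name}: {cases:,} total cases, {share:.2f}% of the population")
```

The larger country has five times as many cases in total, but the smaller one has 2,000 times the prevalence.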
Case 3

Here, finally, we start to get an idea of how well various policies have worked. It has taken a while, but we are now representing the information we actually want, rather than data that only appears to show it at first glance. On a point of personal preference, though, I think it is far more important to show recorded deaths than recorded cases. Firstly, countries test to very different extents, but have more uniform procedures for recording deaths. Secondly, deaths are the metric that captures the true seriousness of the virus: colds infect many millions every year, but kill very few. Coronavirus is scary precisely because cases are high and so are deaths. So, for the last time, here’s a graph.
Case 4

There could be no clearer statement of our failure to protect our most vulnerable. Compare it with the first graph, a graph I have seen reproduced numerous times, and consider just how intellectually negligent it is to mass-print it. Bad stats are everywhere. They mislead us, and comfort us when we should be seething with rage at the processes that have brought us here. Don’t be fooled, my stay-at-home scientists. Call it out when you see it.
Images: Nicholas Bush