It was a Sunday morning, the 8th of March 2020. Up to that moment, I’ll admit, my attention to the whole COVID-19 situation had been relatively low. Sure, I was keeping an eye on the daily reports about Italy (I’m Italian and my family lives there), but I hadn’t yet put any real effort into analysing them. Something, though, had been bugging me for a couple of days about the numbers I was reading, and I couldn’t quite figure out what it was. So I decided to invest a couple of hours in creating a few basic plots to see what I could make of that information. Spoiler: a few hours later I was sending a 4-page document to a few selected friends and family members (it was a delicate moment and I didn’t want to inadvertently spread panic) to warn them of how the situation would very likely evolve over the next few days. And it wasn’t pretty. Later that night three entire regions in Italy were put in lockdown, and the measure was extended to the whole country the following day.
But let’s start from the beginning.
Setup
The very first thing was of course getting access to data, which had to be:
- Reliable
- Updated regularly (daily)
- Easily accessible
A quick search on Google led me to a GitHub repository with enough stars to appear legit; you can find it here: https://github.com/CSSEGISandData/COVID-19 [1]. I cloned it, created a virtual environment with the standard packages for this kind of analysis (matplotlib/seaborn, pandas, numpy, etc.) and got to work.
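As a rough illustration of that first step, here is a minimal sketch (not the code I actually used) that turns one of the repository’s time-series CSV files into a per-country series. It assumes the wide layout those files use: `Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date. The function names `country_totals` and `daily_new` are hypothetical, and it sticks to the standard library `csv` module so it runs anywhere; with pandas the same aggregation is a `groupby` over the date columns.

```python
import csv
import io


def country_totals(csv_text: str, country: str) -> dict:
    """Sum cumulative counts over every Province/State row of one country.

    Assumes the wide layout: Province/State, Country/Region, Lat, Long,
    followed by one column per date (e.g. "3/8/20").
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # everything after Lat/Long is a date column
    totals = {d: 0 for d in dates}  # dicts keep insertion (i.e. date) order
    for row in reader:
        if len(row) < 2 or row[1] != country:
            continue  # skip blank lines and other countries
        for d, value in zip(dates, row[4:]):
            totals[d] += int(value or 0)  # treat empty cells as zero
    return totals


def daily_new(cumulative: dict) -> list:
    """Convert cumulative counts into new cases per day."""
    values = list(cumulative.values())
    previous = [0] + values[:-1]
    return [today - yesterday for yesterday, today in zip(previous, values)]
```

To use it on the cloned data, read the file with `open(path, newline="").read()`, where `path` is a placeholder for whichever time-series file you want from your clone (file names in the repository have changed over time), and pass the text to `country_totals(text, "Italy")`.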