Heatmap

COVID-19 Cases Trend Chart

When the pandemic hit Milan and Italy, I wanted, like many others, to take a look at the data, uncertain as it was.

The goal was not to build a predictive system, since many groups were already working on that with specialized information and models, but to highlight existing trends.

Cleaned data (graph below) could indeed provide better insights than the raw values reported in the media every day.

I decided to publish the charts I had created on the website www.covcompare.com. The content is generated by a program that runs daily, around the time the official figures are released: it downloads the data from institutional websites and precomputes the results that are shown later.

Users can then view any combination of regions on request.

There are essentially four visualizations of new coronavirus infections, all of which allow comparisons between regions. The first is the absolute number of new infections as reported by the institutions, which mainly shows how little this raw number tells.

Then there is a chart of processed data (normalized by population, log-transformed, and smoothed with a moving average) where you can actually see what is happening in a specific region. There is also a heatmap representing all days and all regions.
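As a rough illustration, the processing steps above could look like the following sketch. It is not the code from the repository: the function names, the 7-day window, the per-100,000 scaling, the use of log10(1 + x) to handle zero-case days, and the choice to smooth before taking the logarithm are all assumptions made for the example.

```python
import math


def moving_average(values, window=7):
    """Trailing moving average; early points average whatever is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def process_cases(new_cases, population, window=7, per=100_000):
    """Normalize daily new cases by population, smooth them, then log-transform."""
    # Cases per `per` inhabitants, so regions of different size are comparable.
    per_capita = [c * per / population for c in new_cases]
    # Smooth out weekly reporting artifacts (assumed 7-day window).
    smoothed = moving_average(per_capita, window)
    # log10(1 + x) keeps days with zero cases defined (an assumption of this sketch).
    return [math.log10(1 + x) for x in smoothed]
```

With this kind of normalization, a small region reporting 10 cases a day and a region ten times larger reporting 100 cases a day produce the same curve, which is what makes the comparison between regions meaningful.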

For the heatmap, I wanted to order the regions so that neighboring rows show a smooth transition. To do this, I used Principal Component Analysis (I also experimented with other techniques, with similar results), which automatically extracts a suitable ordering of the regions.
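One way PCA can yield an ordering is to sort the regions by their score on the first principal component. The sketch below shows that idea under stated assumptions: the source does not say which component or implementation was used, the function names are hypothetical, and the first component is computed with a simple power iteration to keep the example dependency-free (a library PCA would normally be used instead).

```python
import math
import random


def first_pc_scores(rows, iterations=200, seed=0):
    """First principal component scores of each row, up to scale and sign."""
    n, m = len(rows), len(rows[0])
    # Center every column (day) across the rows (regions).
    col_means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - col_means[j] for j in range(m)] for r in rows]
    # Gram matrix between regions; its top eigenvector is proportional
    # to the scores of the regions on the first principal component.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    # Random start: a constant vector would be mapped to zero after centering.
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # degenerate data: nothing to order by
            return [0.0] * n
        v = [x / norm for x in w]
    return v


def order_regions(series_by_region):
    """Return region names sorted by their first principal component score."""
    names = list(series_by_region)
    scores = first_pc_scores([series_by_region[name] for name in names])
    return [name for _, name in sorted(zip(scores, names))]
```

The sign of a principal component is arbitrary, so the resulting order may come out reversed; for a heatmap this only flips the rows top to bottom.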

Finally, I used a clustering algorithm well suited to time series to group the regions (regions in the same cluster show similar trends in acceleration and deceleration), and a peak detection algorithm to identify the peaks of the different waves in different areas.
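To make the peak detection step concrete, here is a minimal sketch of one simple approach: a day counts as a peak if it is the highest value within a window of neighboring days. This is an assumption for illustration, not necessarily the algorithm used on the site; the function name and the `min_distance` and `min_height` parameters are hypothetical.

```python
def find_wave_peaks(values, min_distance=14, min_height=0.0):
    """Indices of interior points that dominate their +/- min_distance window."""
    peaks = []
    # Skip the first and last point: a series that is still rising or
    # falling at its edge has not shown a peak there yet.
    for i in range(1, len(values) - 1):
        v = values[i]
        if v < min_height:
            continue
        lo = max(0, i - min_distance)
        hi = min(len(values), i + min_distance + 1)
        window = values[lo:hi]
        is_highest = v == max(window)
        not_flat = v > min(window)  # ignore completely flat stretches
        first_of_tie = values.index(v, lo, hi) == i  # one peak per plateau
        if is_highest and not_flat and first_of_tie:
            peaks.append(i)
    return peaks
```

Applied to an already smoothed series, a larger `min_distance` merges nearby bumps into a single wave, while `min_height` filters out small fluctuations during quiet periods. Libraries such as SciPy offer more complete tools for this, for example `scipy.signal.find_peaks`.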

Heatmap

Peaks and Clusters

 

The code is available on GitHub: https://github.com/giosds/covcompare

The backend is implemented in Python, using Flask.

 

 

Note: Automatically translated from Italian.