COVID-19: Perceived Spread vs. True Spread in China, Italy and the US

Gabor Petnehazi (@gabor-petnehazi)

Here at TimeNet, we’re building a large time series database with the primary aim of benefiting society through access to data. In this post we’ll study different time series representing both the true and the perceived spread of the coronavirus (COVID-19) pandemic. Daily COVID-19 numbers are currently available for many countries. We’re expanding these datasets with further variables that measure how people perceive the significance of the pandemic. We use stock market movements and internet search trends to quantify the virus’s perceived spread.

Data Science and its role in fighting COVID-19

The current crisis is a great challenge for humanity; there is no arguing this. Since no vaccine is available and the incubation period can last as long as two weeks, it’s difficult to contain the pandemic. A recent study from researchers at Columbia University found that asymptomatic people or those with mild symptoms are responsible for 79% of further infections. In essence, you could say it’s a clever and efficient virus. Thankfully, data science can offer us some quick answers in these difficult and uncertain times.

Data science and artificial intelligence can help fight the pandemic in various ways, from forecasting outbreaks to developing drugs. Their use can potentially lead to privacy issues, but the end justifies the means (at least, to a certain point).

The Kaggle data science community has launched two COVID-19 forecasting challenges. The primary goal of these challenges is not to provide accurate forecasts, but to identify factors that impact virus transmission. Such initiatives can be a great help in fighting this invisible enemy.

We believe that TimeNet’s large-scale time series database can also help. It monitors the spread of the virus (the daily number of confirmed cases, deaths, and recoveries). These statistics can then be compared against various time series of interest.

What correlations have we discovered and what do they mean?

We have chosen 3 countries for this analysis: China, where the COVID-19 outbreak was identified; Italy, the first European country strongly affected; and the United States, where the epidemic still seems to be in a growth stage (at the time of writing). The figure below shows the normalized spread of COVID-19 in the three countries.
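The post does not say how the three series were normalized. One common choice, shown here purely as an assumption, is min-max scaling, which maps each country's series onto the range 0 to 1 so that outbreaks of very different sizes can share one axis. The function name `min_max_normalize` and the toy numbers are hypothetical and are not taken from the TimeNet data.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every point to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up cumulative case counts, for illustration only.
toy_cases = {
    "country_a": [10, 50, 200, 800, 1000],
    "country_b": [1, 2, 4, 8, 16],
}

normalized = {
    name: min_max_normalize(series) for name, series in toy_cases.items()
}

for name, series in normalized.items():
    print(name, [round(v, 3) for v in series])
```

After scaling, each series starts at 0 and ends at 1, so only the shape of the curve (how early and how steeply it rises) differs between countries.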

Let’s explore the correlations between coronavirus cases and stock market performance. The S&P 500 is a capitalization-weighted stock market index that measures the performance of 500 large companies. Financial analysts often use it as a representation of the overall US stock market. The plot below displays the time series of the stock market index and the number of coronavirus cases.

Correlations between coronavirus cases and the stock market index

There is a very strong negative linear relationship between the accumulating coronavirus cases and the performance of the stock market. This relationship is strongest for Italy. This is somewhat surprising, since neither the home of the S&P 500 index (US) nor the origin of the virus (China) had a stronger co-movement in the examined time period.
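A "linear relationship" between two series is usually measured with the Pearson correlation coefficient. The post does not name the coefficient it used, so the sketch below assumes Pearson. The helper `pearson` and the toy values are illustrative only and do not reproduce the article's figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: cases keep rising while the index keeps falling.
toy_cases = [100, 200, 400, 800, 1600]
toy_index = [3300, 3200, 3000, 2700, 2400]

r = pearson(toy_cases, toy_index)
print(f"correlation: {r:.3f}")
```

A value close to -1 means the index tends to fall as the case count rises. Cumulative case counts only ever increase, so any series with a steady trend over the same period will correlate strongly with them; a high coefficient on its own does not show that one series drives the other.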

We have repeated the experiment using the number of coronavirus deaths. The results are very similar, but in this case the United States has the largest correlation in absolute terms.

Correlations between coronavirus-related deaths and the stock market index

We have explored the correlations between coronavirus cases and internet search volumes, too. Two search terms (“Coronavirus” and “COVID-19”) were used to query historical Google search trends and Wikipedia page views. Their correlations with the virus spread are displayed in the table below.

Correlations between internet traffic and coronavirus cases

The correlations between coronavirus-related search volumes and the confirmed cases are positive and high. Three out of four internet traffic time series show the highest correlation with Italian cases, specifically. Overall, it seems that the ‘COVID-19’ search term correlates more strongly with the number of cases than the keyword ‘Coronavirus’.
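A table like this can be built by correlating every traffic series with every country's case series, using only the dates both series cover. The sketch below is one way to do that under stated assumptions: Pearson correlation, series stored as date-to-value dictionaries, and placeholder names (`traffic_1`, `country_a`, and so on) with made-up numbers. It is not the authors' pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def aligned_correlation(series_a, series_b):
    """Correlate two {date: value} series on the dates they share."""
    common = sorted(set(series_a) & set(series_b))
    return pearson([series_a[d] for d in common],
                   [series_b[d] for d in common])


dates = ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"]

# Made-up confirmed-case counts per country.
cases = {
    "country_a": dict(zip(dates, [10, 20, 40, 80, 160])),
    "country_b": dict(zip(dates, [500, 510, 515, 518, 520])),
}

# Made-up traffic series; traffic_2 has no value for 2020-03-03.
traffic = {
    "traffic_1": dict(zip(dates, [21, 41, 81, 161, 321])),
    "traffic_2": {"2020-03-01": 5, "2020-03-02": 9,
                  "2020-03-04": 12, "2020-03-05": 13},
}

# One row per traffic series, one column per country.
table = {
    t_name: {c_name: aligned_correlation(t_series, c_series)
             for c_name, c_series in cases.items()}
    for t_name, t_series in traffic.items()
}

for t_name, row in table.items():
    print(t_name, {c: round(v, 2) for c, v in row.items()})
```

Aligning on shared dates matters whenever one source has gaps. The same alignment would be needed for the stock index comparison, because markets do not trade on weekends while case counts are reported daily.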

We have repeated this experiment using the number of deaths instead of the number of confirmed cases. Page views of the Wikipedia article ‘Coronavirus’ show the lowest correlation with both cases and deaths, in all three countries. Surprisingly, they have a negative correlation with the number of Italian and US deaths. Overall, the correlation patterns are very similar for the confirmed cases and deaths.

Correlations between internet traffic and coronavirus-related deaths

Moving Forward

In conclusion, we would argue that stock market performance and internet search volumes reflect how people perceive the spread and significance of the pandemic. We found that most of these measures correlate more strongly with the number of coronavirus cases in Italy than with those in China or in the United States. This suggests that most people consider the Italian epidemic the most important or the most worrisome. Perhaps the haunting Italian news reports have helped change people’s minds and hopefully their behavior, too.

It’s easy to objectify the data coming from this global crisis and distance ourselves, looking at it as happening ‘somewhere out there’. Yet this threat is very, very real and will impact our society for some time. People are losing their lives and losing their loved ones. Don’t wait until the last moment to take the precautions recommended by local and international health organizations. Unfortunately, Italy serves as a tragic example of what can happen if we don’t act as soon as possible. Do your part and together we will overcome this threat.

What can you find out about the nature of the pandemic? Explore the data and join the movement towards discovering COVID-19 solutions.
