This article studies how social distancing affects the spread of the coronavirus and, in turn, the number of hospital beds needed. The study varies the basic reproduction number Ro and simulates its effect on the spread of the virus using a simple epidemic model called SIR.
Please note that this article is primarily an illustration of using machine learning algorithms for prediction with the limited data that is publicly available, and the opinions in this article should not be interpreted as professional advice.
In this article, I will briefly explain what the basic reproduction number is and give a brief overview of the SIR model without going into the mathematics behind it. I will then showcase the results of the modelling I did in the R programming language to simulate the impact of social distancing using various Ro values.
Exponential disease spread
In the example shown below, 8 people have come into contact with one infected person, and two of them have been infected. Those 2 infect more people, and the process continues exponentially.
The obvious solution is to reduce the number of people an infected person contacts. One of the important metrics that measures how many people an infected person can infect is the reproduction number, which is discussed in the next section.
Basic reproduction number (Ro)
The basic reproduction number Ro estimates the speed at which a disease is capable of spreading in a population. It is the average number of people that one infected person infects. This is an important number to understand, so let's discuss it a little more below.
Ro is pronounced “R naught.” It is a mathematical term that indicates how contagious an infectious disease is. Ro tells you the average number of people who will catch a disease from one contagious person. If a disease has an Ro of 6, an infected person will transmit the disease to an average of 6 other people, as long as no one in their community has been vaccinated against it or is already immune to it. The swine flu (H1N1) virus from 2009 had an Ro value of about 1.5. Its impact was limited because of vaccines and antiviral drugs. In the case of the coronavirus, the Ro value is estimated to be between 1.5 and 3.5, with 1.5 to 2.5 being used when good social distancing is practiced.
Our goal is to use these Ro values to simulate the impact of social distancing. A very good social distancing program can be thought of as having an Ro value close to 1. An Ro of 2 means an infected person will transmit the disease to 2 other people on average. If the Ro value is greater than one, the infection rate is greater than the recovery rate, and thus the infection will grow throughout the population. A social distancing program that leaves Ro above 2 will have trouble containing the spread of the virus and can potentially overwhelm the population.
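To make the effect of Ro concrete, here is a minimal Python sketch (not the R model used for this article) that counts new infections per "generation" of transmission, assuming every case infects exactly Ro others and nobody runs out of susceptible contacts. The function name and the choice of 10 generations are illustrative assumptions. Real outbreaks slow down as people recover or become immune, which is what the SIR model accounts for.

```python
def infections_by_generation(r0, generations, initial_cases=1):
    """New infections in each generation if every case infects r0 others.

    Simplifying assumption: an unlimited susceptible population,
    so growth is purely geometric (initial_cases * r0 ** generation).
    """
    return [initial_cases * r0 ** g for g in range(generations + 1)]


for r0 in (1.0, 1.5, 2.0):
    per_generation = infections_by_generation(r0, generations=10)
    print(f"Ro = {r0}: about {per_generation[-1]:.0f} new cases in generation 10")
```

With Ro = 1 the outbreak only replaces itself, while Ro = 2 doubles the new cases every generation, which is why pushing Ro toward 1 matters so much.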
Predictive model used for the simulation
Epidemic models are compartment-based models that divide the population into separate groups to describe how the disease moves members of the population from one group to another. One of the simplest compartmental models is called SIR, which divides the population into 3 groups (susceptible, infected and recovered) as described below. One can read more about the mathematics behind this model here.
The two important parameters for this study are:
1. Transmission rate (𝛃)
Each infected person can contact few or many people a day, depending on whether social distancing is practiced. An infected person meets some number of people and infects some of them. Say one infected person meets 6 people a day and has a 15% probability of infecting each of them. The transmission rate (𝛃) is then 6 * 0.15 = 0.9 persons per day.
2. Rate of recovery (γ)
The rate of recovery is the fraction of the currently infected population that recovers per unit of time. If the infection lasts for 5 days, the γ value is ⅕, meaning one fifth of the currently infected population recovers each day.
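The two parameters above can be plugged into the standard SIR equations. The following is a minimal Python sketch, not the R model used for this article: it integrates the equations with a simple forward-Euler scheme, and the function name, step size and 120-day horizon are assumptions chosen for illustration. It uses the example values from the text (6 contacts a day, 15% infection probability, 5-day infection), for which the standard relation Ro = beta / gamma gives 4.5. That is well above the Ro values simulated later, so treat it purely as an illustration of the mechanics.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, steps_per_day=10):
    """Integrate the SIR equations with forward-Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    daily = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        daily.append((s, i, r))
    return daily


beta = 6 * 0.15   # transmission rate from the example: 0.9 per day
gamma = 1 / 5     # recovery rate for a 5-day infection
r0 = beta / gamma  # standard SIR relation between the parameters and Ro

trajectory = simulate_sir(beta, gamma, population=1_000_000,
                          initial_infected=5, days=120)
peak_day, peak_infected = max(
    ((day, state[1]) for day, state in enumerate(trajectory)),
    key=lambda pair: pair[1],
)
print(f"Ro = {r0:.2f}, peak of {peak_infected:,.0f} infected on day {peak_day}")
```

To simulate a specific Ro with a fixed recovery rate, one can set beta = Ro * gamma and rerun the function.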
How many additional hospital beds are needed in a state/county to tackle the surge
I used a model built in the R programming language for this study. I initially tried to run this simulation using data from Fairfax County in Virginia, but since not all of the data is publicly available, I used some stats from Virginia and the USA to fill in the numbers needed for the study. Let's say this county/state has a population of 1 million and announced 5 cases around March 10th. Please note that the model treats all individuals the same, when in reality older populations with more chronic conditions seem to have a higher risk.
Below is a brief summary of the key general stats as of 04/25/20. These are some of the publicly available stats we used in our calculations. The sources for these stats are the WHO and CDC.
Input parameters used for the simulation
The next section discusses the output of the predictive model built in the R programming language with the above parameters.
Impact of social distancing or lack of it on peak cases
The impact of social distancing is measured using Ro values. Good social distancing essentially means the Ro value is very low. The following graph shows the impact of Ro values ranging from 1.25 to 2.0. The y-axis shows the number of people (%) and the x-axis shows the number of days. The lower the Ro value, the lower the number of people infected and the later the peak, giving the county ample time to prepare for it. If we start relaxing social distancing, we will see behavior closer to Ro = 2.0 (or possibly above). This is why social distancing, testing, tracing and quarantine are extremely important.
For Ro = 1.25, assuming we continue good social distancing and other quarantine measures, the peak is about 21,500 cases at around 197 days. That is roughly six and a half months from March 10th, which would be late September. On the other hand, for an Ro of 1.5, the peak is 63,000 cases at 112 days. The peak is higher and arrives sooner, which cuts short the time to prepare.
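As a cross-check on the peak sizes, the standard SIR model has a closed-form expression for the largest fraction of the population infected at one time, which depends only on Ro and the starting conditions. The Python sketch below (an illustration, not the R code used for this article) evaluates that expression for a population of 1 million with 5 initial cases; the function name is an assumption. The time to peak is not covered here because it also depends on the recovery rate.

```python
import math


def peak_infected_fraction(r0, initial_infected_fraction):
    """Largest infected fraction reached in a standard SIR epidemic.

    Uses the SIR invariant i + s - ln(s) / r0 = constant and the fact
    that infections peak when s = 1 / r0. Valid when r0 * s0 > 1.
    """
    i0 = initial_infected_fraction
    s0 = 1.0 - i0
    return i0 + s0 - (1.0 + math.log(r0 * s0)) / r0


population = 1_000_000
initial_cases = 5

for r0 in (1.25, 1.5):
    fraction = peak_infected_fraction(r0, initial_cases / population)
    print(f"Ro = {r0}: peak of roughly {fraction * population:,.0f} cases")
```

The values this produces land within about a hundred cases of the 21,500 and 63,000 peaks quoted above.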
Predicting hospital bed capacity for different Ro values
The following table summarizes the results of the modelling for the 3 different Ro values. Peak value is the maximum number of cases at any one time, and Days is the number of days since the simulation started (March 10, 2020).
For this simulation we assumed 2.1 hospital beds available per 1000 people (from stats) and that 10% or 16% (from stats) of the people who test positive are hospitalized. Taking those into consideration, the needed beds are calculated below.
For 16% hospitalization rate
For Ro = 1.25, the SIR model built in R predicts that peak cases will occur after 197 days, on approximately Sept 23, with 3,440 hospital beds needed at a 16% hospitalization rate. Assuming 2.1 beds per 1000 people (2,100 beds for a population of 1 million), this county would experience a shortage of 1,340 beds.
Similarly, for Ro = 1.5, the model predicts that peak cases will occur after 112 days, on approximately June 30th, with 10,080 hospital beds needed. Assuming 2.1 beds per 1000 people, this county would experience a shortage of 7,980 beds.
For 10% hospitalization rate
If we assume a more conservative estimate, with 10% of the people who test positive needing hospitalization, the corresponding hospital bed needs are as follows.
For Ro = 1.25, the SIR model built in R predicts 2,150 beds needed against 2,100 available, so this county would experience essentially no shortage. For Ro = 1.5, it predicts a shortage of 4,200 beds.
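The bed arithmetic in this section can be reproduced in a few lines. The Python sketch below is an illustration rather than the R code used for this article: it takes the peak cases and days to peak reported above as given inputs, and it follows the article's simplification that every hospitalized case at the peak needs a bed at the same time and that all 2.1 beds per 1000 people are available. The names `SCENARIOS` and `bed_summary` are hypothetical.

```python
from datetime import date, timedelta

POPULATION = 1_000_000
BEDS_PER_1000 = 2.1
START_DATE = date(2020, 3, 10)

# Ro -> (peak cases, days to peak), taken from the results reported above
SCENARIOS = {1.25: (21_500, 197), 1.5: (63_000, 112)}


def bed_summary(peak_cases, hospitalization_rate):
    """Return (beds needed, beds available, shortage) at the peak."""
    available = round(BEDS_PER_1000 * POPULATION / 1000)
    needed = round(peak_cases * hospitalization_rate)
    shortage = max(needed - available, 0)
    return needed, available, shortage


for r0, (peak_cases, days_to_peak) in SCENARIOS.items():
    peak_date = START_DATE + timedelta(days=days_to_peak)
    for rate in (0.16, 0.10):
        needed, available, shortage = bed_summary(peak_cases, rate)
        print(f"Ro = {r0}, {rate:.0%} hospitalized, peak on {peak_date}: "
              f"{needed:,} beds needed, {available:,} available, "
              f"shortage of {shortage:,}")
```

Changing `BEDS_PER_1000` or the hospitalization rates shows how sensitive the shortage is to those two inputs.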
Using a SIR model, we predicted the extent of the spread for a county with a population of 1 million that had 5 confirmed cases as of March 10th, for various reproduction number (Ro) values. Key conclusions include: