As Covid-19 has impacted almost all countries, data scientists and researchers have developed different predictive models of the spread of the disease, to help governments come up with suitable plans and policies to curb it. In this context, it becomes essential for data scientists and analysts to know the most popular and useful models that have come out of this research, and to get familiar with the different datasets available on the internet. In this blog, I'll discuss the essential parameters of Covid-19 disease spread used in machine learning models, the different types of models, and the tools and datasets available.

Some of the project initiatives supported by Google are:
- Monitoring and forecasting disease spread
- Improving health equity and minimizing secondary effects of the pandemic
- Supporting healthcare workers
- Slowing transmission by advancing the science of contact tracing and environmental sensing
- Devising effective vaccination plans

Researchers, data scientists, and engineers have all come forward to apply existing AI/ML algorithms, or to develop new ones through research, experimentation, and trials. The figures below show the percentage contribution of different algorithms to various Covid-19 predictions (Source).

Age-based Mortality Models

This type of ML model is trained using age-stratified generalized linear models (GLMs) with component-wise gradient boosting. It helps to predict the probability of death from information that was available for patients before they contracted the virus. Stratifying the overall model by age group reduces the variability in age within each model and helps to identify the risk factors for different ages.

In the overall model, 18 features were identified in at least 20% of the models (2 of 10) as being associated with increased mortality risk. Data scientists/researchers used odds ratios (ORs) with interquartile ranges (IQR) to compare the relative importance of the variables for predicting mortality.
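To make the age-stratified GLM with component-wise gradient boosting more concrete, here is a minimal sketch in Python with NumPy. It is not the researchers' implementation, and all function names are hypothetical. It assumes a logistic loss, centred features, a fixed step size `nu`, and that both outcomes occur in every age stratum. At each iteration only the single feature that best fits the current negative gradient is updated. Odds ratios are then `exp(coef)`, and they are summarized across models as a median and an IQR.

```python
import numpy as np


def componentwise_boost_logistic(X, y, n_iter=500, nu=0.1):
    """Component-wise gradient boosting for a logistic GLM (illustrative sketch).

    Each iteration updates ONE coefficient: the feature whose least-squares fit
    to the current negative gradient reduces the residual sum of squares most.
    Assumes y contains both 0s and 1s.
    """
    means = X.mean(axis=0)
    Xc = X - means                          # centred features: base learners need no intercept
    ybar = y.mean()
    offset = np.log(ybar / (1.0 - ybar))    # start from the log-odds of the base rate
    f = np.full(len(y), offset)             # current linear predictor (log-odds)
    coef = np.zeros(X.shape[1])
    ss = (Xc ** 2).sum(axis=0)              # per-feature sum of squares
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-f))
        u = y - prob                        # negative gradient of the logistic loss
        betas = Xc.T @ u / ss               # one-feature least-squares fits
        j = int(np.argmax(betas ** 2 * ss)) # feature with the largest RSS reduction
        coef[j] += nu * betas[j]            # shrunken update of that feature only
        f += nu * betas[j] * Xc[:, j]
    intercept = offset - means @ coef       # undo the centring
    return intercept, coef


def fit_age_stratified(X, y, age, bins):
    """Fit one boosted GLM per age stratum; return odds ratios exp(coef) per stratum."""
    strata = np.digitize(age, bins)
    return {int(s): np.exp(componentwise_boost_logistic(X[strata == s], y[strata == s])[1])
            for s in np.unique(strata)}


def summarize_odds_ratios(or_by_model):
    """Median OR and interquartile range for each feature across fitted models."""
    ors = np.vstack(list(or_by_model.values()))
    q1, med, q3 = np.percentile(ors, [25, 50, 75], axis=0)
    return med, q3 - q1
```

Because unselected features keep a coefficient of exactly zero, counting how often a feature receives a non-zero coefficient across models is one way to arrive at statements like "identified in at least 20% of the models".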
Of these 18 features, age had the most prominent association with mortality (median OR 2.82, IQR 0.03; Source).

Google Data Studio

Google has played a commendable role by building a dashboard that provides ML model-based forecasts of the development of COVID-19 in each US state and county. The aim is to help responders in healthcare, the public sector, and other impacted organizations be better prepared for uncertainties. The data can be accessed directly from BigQuery or as a downloadable CSV (state forecasts, county forecasts). The COVID Tracking Project supplies the forecasts with historical values for hospital, ICU, and ventilator usage. The Johns Hopkins Coronavirus Resource Center supplies historical data for confirmed cases and deaths, while data for vaccine distribution are taken from Govex (Source).

The figures below illustrate different predictions made using BigQuery with data from the sources stated above.
[Figures: COVID-19 predictions made using BigQuery] (Source)

For detailed predictions per month for a given US state, explore the data in the BigQuery console. An example query, here for state FIPS code 48 (Texas), is:

    SELECT *
    FROM `bigquery-public-data.covid19_public_forecasts.state_28d`
    WHERE state_fips_code = "48"
      AND prediction_date >= forecast_date
    ORDER BY prediction_date

Covid19 datasets and models

- Google Aggregated Mobility Research Dataset: aggregates weekly flows of users from region to region, where each region has a resolution of 5 km²
- COVID-19 Mobility: contains data from aggregated data sources, both US and non-US
- Covid-19 dataset on GitHub
- COVID-19 public dataset program: making data freely accessible for better public outcomes
- COVID-19 in India
- National Institute of Health: 54 datasets
- JHU CSSE COVID-19 Dataset: contains the USA's daily state-wise reports with confirmed cases, time-series summaries, etc.
- SEIR Simulator: https://github.com/youyanggu/covid19_projections

Regression-based Time-Series Modeling with Covid19

This is a forecasting approach for COVID-19 case prediction that relies on Graph Neural Networks and mobility data. It uses a single large-scale spatio-temporal graph, with the following assumptions:
- Spatial domain: edges represent direct location-to-location movement and are weighted by mobility flows, relative to the amount of flow internal to the location.
- Temporal domain: edges represent a binary connection to past days.

Each node contains features for the state, county, day, past cases, and past deaths.

The most important advantage of spatio-temporal graphs for COVID-19 prediction is that they make no assumptions about the underlying disease dynamics. They can learn from a variety of data, including inter-region interactions and region-level features.

[Figure: COVID-19 spatio-temporal graph across three days] (Source: COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks)

The graph above shows spatial and temporal edges (highlighted in red) across three days.
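The graph construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code, and the function names are hypothetical:

- A single mobility matrix `flows` is reused for every day, whereas the real model uses time-varying mobility.
- Each spatial edge weight is the flow from i to j divided by i's internal flow `flows[i, i]`.
- Temporal edges link a location to itself on each of the `d` previous days.

The aggregation function shows one weighted-sum layer in which each node's own embedding is concatenated with its aggregated neighbours, in the spirit of a skip connection.

```python
import numpy as np


def build_spatiotemporal_edges(flows, num_days, d):
    """Edge lists (src, dst, weight) for a spatio-temporal graph (sketch).

    flows[i, j]: mobility flow from location i to j; flows[i, i] is the flow
    internal to location i (assumed > 0). Node (location i, day t) has id t * n + i.
    """
    n = flows.shape[0]
    spatial, temporal = [], []
    for t in range(num_days):
        for i in range(n):
            for j in range(n):
                if i != j and flows[i, j] > 0:
                    # Spatial edge weighted by flow relative to internal flow (assumption).
                    spatial.append((t * n + i, t * n + j, flows[i, j] / flows[i, i]))
            for lag in range(1, d + 1):
                if t - lag >= 0:
                    # Binary temporal edge from the same location `lag` days earlier.
                    temporal.append(((t - lag) * n + i, t * n + i, 1.0))
    return spatial, temporal


def aggregate_with_skip(edges, h):
    """One weighted-sum aggregation layer, concatenated with each node's own embedding."""
    agg = np.zeros_like(h)
    for src, dst, w in edges:
        agg[dst] += w * h[src]
    return np.concatenate([h, agg], axis=1)
```

Stacking two such layers and feeding the result to an MLP would give a rough analogue of a 2-hop model. The learned weight matrices and nonlinearities of a real GNN are omitted here for brevity.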
In the three-day graph, each slice represents spatial connections between counties, while the connections between slices represent temporal relationships. Every node in the graph has direct temporal edges to nodes in the d previous days.

[Figure: 2-hop skip-connection model] (Source: COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks)

The above figure represents the 2-hop Skip-Connection model. Multiple layers of spatial aggregations are applied to temporal embedding vectors. At each layer, the embedding of the seed node (shown in blue) is concatenated and propagated up to the next embedding layer. The final embedding is passed through an MLP and used to predict P.

Fairness on Covid-19 Datasets

Google is committed to Responsible AI principles, and it has come forward to study the disproportionate impact the disease has had in the United States. As a pioneer of fairness in AI, Google's AI team applied its principle "Avoid creating or reinforcing unfair bias" to study the actual impact of the disease.

CDC research has shown that communities of color in the United States have been the hardest hit by COVID-19, with disproportionately high rates of cases and deaths. The causes are related to structural racism, systemic inequities in access to healthcare, inherent systemic bias, and underlying social determinants of health that negatively affect these communities.

The figure below illustrates that:
- The absolute error in predicting Covid-19 deaths is significantly higher for counties with a higher proportion of younger and middle-aged people. The younger and middle-aged groups in these counties account for a higher proportion of COVID-19 case counts.
- After the absolute errors are normalized by actual death counts, there is less difference between the confidence intervals across the demographic groups.

(Source)

For the analysis of median income, county populations were bucketed (segregated into bins) according to their income.
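The income bucketing and error normalization described above can be sketched as follows. This is a minimal illustration, not Google's analysis code. The function name, the use of quantile bins, and the guard that treats zero deaths as one are all assumptions, and the confidence intervals shown in the report are not computed here.

```python
import numpy as np


def bucketed_errors(income, predicted, actual, n_buckets=4):
    """Mean absolute and death-normalized errors per income bucket (sketch).

    Counties are split into quantile bins of median income. The normalized error
    divides each county's absolute error by its actual death count (min 1).
    """
    inner_edges = np.quantile(income, np.linspace(0, 1, n_buckets + 1)[1:-1])
    bucket = np.digitize(income, inner_edges)       # 0 .. n_buckets - 1
    abs_err = np.abs(predicted - actual)
    norm_err = abs_err / np.maximum(actual, 1)      # avoid dividing by zero deaths
    rows = []
    for b in range(n_buckets):
        mask = bucket == b
        rows.append((b, abs_err[mask].mean(), norm_err[mask].mean()))
    return rows
```

If a model's error simply scales with the death count, the absolute errors grow with the bucket while the normalized errors stay flat. This is the pattern the fairness analysis describes.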
The results of this income analysis are shown in the bottom figure, which shows higher absolute errors for higher-income counties. Similarly, for race and ethnicity, the figure shows a direct correlation between the absolute errors and death counts. This correlation is meaningfully reduced when the error is normalized by the death count, which causes the confidence intervals to overlap (Source).

Interpretable Sequence Learning with Covid19 datasets

This ML framework models how different compartments (composed of different direct and indirect factors that affect the prediction coefficients) evolve. It uses interpretable encoders to incorporate covariates and improve model performance. The performance of the model has been further analyzed for different subgroups, based on the subgroup distributions within the counties (Source).

The model extends the standard SEIR (susceptible, exposed, infectious, removed) model with additional compartments for undocumented cases and hospital resource usage. The end-to-end modeling framework can infer meaningful estimates for undocumented cases even without direct supervision for them. The model also accounts for disease dynamics that vary over time: for example, as mobility decreases, the spread decays.
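The time-varying SEIR idea above can be illustrated with a heavily simplified simulation in Python. This is a sketch under assumptions, not the published model, and the function names are hypothetical:

- Infections split into documented and undocumented compartments with a fixed documentation fraction `q`.
- The hospital, ICU, and ventilator compartments are omitted.
- Parameter values are illustrative.
- Daily Euler steps are used.

The transmission rate β is produced by a logistic "encoder" of the covariates, so each weight's sign shows whether a covariate such as mobility pushes β up or down.

```python
import numpy as np


def beta_from_covariates(x_t, w, b, beta_max):
    """Interpretable encoder (sketch): beta_t = beta_max * sigmoid(w . x_t + b)."""
    return beta_max / (1.0 + np.exp(-(np.dot(w, x_t) + b)))


def simulate_seir_doc_undoc(covariates, N, w, b, beta_max=0.5,
                            sigma=1 / 5, gamma=1 / 7, q=0.3, i0=10.0):
    """Daily Euler steps of a simplified SEIR model with documented/undocumented infections.

    covariates: array of shape (T, k), one covariate vector per day.
    Returns an array of shape (T + 1, 5) with columns S, E, I_doc, I_undoc, R.
    """
    S, E, Id, Iu, R = N - i0, 0.0, 0.0, i0, 0.0
    out = [(S, E, Id, Iu, R)]
    for x_t in covariates:
        beta = beta_from_covariates(x_t, w, b, beta_max)
        new_inf = beta * S * (Id + Iu) / N  # both infected groups transmit (simplification)
        onset = sigma * E                   # exposed -> infectious
        rem_d, rem_u = gamma * Id, gamma * Iu
        S -= new_inf
        E += new_inf - onset
        Id += q * onset - rem_d             # fraction q of new infections is documented
        Iu += (1 - q) * onset - rem_u
        R += rem_d + rem_u
        out.append((S, E, Id, Iu, R))
    return np.array(out)
```

Running the model once with a high mobility covariate and once with a low one shows the described behaviour: lower mobility gives a smaller β and fewer cumulative infections. In the actual framework the encoder weights are learned from data rather than fixed by hand.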
Further, the framework generalizes better when learning from limited training data, using:
- Masked supervision from partial observations
- Partial teacher-forcing to minimize error propagation
- Regularization
- Cross-location information sharing

The most important assumptions introduced for the model are:
- Compartments for undocumented infected and recovered cases
- Hospitalized, ICU, and ventilator compartments
- Partial immunity
- No deaths from undocumented infected cases
- An invariant population

Explainable AI for Covid19

The most important characteristic of Interpretable Sequence-Learning is that it models the compartments explicitly, which provides an understanding of how the disease evolves. The figure below demonstrates how the fitted curves can be used to infer important insights, such as where the peak occurs or the current decay trends (Source).

The ratio of undocumented to documented infected cases is computed at different phases, and the amount of increase or decrease in each compartment is analyzed. For intervention covariates, the largest weights (those with significant changes in disease spread) are observed after a lag of a few days, which suggests that the interventions become effective after some lag. The positive weights of the mobility index and the negative weights of public interventions are also clearly visible (Source).

The above figure shows the learned weights of the time-varying covariates for β (average contacts of documented/undocumented infected). These weights come from the 7-day state-level forecasting models for the three weeks from 24 May 2020 to 7 June 2020. The weight magnitude of the interventions gets larger after a lag of a few days.
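Masked supervision from partial observations, listed above among the framework's generalization techniques, means the loss only compares predictions with quantities that were actually observed. A minimal sketch follows. It assumes a squared-error loss and NaN as the marker for unobserved entries, such as undocumented cases that are never reported. The function name is hypothetical, and the framework's actual loss may be weighted differently.

```python
import numpy as np


def masked_mse(pred, obs):
    """Mean squared error over observed entries only (masked supervision sketch).

    obs holds NaN wherever a compartment/day was not observed, so those entries
    contribute nothing to the loss (and hence nothing to its gradient).
    """
    mask = ~np.isnan(obs)
    if not mask.any():
        return 0.0
    diff = pred[mask] - obs[mask]
    return float(np.mean(diff ** 2))
```

The unobserved compartments are still constrained indirectly, because the compartmental equations tie them to the observed ones.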
In the learned-weights figure, the mobility index consistently has a highly positive impact on β, while gathering bans, school closures, and shelter-in-place interventions have highly negative effects.

Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand

The exponential rise in COVID-19 cases and deaths has forced governments in different countries to introduce interventions early in the epidemic. However, this carries the risk that transmission returns once the interventions are lifted, if insufficient herd immunity has developed.

This made it important for researchers to model three things: the impact of different measures, the period over which the interventions need to be maintained, and their effect on the number of critical care beds occupied per 100,000 of the population. The figure below illustrates the impact of different measures on the number of critical care beds occupied.

[Figure: Mitigation strategy scenarios for GB showing critical care (ICU) bed requirements. The black line shows the unmitigated epidemic. The green line shows a mitigation strategy incorporating closure of schools and universities; the orange line shows case isolation; the yellow line shows case isolation and household quarantine; and the blue line shows case isolation, home quarantine, and social distancing of those aged over 70. The blue shading shows the 3-month period in which these interventions are assumed to remain in place.] (Source)

AI/ML Model-based COVID-19 vaccine prioritization

With Covid-19 vaccinations ongoing and high demand for the limited vaccine supply, researchers have built ML models to prioritize vaccine distribution. The objectives of this approach are to (i) directly vaccinate those at the highest risk of severe outcomes (risk of death, persons over 60 years of age, and those with comorbidities), and (ii) protect them indirectly by vaccinating those who do the most transmitting.
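The two objectives above translate into different orderings of age groups for a limited vaccine supply. The sketch below shows only the allocation step, using a greedy fill in priority order. It is not the researchers' model: the group labels, sizes, and the one-dose-per-person assumption are hypothetical, and comparing strategies would additionally require an epidemic model to estimate deaths averted.

```python
def allocate_doses(group_sizes, priority, doses):
    """Greedy allocation of a limited vaccine supply in a fixed priority order (sketch).

    group_sizes: dict mapping an age-group label to its unvaccinated population
                 (assumes one dose per person).
    priority:    list of group labels, highest priority first.
    Returns a dict with the doses given to each group.
    """
    allocation = {g: 0 for g in group_sizes}
    remaining = doses
    for g in priority:
        given = min(group_sizes[g], remaining)  # fill this group before moving on
        allocation[g] = given
        remaining -= given
        if remaining == 0:
            break
    return allocation
```

For example, an "older adults first" order sends doses to the 60+ group until it is covered, while a "transmitters first" order starts with working-age adults. Each allocation would then be fed into a transmission model to compare outcomes.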
The prioritization method involves building a mathematical SEIR model (susceptible, exposed, infectious, recovered) to compare five age-stratified prioritization strategies. The results for the prioritization strategies remain consistent across countries, transmission rates, vaccine rollout speeds, and estimates of naturally acquired immunity. In addition, this ML-based framework allows the impacts of prioritization strategies to be compared across contexts (Source).

Figure A shows age-dependent vaccine efficacy, which decreases from a 90% baseline efficacy, beginning at age 60, to 50% efficacy among individuals aged 80+ years. Figures B and C show the percent reduction in deaths, compared with an unmitigated outbreak, for transmission-blocking all-or-nothing vaccines. These vaccines have either a constant 90% efficacy for all age groups (solid lines) or age-dependent efficacy.

Conclusion

Different research organizations and universities have come up with innovations to predict Covid-19 disease spread and the measures that can effectively prevent it. As hospitals worldwide face finite resources, new ML models are also being developed to help allocate therapies and equipment to those most at risk, maximizing survival. These models will also help clinicians predict which currently uninfected individuals might derive the greatest benefit from vaccination.

To increase the accuracy of forecasts, the COVID-19 Forecast Hub has been developed. It aggregates forecasts from over 30 models and sends them to the CDC each week to help inform public health decision making. The hub works in collaboration with the US CDC, which takes in the forecast data and builds a single ensemble forecast by combining the forecasts of the trajectory of the COVID-19 pandemic that different modeling teams submit to the repository.
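The ensemble step described above can be illustrated with one simple combination rule: for each quantile level, take the median of the values predicted by the individual models. This is a sketch, not the Forecast Hub's code. The quantile levels and model names are hypothetical, and the hub's actual ensemble method and model-inclusion criteria may differ.

```python
import numpy as np

QUANTILE_LEVELS = [0.025, 0.25, 0.5, 0.75, 0.975]  # hypothetical subset of levels


def quantile_median_ensemble(model_forecasts):
    """Quantile-wise median ensemble (sketch).

    model_forecasts: dict mapping a model name to its predicted values at
    QUANTILE_LEVELS for one target (e.g. next week's deaths in one state).
    """
    values = np.array(list(model_forecasts.values()), dtype=float)  # (n_models, n_levels)
    if values.shape[1] != len(QUANTILE_LEVELS):
        raise ValueError("every model must report all quantile levels")
    return dict(zip(QUANTILE_LEVELS, np.median(values, axis=0)))
```

A median is robust to a single model producing an outlying forecast, which is one reason to prefer it over a plain mean when many independent teams contribute.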
References

- https://datastudio.google.com/u/0/reporting/52f6e744-66c6-47aa-83db-f74201a7c4df/page/4A0sB?s=ou-b6M0HXag
- https://console.cloud.google.com/bigquery?sq=1056986132837:f200f3f52daf4f51bd60ed0306c14db5&project=tidal-triumph-248307
- COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks – https://arxiv.org/pdf/2007.03113.pdf
- https://github.com/nytimes/covid-19-data
- https://storage.googleapis.com/covid-external/COVID-19ForecastFairnessAnalysis.pdf
- https://storage.googleapis.com/covid-external/COVID-19ForecastWhitePaper.pdf
- https://github.com/reichlab/covid19-forecast-hub
- https://services.google.com/fh/files/misc/the_impact_of_covid_on_manufacturers_2020_report_google_cloud.pdf
- https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
- Contribute at: kaggle.com/covid19
- Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand – https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf
- Mobility network models of COVID-19 explain inequities and inform reopening – https://www.nature.com/articles/s41586-020-2923-3
- A review on COVID-19 forecasting models – https://link.springer.com/article/10.1007/s00521-020-05626-8
- New machine learning model predicts who may benefit most from COVID-19 vaccination
- Prioritize vaccine delivery – https://assets.ey.com/content/dam/ey-sites/ey-com/en_in/topics/techathon/ai-ml.pdf
- https://github.com/hollobit/COVID-19-AI
- A multi-stage SEIR model to predict the potential of a new COVID-19 wave in KSA after lifting all travel restrictions – https://www.sciencedirect.com/science/article/pii/S1110016821001460
- https://www.nature.com/articles/s41746-021-00425-4
- Predicting Coronavirus Pandemic in Real-Time Using Machine Learning and Big Data Streaming System – https://www.hindawi.com/journals/complexity/2020/6688912/
- Concept Drift and the Impact of COVID-19 on Data Science – https://www.iguazio.com/blog/concept-drift-and-the-impact-of-covid-19-on-data-science/