Predicting COVID-19 Cases in Alberta with Machine Learning
COVID-19 Projections
COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has become one of the most thoroughly documented pandemics of recent years. Countries such as Canada and the U.S. have recorded daily cases, hospitalizations, fatalities, and many other indicators that affect, and are affected by, the spread of the virus. Governments must monitor these indicators closely and project future case counts so that they can put effective policies in place to manage the spread. Making these projections, however, is no easy task. Given the large amount of data available, machine learning algorithms are a viable way to predict future cases. This article uses machine learning techniques to predict daily COVID-19 cases in Alberta.
Machine Learning Models
Machine learning is a broad term for building models that learn trends in data so that they can later generate predictions. One family of models well suited to time-series data, such as daily COVID-19 cases, is the Recurrent Neural Network (RNN). An RNN processes a sequence one step at a time and carries information forward from earlier steps, which lets it learn temporal patterns and predict future values. Well-known RNN variants include the traditional RNN, the Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM). The GRU is a newer variant that is growing in popularity because it trains faster than the LSTM and gives comparable results on smaller datasets. For this article, we use the LSTM model to predict daily COVID-19 cases in Alberta.
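To make the idea of "carrying information forward" concrete, here is a minimal sketch of a single LSTM time step, written in plain Python for a cell with one hidden unit and a scalar input. The gate equations are the standard LSTM formulation; the weight values, the `PARAMS` dictionary, and the function names are illustrative assumptions, not the model used in this article.

```python
import math


def sigmoid(x):
    """Logistic function, used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with a scalar input.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    g = math.tanh(pre("candidate"))   # the new information itself
    o = sigmoid(pre("output"))        # how much of the cell state to expose

    c = f * c_prev + i * g            # updated long-term memory
    h = o * math.tanh(c)              # updated hidden state (the cell's output)
    return h, c


# Illustrative weights only (assumption); a real model learns these in training.
PARAMS = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}


def run_sequence(values, params=PARAMS):
    """Feed a sequence through the cell and return the final (h, c) state."""
    h, c = 0.0, 0.0
    for x in values:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Made-up, pre-scaled daily case values, for illustration only.
final_h, final_c = run_sequence([0.10, 0.15, 0.22, 0.30])
```

In practice an LSTM layer has many such units and is provided by a deep learning library; the sketch only shows how the cell state `c` and hidden state `h` pass from one day to the next.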
The Dataset
The dataset was collected by the Blavatnik School of Government at the University of Oxford. Along with daily COVID-19 case counts, it provides indicators for other factors that may influence daily cases. These indicators fall into three categories: containment and closure policies, economic policies, and health system policies. A full description of all parameters can be found at this link. For convenience, Appendix A summarizes each parameter and the weight assigned to its level of restriction. At the time of writing, the dataset included data up to May 1, 2021.
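As a rough illustration of how such a record could be fed to a model, the sketch below flattens one day's case count and policy levels into a numeric feature vector. The field names, the subset of policies, and the choice to treat a missing level as 0 are all assumptions made for this example; the real dataset may label and handle these differently.

```python
# Hypothetical column names (assumption); the real dataset may differ.
POLICY_FEATURES = [
    "school_closure",
    "workplace_closure",
    "stay_at_home",
    "facial_coverings",
]


def to_feature_vector(record, policy_features=POLICY_FEATURES):
    """Flatten one daily record into [daily_cases, level_1, level_2, ...].

    Missing policy levels default to 0 (an assumption for this sketch).
    """
    vector = [float(record["daily_cases"])]
    for name in policy_features:
        vector.append(float(record.get(name, 0)))
    return vector


# Made-up example record, not taken from the dataset.
example_day = {
    "date": "2021-04-12",
    "daily_cases": 1200,
    "school_closure": 2,
    "workplace_closure": 1,
    "facial_coverings": 3,
}

features = to_feature_vector(example_day)
```

Keeping the feature order fixed in one list means every day produces a vector with the same layout, which is what a sequence model expects.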
The Long Short-Term Memory (LSTM) Model
An LSTM model was built and trained on this dataset. Data from the start of COVID-19 on March 1, 2020 through April 11, 2021 was used for training, and data from April 12, 2021 to May 1, 2021 was used for validation. After many epochs of training, the model was ready to generate predictions, which are shown in Figure 1 and Figure 2. Figure 1 shows the cumulative total of COVID-19 cases; the predicted curve is generated by the LSTM model with a 7-day projection. Figure 2 shows the 7-day moving average of daily COVID-19 cases, also predicted 7 days in advance. Because the model was trained on all data except the final 20 days of the dataset, its predictions are expected to be quite accurate over the training period. More notably, the predictions for the final 20 days, which were held out for validation, are also quite accurate. As the figures show, the model predicts that daily COVID-19 cases will hold steady and then begin to decline toward the end of the week ending May 8, 2021.
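The date-based split and the 7-day projection can be sketched as follows. The split dates come from this article; the helper names, the synthetic placeholder series, and the 14-day lookback are assumptions chosen for illustration, since the article does not state the input window the model used.

```python
from datetime import date, timedelta

TRAIN_END = date(2021, 4, 11)   # last training day, per the article
VALID_END = date(2021, 5, 1)    # last day in the dataset


def chronological_split(series, train_end=TRAIN_END, valid_end=VALID_END):
    """Split (date, value) pairs by date so validation always follows training."""
    train = [(d, v) for d, v in series if d <= train_end]
    valid = [(d, v) for d, v in series if train_end < d <= valid_end]
    return train, valid


def make_windows(values, lookback, horizon):
    """Pair each run of `lookback` values with the value `horizon` steps later."""
    samples = []
    for start in range(len(values) - lookback - horizon + 1):
        window = values[start:start + lookback]
        target = values[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples


# Synthetic placeholder series covering March 1, 2020 to May 1, 2021.
first_day = date(2020, 3, 1)
n_days = (VALID_END - first_day).days + 1
series = [(first_day + timedelta(days=k), float(k)) for k in range(n_days)]

train, valid = chronological_split(series)

# lookback=14 is an assumed value; horizon=7 matches the 7-day projection.
samples = make_windows([v for _, v in train], lookback=14, horizon=7)
```

Splitting by date rather than at random keeps future observations out of the training set, which matters for any time-series forecast.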
Figure 1: COVID-19 cumulative cases in Alberta with true (up to May 1, 2021) and predicted values (up to May 8, 2021).
Figure 2: 7-day moving average of daily COVID-19 cases in Alberta with true (up to May 1, 2021) and predicted values (up to May 8, 2021).
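For readers who want to reproduce the two quantities plotted in the figures, the sketch below derives a cumulative total and a 7-day moving average from a list of daily counts. It assumes a trailing window (each value averages that day and the six before it); the article does not say whether a trailing or centered window was used, and the sample counts are made up.

```python
from itertools import accumulate


def cumulative_total(daily_cases):
    """Running sum of daily cases, as plotted in a cumulative curve."""
    return list(accumulate(daily_cases))


def moving_average(daily_cases, window=7):
    """Trailing moving average; the first window - 1 days have no value."""
    averages = []
    running = 0.0
    for k, value in enumerate(daily_cases):
        running += value
        if k >= window:
            # Drop the day that just fell out of the window.
            running -= daily_cases[k - window]
        if k >= window - 1:
            averages.append(running / window)
    return averages


# Made-up daily counts, for illustration only.
daily = [10, 20, 30, 40, 50, 60, 70, 80]
totals = cumulative_total(daily)
smoothed = moving_average(daily)
```

The moving average smooths out day-of-week reporting effects, which is why it is often easier to read than the raw daily series.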
Recurrent Neural Networks are a great tool for forecasting from time-series data, with applications such as predictive maintenance of oil and gas equipment and oil production forecasting. Process Ecology has accumulated expertise in software development for the oil and gas industry. Combined with our knowledge of machine learning modelling, this lets us bring a competitive advantage to our clients. Reach out today!
Appendix A: Static features of the COVID-19 dataset.
Category | Subcategory | Levels |
Containment and Closure Policies | School Closure | 0 - no measures |
Containment and Closure Policies | Workplace Closure | 0 - no measures |
Containment and Closure Policies | Public Events Closure | 0 - no measures |
Containment and Closure Policies | Gathering Restrictions | 0 - no restrictions |
Containment and Closure Policies | Stay at Home Requirements | 0 - no measures |
Containment and Closure Policies | National Travel Restrictions | 0 - no measures |
Containment and Closure Policies | International Travel Restrictions | 0 - no restrictions |
Economic Policies | Household Income Support | 1 - the government is replacing less than 50% of lost salary (or if a flat sum, it is less than 50% median salary) |
Economic Policies | Debt Relief | 0 - no debt/contract relief |
Health System Policies | Public Information Campaigns | 0 - no Covid-19 public information campaign |
Health System Policies | Testing Policy | 0 - no testing policy |
Health System Policies | Contact Tracing | 0 - no contact tracing; 1 - limited contact tracing (not done for all cases); 2 - comprehensive contact tracing (done for all identified cases) |
Health System Policies | Facial Coverings | 0 - No policy |
Health System Policies | Vaccination Policy | 0 - No availability |
Health System Policies | Protection of Elderly | 0 - no measures |