May 07, 2021

Predicting COVID-19 Cases in Alberta with Machine Learning

COVID-19 Projections

COVID-19, the disease caused by the novel coronavirus, has become one of the most thoroughly documented pandemics in recent history. Countries such as Canada and the U.S. have been recording daily cases, hospitalizations, and fatalities, along with many other factors that affect, and are affected by, the spread of the virus through the population. Governments must closely monitor these key indicators and project future cases so that effective policies can be put in place to manage the spread. Making these projections, however, is no easy task. Given the large amount of data available, machine learning algorithms are a viable option for predicting future cases. This article uses machine learning techniques to predict daily COVID-19 cases in Alberta.


Machine Learning Models

Machine learning is a broad term for building models that learn trends in data so that they can later generate predictions. One family of machine learning models suited to time-series data, such as daily COVID-19 cases, is the Recurrent Neural Network (RNN). RNNs carry information forward from earlier time steps, which lets them learn from sequential data and predict future values. There are several types of RNN; among the best known are the traditional RNN, the Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM). The GRU is a newer type of RNN that is becoming increasingly popular because it trains faster than the LSTM and gives comparable results on smaller datasets. For this article, we use the LSTM model to predict daily COVID-19 cases in Alberta.
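To make the idea of carrying information across time steps concrete, here is a minimal sketch of a single LSTM step in plain Python. It uses the standard LSTM gate equations with a scalar input and a scalar hidden state to stay readable. The function names and the weight layout are illustrative assumptions and are not taken from the model used in this article, which would normally be built with a deep learning library.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM time step for a scalar input and a scalar hidden state.

    `weights` (hypothetical layout) maps each gate name to a tuple of
    (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = weights[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated cell state (long-term memory)
    h = o * math.tanh(c)               # updated hidden state (the output)
    return h, c


def run_lstm(sequence, weights):
    """Feed a sequence through the cell one step at a time; return the last output."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
    return h


# Untrained, illustrative weights: every gate uses the same values.
example_weights = {
    name: (0.5, 0.5, 0.0)
    for name in ("forget", "input", "candidate", "output")
}
last_output = run_lstm([0.1, 0.4, 0.9], example_weights)
```

The cell state c is what lets the network remember earlier days in the series: the forget gate decides how much of it survives each step. Training consists of adjusting the weights so that the final output matches the value to be predicted.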


The Dataset

The dataset was collected by the University of Oxford Blavatnik School of Government. Along with daily COVID-19 case counts, it provides metadata on other key parameters that may affect daily cases. The metadata falls into three categories: containment and closure policies, economic policies, and health system policies. A full description of all parameters is available in the dataset's documentation. For convenience, Appendix A summarizes each parameter and the weight assigned to each level of restriction. At the time of writing, the dataset included data up to May 1, 2021.
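The article does not state how the policy levels were fed to the model. One common option, sketched here as an assumption, is to divide each ordinal level by the maximum level of its indicator so that every feature falls between 0 and 1. The dictionary keys are hypothetical names; the maximum levels are taken from Appendix A.

```python
# Maximum level for a few of the indicators summarized in Appendix A.
# The key names are hypothetical, chosen only for this illustration.
MAX_LEVEL = {
    "school_closure": 3,
    "workplace_closure": 3,
    "public_events": 2,
    "gathering_restrictions": 4,
    "stay_at_home": 3,
    "facial_coverings": 4,
    "vaccination_policy": 5,
}


def scale_levels(record):
    """Map each ordinal policy level to the range [0, 1]."""
    scaled = {}
    for name, level in record.items():
        top = MAX_LEVEL[name]
        if not 0 <= level <= top:
            raise ValueError(f"{name}: level {level} is outside 0..{top}")
        scaled[name] = level / top
    return scaled


# Example: one day's policy settings (made-up values).
features = scale_levels({"school_closure": 3, "public_events": 1})
```

Scaling this way keeps an indicator with five levels from dominating one with two, although it also assumes the levels are evenly spaced, which may not hold in practice.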


The Long Short-Term Memory (LSTM) Model

The LSTM model was built and trained on this dataset. Data from the start of COVID-19 on March 1, 2020, up to April 11, 2021, was used for training, and data from April 12, 2021 to May 1, 2021 was used for validation. After many epochs of training, the model was ready to generate predictions, which are shown in Figure 1 and Figure 2. Figure 1 shows the cumulative total of COVID-19 cases; the predicted curve is generated by the LSTM model with a 7-day projection. Figure 2 shows the 7-day moving average of daily COVID-19 cases, also predicted 7 days in advance. Because the model was trained on data up to 20 days before the end of the dataset, its predictions are expected to be quite accurate up to that point. More notably, the predictions for the last 20 days, which the model was not trained on, are still quite accurate. As the figures show, the model predicts that daily COVID-19 cases will remain roughly level and then start to decline towards the end of the upcoming week ending May 8, 2021.
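The chronological split and the 7-day projection described above can be sketched as follows. The split date and the 7-day horizon come from the article; the lookback length and the function names are assumptions made for illustration, since the article does not say how many past days the model sees at once.

```python
from datetime import date

TRAIN_END = date(2021, 4, 11)  # last training day, per the article
HORIZON = 7                    # predict 7 days ahead, per the article


def split_by_date(series):
    """Split a date-sorted list of (date, value) pairs into train and validation."""
    train = [(d, v) for d, v in series if d <= TRAIN_END]
    valid = [(d, v) for d, v in series if d > TRAIN_END]
    return train, valid


def make_windows(values, lookback, horizon=HORIZON):
    """Pair each `lookback`-day window with the value `horizon` days after its last day.

    `lookback` is a hypothetical parameter; the article does not give its value.
    """
    pairs = []
    for start in range(len(values) - lookback - horizon + 1):
        window = values[start:start + lookback]
        target = values[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs
```

Splitting by date rather than at random matters for time-series data: it keeps every validation day strictly after the training period, so the validation error reflects how the model behaves on days it has not seen.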

 

Figure 1: COVID-19 cumulative cases in Alberta with true (up to May 1, 2021) and predicted values (up to May 8, 2021).


Figure 2: 7-day moving average of daily COVID-19 cases in Alberta with true (up to May 1, 2021) and predicted values (up to May 8, 2021).
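The two quantities plotted in Figures 1 and 2 can be derived from a list of daily case counts as sketched below. This is a minimal illustration and not the code behind the figures; in particular, it assumes a trailing moving average (each value averages the current day and the six days before it), which the article does not specify.

```python
from itertools import accumulate


def cumulative_total(daily):
    """Running total of daily cases, as plotted in Figure 1."""
    return list(accumulate(daily))


def moving_average(daily, window=7):
    """Trailing moving average, as plotted in Figure 2.

    The first `window - 1` days are skipped because they do not yet
    have a full window of data behind them.
    """
    averages = []
    running = 0.0
    for i, value in enumerate(daily):
        running += value
        if i >= window:
            running -= daily[i - window]  # drop the day that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages
```

Averaging over 7 days smooths out weekly reporting patterns, such as lower counts on weekends, which makes the underlying trend easier to see.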


Recurrent Neural Networks are a great tool for predicting time-series data, with applications such as predictive maintenance of oil and gas equipment and oil production forecasting. Process Ecology has accumulated expertise in developing software for the oil and gas industry. With our knowledge of machine learning modelling, we can bring a competitive advantage to our clients. Reach out today!

Liked this article? Check out others like it:

Predictive Modeling using Machine Learning in the Upstream Oil & Gas Sector


Appendix A: Static features of the COVID-19 dataset.

Category
  Subcategory
    Levels

Containment and Closure Policies

  School Closure
    0 - no measures
    1 - recommend closing, or all schools open with alterations resulting in significant differences compared to non-COVID-19 operations
    2 - require closing (only some levels or categories, e.g. just high school, or just public schools)
    3 - require closing all levels

  Workplace Closure
    0 - no measures
    1 - recommend closing (or recommend work from home)
    2 - require closing (or work from home) for some sectors or categories of workers
    3 - require closing (or work from home) for all-but-essential workplaces (e.g. grocery stores, doctors)

  Public Events Closure
    0 - no measures
    1 - recommend canceling
    2 - require canceling

  Gathering Restrictions
    0 - no restrictions
    1 - restrictions on very large gatherings (the limit is above 1000 people)
    2 - restrictions on gatherings between 101-1000 people
    3 - restrictions on gatherings between 11-100 people
    4 - restrictions on gatherings of 10 people or less

  Stay at Home Requirements
    0 - no measures
    1 - recommend not leaving the house
    2 - require not leaving the house with exceptions for daily exercise, grocery shopping, and 'essential' trips
    3 - require not leaving the house with minimal exceptions (e.g. allowed to leave once a week, or only one person can leave at a time)

  National Travel Restrictions
    0 - no measures
    1 - recommend not to travel between regions/cities
    2 - internal movement restrictions in place

  International Travel Restrictions
    0 - no restrictions
    1 - screening arrivals
    2 - quarantine arrivals from some or all regions
    3 - ban arrivals from some regions
    4 - ban on all regions or total border closure

Economic Policies

  Household Income Support
    0 - no income support
    1 - the government is replacing less than 50% of lost salary (or if a flat sum, it is less than 50% median salary)
    2 - the government is replacing 50% or more of lost salary (or if a flat sum, it is greater than 50% median salary)

  Debt Relief
    0 - no debt/contract relief
    1 - narrow relief, specific to one kind of contract
    2 - broad debt/contract relief

Health System Policies

  Public Information Campaigns
    0 - no COVID-19 public information campaign
    1 - public officials urging caution about COVID-19
    2 - coordinated public information campaign (e.g. across traditional and social media)

  Testing Policy
    0 - no testing policy
    1 - only those who both (a) have symptoms AND (b) meet specific criteria (e.g. key workers, admitted to hospital, came into contact with a known case, returned from overseas)
    2 - testing of anyone showing COVID-19 symptoms
    3 - open public testing (e.g. "drive-through" testing available to asymptomatic people)

  Contact Tracing
    0 - no contact tracing
    1 - limited contact tracing; not done for all cases
    2 - comprehensive contact tracing; done for all identified cases

  Facial Coverings
    0 - no policy
    1 - recommended
    2 - required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible
    3 - required in all shared/public spaces outside the home with other people present, or all situations when social distancing not possible
    4 - required outside the home at all times regardless of location or presence of other people

  Vaccination Policy
    0 - no availability
    1 - availability for ONE of the following: key workers / clinically vulnerable groups (non-elderly) / elderly groups
    2 - availability for TWO of the following: key workers / clinically vulnerable groups (non-elderly) / elderly groups
    3 - availability for ALL of the following: key workers / clinically vulnerable groups (non-elderly) / elderly groups
    4 - availability for all three plus partial additional availability (select broad groups/ages)
    5 - universal availability

  Protection of Elderly
    0 - no measures
    1 - recommended isolation, hygiene, and visitor restriction measures in long-term care facilities (LTCFs) and/or elderly people to stay at home
    2 - narrow restrictions for isolation and hygiene in LTCFs, some limitations on external visitors, and/or restrictions protecting elderly people at home
    3 - extensive restrictions for isolation and hygiene in LTCFs, all non-essential external visitors prohibited, and/or all elderly people required to stay at home and not leave the home with minimal exceptions, and receive no external visitors

 

By Gabriel Mathias, B.Sc.

Gabriel joined Process Ecology in June 2020 as a Process E.I.T./Air Emissions Analyst. He started his career at Suncor as a Process Safety Engineer involved with the safe start-up of the Fort Hills mining operation. Gabriel has a BSc in Chemical Engineering from the University of Calgary and is pursuing an MSc in Software Engineering. His interest in both chemical and software engineering has been well utilized in the development of a new process simulation tool, NEXIM. He is also involved in air emission quantification and reporting for the oil and gas industry. When not behind a computer screen, Gabe enjoys bike riding in Fish Creek or a fun night of board games with friends.
