Andrej Baranovskij

Summary

The website presents a methodology for forecasting COVID-19 growth using a logistic function and the Prophet library in Python, based on global case data.

Abstract

The website details a data-driven approach to modeling and forecasting the spread of COVID-19 for individual countries. It utilizes a logistic growth model to estimate the trajectory of the virus's spread and employs Facebook's Prophet library to forecast future cases. The data is sourced from the NovelCovid/API REST endpoint and processed using Python, specifically Pandas and SciPy libraries, to identify key parameters such as the day of fastest growth. The forecast aims to predict the stabilization period and the upper range of infections for each country, providing valuable insights for healthcare resource allocation and public motivation. A web UI is available for users to interact with the forecasting model, and the source code is openly accessible on GitHub.

Opinions

  • The author believes that IT professionals have a role in combating the COVID-19 pandemic by providing forecasting tools.
  • The forecasting model is considered useful for estimating when the virus's spread might stabilize and for planning healthcare resources.
  • The author is inspired by the concept of logistic growth applied to coronavirus data and seeks to enhance the model's utility by incorporating forecasting with Prophet.
  • The work is seen as a contribution to the fight against the virus, aiming to provide hope and practical assistance to healthcare workers and the public.
  • The author acknowledges the limitations of the logistic growth model in certain scenarios and plans to address these in future updates.
  • The inclusion of a web UI and the transparency of the source code reflect the author's commitment to accessibility and collaboration within the community.

COVID-19 Growth Modeling and Forecasting with Prophet

Using COVID-19 data for all countries fetched through REST, we model virus growth with a logistic function and run a forecast with the Prophet library in Python

Source: Pixabay

Web UI is available here: https://app.katanaml.io/covid19/

COVID-19 is a hot topic these days. Healthcare workers are the first line of defense. If you are in IT, you are part of the fight against the virus. I thought I should do my part and implement a method to forecast coronavirus growth and the dates when the number of infections could stabilize. Forecasting the growth of new cases could be useful in several ways:

  • People would be able to understand, at least roughly, when all this would end. This helps keep up spirits and motivation
  • Healthcare workers could estimate the need for medical equipment, hospital beds, protective masks, etc.

I was inspired by this post: Modeling Logistic Growth. The author explains how coronavirus growth can be modeled with the logistic formula. I thought, why not go one step further and implement a forecast for each country using the logistic growth model? Using parameters calculated from the logistic growth model, I run the forecast with the Facebook Prophet library.

I fetch daily COVID-19 case statistics from the NovelCovid/API REST endpoint. The data fetch is done in covid19_prepare_data.ipynb. The function build_covid19_data() calls the REST endpoint, iterates through each country, and builds a joint Pandas dataframe. This dataframe is saved to a CSV file for further processing.

import pandas as pd
import requests


def build_covid19_data():
    request_str = 'https://corona.lmao.ninja/v2/historical'
    response = requests.get(request_str)
    # Fail early on an HTTP error instead of iterating over None
    response.raise_for_status()
    json_data = response.json()

    df = None
    for country in json_data:
        res = build_country_data(country)
        df_new = pd.DataFrame(res)
        # Index each country's frame by report date and sort chronologically
        df_new.index = pd.DatetimeIndex(df_new['Report_Date'])
        df_new = df_new.drop(columns='Report_Date').sort_index()
        # Join on the date index so each country contributes its own columns
        df = df_new if df is None else df.merge(df_new, left_index=True, right_index=True)

    df.to_csv('data/covid19_data.csv')
    return df

The function build_country_data(country) parses the REST response and builds an array of records that will be joined into the Pandas dataframe.

def build_country_data(country):
    res = []
    keys = country.get('timeline').get('cases').keys()
    for key in keys:
        target_entry = {}
        target_entry['Report_Date'] = key
        country_name = country.get('country')
        if country.get('province') is not None:
            country_name = country_name + '_' + country.get('province')
        target_entry[country_name + '_cases'] = country.get('timeline').get('cases').get(key)
        target_entry[country_name + '_deaths'] = country.get('timeline').get('deaths').get(key)
        target_entry[country_name + '_recovered'] = country.get('timeline').get('recovered').get(key)
        res.append(target_entry)
    return res
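To make the shape of the parsed data concrete, here is a minimal sketch that runs the same parsing logic on a hand-written payload. The function is restated compactly so the sketch is self-contained. The record `sample_country` and all of its numbers are invented for illustration, and the real API response may carry more fields.

```python
import pandas as pd


def build_country_data(country):
    """Same parsing logic as above, restated so the sketch is self-contained."""
    name = country.get('country')
    if country.get('province') is not None:
        name = name + '_' + country.get('province')
    timeline = country.get('timeline')
    return [
        {
            'Report_Date': day,
            name + '_cases': timeline['cases'][day],
            name + '_deaths': timeline['deaths'][day],
            name + '_recovered': timeline['recovered'][day],
        }
        for day in timeline['cases']
    ]


# Hypothetical payload in the shape the parser expects (values are invented)
sample_country = {
    'country': 'Lithuania',
    'province': None,
    'timeline': {
        'cases': {'3/28/20': 10, '3/29/20': 15},
        'deaths': {'3/28/20': 0, '3/29/20': 1},
        'recovered': {'3/28/20': 0, '3/29/20': 2},
    },
}

records = build_country_data(sample_country)
df_sample = pd.DataFrame(records)
# Same indexing step as in build_covid19_data(): one row per report date
df_sample.index = pd.DatetimeIndex(df_sample['Report_Date'])
df_sample = df_sample.drop(columns='Report_Date').sort_index()
print(df_sample)
```

Each country contributes three columns (cases, deaths, recovered), which is why the joint dataframe grows wider, not longer, as countries are merged on the date index.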

COVID-19 Forecasting

Step 1

We need to identify whether the virus in a particular country is still growing exponentially or whether growth is starting to flatten. Virus spread can be modeled by a logistic function (read more here: Modeling Logistic Growth). If a country is in the second half of the curve, growth is approaching its end; if it is still somewhere in the middle, fast growth is still ahead. The logistic function, expressed in Python to identify growth:

import numpy as np

# Define function with the coefficients to estimate
def func_logistic(t, a, b, c):
    return c / (1 + a * np.exp(-b*t))
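A short sketch of how the three coefficients shape the curve may help here. The parameter values below are arbitrary and not fitted to any country. The sketch checks two properties that follow directly from the formula: the curve reaches half of c at t = ln(a) / b, and it approaches c for large t.

```python
import numpy as np


def func_logistic(t, a, b, c):
    return c / (1 + a * np.exp(-b * t))


# Illustrative parameters only, not fitted to any country
a, b, c = 50.0, 0.2, 1000.0

t_mid = np.log(a) / b                      # time at which the curve reaches c / 2
value_mid = func_logistic(t_mid, a, b, c)  # midpoint value
value_late = func_logistic(t_mid + 60, a, b, c)  # far past the midpoint, close to c

print(f"t_mid={t_mid:.2f} value_mid={value_mid:.1f} value_late={value_late:.1f}")
```

In this parameterization c is the plateau, b controls how steep the rise is, and a shifts the midpoint along the time axis.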

The core idea: we can identify the day of fastest growth through the logistic function. Based on the fastest growth day, we can estimate the top number of infections and use it to calculate a forecast with Prophet. Parameters for the logistic function are calculated for each country's data using the scipy.optimize library.

import scipy.optimize as optim

data = data.reset_index(drop=False)
data.columns = ['Timestep', 'Total Cases']

# Randomly initialize the coefficients
p0 = np.random.exponential(size=3)
# Set min bound 0 on all coefficients, and set different max bounds
# for each coefficient
bounds = (0, [100000., 1000., 1000000000.])
# Convert pd.Series to np.array and use SciPy's curve_fit to find
# the best nonlinear least squares coefficients
x = np.array(data['Timestep']) + 1
y = np.array(data['Total Cases'])
(a, b, c), cov = optim.curve_fit(func_logistic, x, y, bounds=bounds, p0=p0, maxfev=1000000)
                
# The time step at which the growth is fastest
t_fastest = np.log(a) / b
i_fastest = func_logistic(t_fastest, a, b, c)
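The fitting step can be tried without real data. The sketch below generates synthetic cumulative counts from known parameters, fits them with the same bounds, and recovers the fastest growth day. The true parameters, the 1% noise level, and the fixed starting point p0 are assumptions made for reproducibility; the code above uses a random start instead.

```python
import numpy as np
import scipy.optimize as optim


def func_logistic(t, a, b, c):
    return c / (1 + a * np.exp(-b * t))


# Synthetic cumulative counts from known parameters (invented for illustration)
true_a, true_b, true_c = 200.0, 0.25, 5000.0
x = np.arange(1, 61)
rng = np.random.default_rng(0)
y = func_logistic(x, true_a, true_b, true_c) * (1 + rng.normal(0, 0.01, x.size))

# Fixed starting point (an assumption, chosen so the sketch is reproducible)
p0 = [100.0, 0.2, y.max()]
bounds = (0, [100000., 1000., 1000000000.])
(a, b, c), cov = optim.curve_fit(func_logistic, x, y, bounds=bounds, p0=p0, maxfev=1000000)

# The time step at which the growth is fastest, and the value there (c / 2)
t_fastest = np.log(a) / b
i_fastest = func_logistic(t_fastest, a, b, c)
print(f"a={a:.1f} b={b:.3f} c={c:.0f} t_fastest={t_fastest:.1f} i_fastest={i_fastest:.0f}")
```

Because the synthetic series covers the whole S-curve, the fit is well determined. With real data that is still in the early exponential phase, c is much less certain, which is worth keeping in mind when reading the forecasts.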

Prophet requires a carrying capacity value to forecast logistic growth. We calculate this value from the fitted logistic function. There are two cases:

  1. The fastest growth day is still ahead, so growth is increasing. We add ten days to the identified fastest growth day to calculate the estimated top number of infections. Prophet uses this value for the forecast
  2. The fastest growth day is in the past, so growth has stabilized. We add ten days to the current day to calculate the estimated top number of infections. Prophet uses this value for the forecast
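The two cases can be sketched as a single helper. This is only one reading of the rule: it assumes that "calculating the estimated top number of infections" means evaluating the fitted logistic curve ten days after the reference day. The name `estimate_cap` and its parameters are hypothetical, and the actual detect_growth() function may work differently.

```python
import numpy as np


def func_logistic(t, a, b, c):
    return c / (1 + a * np.exp(-b * t))


def estimate_cap(a, b, c, t_today, horizon_days=10):
    """Hypothetical helper: evaluate the fitted curve ten days past a reference day."""
    t_fastest = np.log(a) / b
    # Case 1: fastest growth still ahead -> reference is the fastest growth day
    # Case 2: fastest growth in the past -> reference is the current day
    t_ref = max(t_fastest, t_today)
    return func_logistic(t_ref + horizon_days, a, b, c)


# Illustrative fitted parameters (invented); fastest growth falls near day 21
a, b, c = 200.0, 0.25, 5000.0

cap_growing = estimate_cap(a, b, c, t_today=15)  # case 1: day 15 is before the peak
cap_stable = estimate_cap(a, b, c, t_today=40)   # case 2: day 40 is after the peak
print(f"cap_growing={cap_growing:.0f} cap_stable={cap_stable:.0f}")
```

Under this reading, the capacity passed to Prophet is always below the fitted plateau c and moves closer to it as a country gets further past its fastest growth day.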

See the complete implementation in the detect_growth() function in covid19_model.ipynb.

Step 2

Forecasting with Prophet is straightforward. We calculate the forecast per country by feeding in the actual data together with the estimated carrying capacity value. The forecast is calculated for the next 20 days.

from prophet import Prophet  # the package was named fbprophet before version 1.0

df.columns = ['ds', 'y', 'cap']
    
m = Prophet(growth="logistic")
m.fit(df)
future = m.make_future_dataframe(periods=20)
future['cap'] = df['cap'].iloc[0]
forecast = m.predict(future)
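For readers who want to see the input shape before running Prophet, here is a pandas-only sketch of the two frames involved. Prophet expects a `ds` date column and a `y` value column, plus a `cap` column in both the history and the future frame when growth is logistic. The dates, counts, and capacity below are invented, and the future frame is built by hand to mirror what make_future_dataframe(periods=20) returns with its defaults (history plus 20 daily dates).

```python
import pandas as pd

# Invented cumulative counts and capacity, for illustration only
dates = pd.date_range('2020-03-20', periods=5, freq='D')
cases = [100, 140, 190, 250, 310]
estimated_cap = 2000

# History frame: one row per day, same capacity on every row
df = pd.DataFrame({'ds': dates, 'y': cases})
df['cap'] = estimated_cap

# Future frame: the historical dates plus 20 more daily dates,
# each carrying the same capacity value
future = pd.DataFrame({'ds': pd.date_range(dates[0], periods=len(df) + 20, freq='D')})
future['cap'] = df['cap'].iloc[0]

print(df.tail(2))
print(future.tail(2))
```

Keeping the capacity constant across all rows matches the code above, where a single estimated value per country is copied into the future frame.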

Let’s see what results we get. I decided to check five countries: Lithuania, Italy, Spain, the USA, and Israel.

Keep in mind that this is a forecast based on today's actual numbers; it can be adjusted as the next day's data arrives. Black dots: actual data. Blue line: forecast. Vertical black line: the last day with actual data. Horizontal black line: the estimated top number of infections.

Lithuania

Upper range: ~2000

Expected stabilization period: by 2nd part of April

Italy

Upper range: ~130000

Expected stabilization period: by 1st part of April

Spain

Upper range: ~260000

Expected stabilization period: by 1st part of May

USA

Upper range: ~850000

Expected stabilization period: by 1st part of May

Israel

Upper range: ~26000

Expected stabilization period: by 1st part of May

Source code is available on GitHub.

Web UI is available here: https://app.katanaml.io/covid19/

Updates planned for the next version:

  1. Handle separately the cases where the logistic growth model can’t be applied (for example, South Korea, Hong Kong, and Singapore)
  2. Add backtesting indicator