I have updated all visualization and prediction results at the top of this notebook

All code implementations are available in the following sections

The data covers 22-Jan to 17-Feb 2020, and predictions are made for the following 7 days

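A summary of Confirmed/Deaths/Recovered cases for every Country/Region, with the corresponding 'severity'

Severity is computed by $Severity = No.Confirmed / Population$, where Population is expressed in millions, so severity reads as confirmed cases per million residents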
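
As a quick check of the formula, the sketch below recomputes two rows of the summary table. It assumes population is given in millions, as in the notebook's population lookup (58.5 for Hubei, 5.6 for Singapore). The helper name `severity` is introduced here for illustration only.

```python
def severity(confirmed: int, population_millions: float) -> float:
    """Confirmed cases per million residents (illustrative helper)."""
    if population_millions <= 0:
        raise ValueError("population must be positive")
    return confirmed / population_millions


# Hubei on 17-Feb: 62031 confirmed cases, population 58.5 million
hubei = severity(62031, 58.5)
# Singapore on 17-Feb: 84 confirmed cases, population 5.6 million
singapore = severity(84, 5.6)

print(round(hubei, 2), round(singapore, 2))
```

Both values should agree with the severity column of the table below, which is a simple way to confirm the units.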

In [69]:
temp.style.background_gradient(cmap='Pastel1_r')
Out[69]:
Confirmed Deaths Recovered severity
Country/Region Province/State
Australia New South Wales 4 0 4 0.530504
Queensland 5 0 0 0.986193
South Australia 2 0 2 1.19261
Victoria 4 0 4 0.625
Belgium NA 1 0 1 0.0877193
Cambodia NA 1 0 1 0.062461
Canada British Columbia 5 0 0 0.986193
London, ON 1 0 1 2.5
Toronto, ON 2 0 0 0.682594
Egypt NA 1 0 0 0.1
Finland NA 1 0 1 0.181818
France NA 12 1 4 0.179104
Germany NA 16 0 12 0.19326
Hong Kong Hong Kong 63 2 5 8.51351
India NA 3 0 3 0.00224048
Iran NA 2 2 0 0.2
Italy NA 3 0 0 0.0496032
Japan NA 84 1 18 0.662461
Macau Macau 10 0 5 16.129
Mainland China Anhui 986 6 413 15.9032
Beijing 393 4 145 18.2451
Chongqing 560 5 274 18.3727
Fujian 293 0 112 7.59855
Gansu 91 2 65 3.55747
Guangdong 1331 5 606 11.731
Guangxi 244 2 86 5.04341
Guizhou 146 2 70 4.20144
Hainan 168 4 84 18.0645
Hebei 306 4 152 4.09639
Heilongjiang 470 12 120 12.2683
Henan 1262 19 573 13.4255
Hubei 62031 2029 10337 1060.36
Hunan 1008 4 561 14.9621
Inner Mongolia 75 0 10 3.03521
Jiangsu 631 0 318 7.84826
Jiangxi 934 1 362 20.6637
Jilin 90 1 37 3.27749
Liaoning 121 1 55 2.75626
Ningxia 71 0 42 11.2698
Qinghai 18 0 16 3.21429
Shaanxi 242 0 102 6.48272
Shandong 544 3 231 6.04444
Shanghai 333 2 186 13.7376
Shanxi 131 0 68 3.58904
Sichuan 514 3 188 6.33785
Tianjin 130 3 54 8.66667
Tibet 1 0 1 0.3125
Xinjiang 76 1 20 3.45455
Yunnan 172 1 60 3.73913
Zhejiang 1174 0 604 20.453
Malaysia NA 22 0 15 0.695762
Nepal NA 1 0 1 0.0341297
Philippines NA 3 1 1 0.0285987
Russia NA 2 0 2 0.0138408
Singapore NA 84 0 34 15
South Korea NA 31 0 12 0.602293
Spain NA 2 0 2 0.0428633
Sri Lanka NA 1 0 1 0.0466418
Sweden NA 1 0 0 0.0988142
Taiwan Taiwan 23 1 2 0.967199
Thailand NA 35 0 15 0.506952
UK NA 9 0 8 0.135461
US Boston, MA 1 0 0 1.45985
Chicago, IL 2 0 2 0.740741
Los Angeles, CA 1 0 0 0.25
Madison, WI 1 0 0 3.92157
Orange, CA 1 0 0 7.14286
San Antonio, TX 1 0 0 0.1
San Benito, CA 2 0 0 33.3333
San Diego County, CA 2 0 0 0.2
Santa Clara, CA 2 0 0 15.748
Seattle, WA 1 0 1 1.38889
Tempe, AZ 1 0 0 5.40541
United Arab Emirates NA 9 0 4 0.957447
Vietnam NA 16 0 7 0.167469
In [53]:
# This is the graphical visualization of severity
# Hover your mouse over a circle to display details

m
Out[53]:

This plot shows the growth in Chinese cities from 22-Jan-2020 to 17-Feb-2020

In [54]:
fig = px.scatter_geo(china_map, lat='Lat', lon='Long', scope='asia',
                     color="size", size='size', hover_name='Province/State', 
                     hover_data=['Confirmed', 'Deaths', 'Severity'],
                     animation_frame="Date", 
                     title='Spread in China over time')
fig.update(layout_coloraxis_showscale=False)
fig.show()
In [55]:
fig = px.treemap(china_latest.sort_values(by='Confirmed', ascending=False).reset_index(drop=True), 
           path=["Province/State"], values="Confirmed", title='Number of Confirmed Cases in Chinese Provinces')
fig.show()
fig = px.treemap(row_latest, path=["Country/Region"], 
                 values="Confirmed", title='Number of Confirmed Cases outside china')
fig.show()

The methodology for confirming cases in Hubei changed during this period, and the situation in Wuhan is quite different from the rest of the Chinese cities

We therefore consider the Chinese cities excluding Hubei, which is a better reference for Singapore

In [62]:
fig.show()

We can see that Singapore's growth more closely resembles that of the Chinese cities excluding Hubei. In addition, it lags behind by about 8 days

This makes sense because Singapore's epidemic started later than China's. Despite the government's measures to control the epidemic, Singapore's curve has yet to start to plateau (compared with China's). We should hope to see Singapore's curve reach a plateau in the next 8 days.
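
One way to put a number on such a lag is to shift one curve against the other and keep the shift with the smallest mismatch. The sketch below does this on made-up cumulative counts, not on the notebook's data. The function `estimate_lag` and both toy series are hypothetical, and it assumes the two series are already on a comparable scale (for example, cases per million).

```python
def estimate_lag(reference, target, max_lag):
    """Return the shift (in days) at which `target` best matches `reference`.

    A lag of d compares target[d + i] with reference[i], i.e. the target
    is assumed to follow the reference d days later.
    """
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(target) - lag)
        if overlap < 2:  # too few points left to compare
            break
        # Mean squared difference over the overlapping days
        err = sum(
            (target[lag + i] - reference[i]) ** 2 for i in range(overlap)
        ) / overlap
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag


# Toy data: the target repeats the reference three days later
reference = [1, 2, 4, 8, 16, 32, 64, 128]
target = [0, 0, 0, 1, 2, 4, 8, 16]
print(estimate_lag(reference, target, max_lag=5))
```

On this toy input the function recovers the three-day shift that was built into the data. With real counts the minimum is rarely this clear-cut, so the estimate is best read alongside the plot.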

In [64]:
fig.show()

Prediction by Prophet

The blue band indicates the uncertainty interval for future confirmed cases (Prophet's default interval width of 80% is used). In the best-case scenario, Chinese cities (excluding Hubei) have a good chance of seeing the number of confirmed cases start to decline next week. For Singapore, the growing trend is likely to last another 8-10 days before it starts to decline.

In [70]:
plot_China_without_Hubei()
MAPE : 0.6625833464120447
In [68]:
plot_Singapore()
MAPE : 1.2321322095806737
In [1]:
import numpy as np
import pandas as pd
In [2]:
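# Note: from version 1.0 onward the package is published as `prophet`
# (use `from prophet import Prophet` there); this notebook ran on `fbprophet`
from fbprophet import Prophet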
from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric, add_changepoints_to_plot, plot_plotly
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
%matplotlib inline
plt.style.use('ggplot')

Read in the data from Kaggle

In [3]:
df = pd.read_csv('2019_nCoV_data.csv')
# conf_df = pd.read_csv('time_series_2019_ncov_confirmed.csv')
# deaths_df = pd.read_csv('time_series_2019_ncov_deaths.csv')
# recv_df = pd.read_csv('time_series_2019_ncov_recovered.csv')

conf_df = pd.read_csv('time_series_2019-ncov-Confirmed.csv')
deaths_df = pd.read_csv('time_series_2019-ncov-Deaths.csv')
recv_df = pd.read_csv('time_series_2019-ncov-Recovered.csv')
In [4]:
df.head(10)
Out[4]:
Sno Date Province/State Country Last Update Confirmed Deaths Recovered
0 1 01/22/2020 12:00:00 Anhui China 01/22/2020 12:00:00 1.0 0.0 0.0
1 2 01/22/2020 12:00:00 Beijing China 01/22/2020 12:00:00 14.0 0.0 0.0
2 3 01/22/2020 12:00:00 Chongqing China 01/22/2020 12:00:00 6.0 0.0 0.0
3 4 01/22/2020 12:00:00 Fujian China 01/22/2020 12:00:00 1.0 0.0 0.0
4 5 01/22/2020 12:00:00 Gansu China 01/22/2020 12:00:00 0.0 0.0 0.0
5 6 01/22/2020 12:00:00 Guangdong China 01/22/2020 12:00:00 26.0 0.0 0.0
6 7 01/22/2020 12:00:00 Guangxi China 01/22/2020 12:00:00 2.0 0.0 0.0
7 8 01/22/2020 12:00:00 Guizhou China 01/22/2020 12:00:00 1.0 0.0 0.0
8 9 01/22/2020 12:00:00 Hainan China 01/22/2020 12:00:00 4.0 0.0 0.0
9 10 01/22/2020 12:00:00 Hebei China 01/22/2020 12:00:00 1.0 0.0 0.0

Data wrangling

In [5]:
dates = list(conf_df.columns[4:])
dates1 = list(recv_df.columns[4:])

We want to aggregate the confirmed/deaths/recovered data into one data frame
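
The reshaping step can be illustrated on a toy example. The sketch below melts two small wide tables into long format and joins them on their keys. The values, the date labels, and the helper `to_long` are made up for illustration. Joining with `merge` is an alternative to the column-wise `pd.concat` used in the next cell, which relies on the three files listing their rows in the same order.

```python
import pandas as pd

# Toy wide-format tables: one row per location, one column per date
conf_wide = pd.DataFrame({
    "Province/State": ["Hubei", "Anhui"],
    "1/22/20": [444, 1],
    "1/23/20": [444, 9],
})
deaths_wide = pd.DataFrame({
    "Province/State": ["Anhui", "Hubei"],  # deliberately in a different order
    "1/22/20": [0, 17],
    "1/23/20": [0, 17],
})


def to_long(wide, value_name):
    """Reshape a wide table to one row per (location, date)."""
    return wide.melt(id_vars=["Province/State"], var_name="Date",
                     value_name=value_name)


# Merging on the keys keeps rows aligned even if the files are ordered differently
long_table = to_long(conf_wide, "Confirmed").merge(
    to_long(deaths_wide, "Deaths"), on=["Province/State", "Date"], how="left")
print(long_table)
```

If the source files are guaranteed to share row order, the `pd.concat` approach gives the same result with less work; the keyed merge only adds a safety margin.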

In [6]:
conf_df_long = conf_df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Confirmed')

deaths_df_long = deaths_df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Deaths')

recv_df_long = recv_df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Recovered')

full_table = pd.concat([conf_df_long, deaths_df_long['Deaths'], recv_df_long['Recovered']], 
                       axis=1, sort=False)
In [7]:
full_table['Date'] = pd.to_datetime(full_table['Date'])
full_table['Recovered'] = full_table['Recovered'].astype('int')
full_table.dtypes
Out[7]:
Province/State            object
Country/Region            object
Lat                      float64
Long                     float64
Date              datetime64[ns]
Confirmed                  int64
Deaths                     int64
Recovered                  int64
dtype: object
In [8]:
full_table.head()
Out[8]:
Province/State Country/Region Lat Long Date Confirmed Deaths Recovered
0 Anhui Mainland China 31.82571 117.2264 2020-01-22 1 0 0
1 Beijing Mainland China 40.18238 116.4142 2020-01-22 14 0 0
2 Chongqing Mainland China 30.05718 107.8740 2020-01-22 6 0 0
3 Fujian Mainland China 26.07783 117.9895 2020-01-22 1 0 0
4 Gansu Mainland China 36.06110 103.8343 2020-01-22 0 0 0
In [9]:
full_table['Country/Region'].unique()
Out[9]:
array(['Mainland China', 'Thailand', 'Japan', 'South Korea', 'Taiwan',
       'US', 'Macau', 'Hong Kong', 'Singapore', 'Vietnam', 'France',
       'Nepal', 'Malaysia', 'Canada', 'Australia', 'Cambodia',
       'Sri Lanka', 'Germany', 'Finland', 'United Arab Emirates',
       'Philippines', 'India', 'Italy', 'UK', 'Russia', 'Sweden', 'Spain',
       'Belgium', 'Others', 'Egypt', 'Iran'], dtype=object)

Add a column for the population (in millions) of each province or city

In [10]:
full_table['Province/State'].unique()
Out[10]:
array(['Anhui', 'Beijing', 'Chongqing', 'Fujian', 'Gansu', 'Guangdong',
       'Guangxi', 'Guizhou', 'Hainan', 'Hebei', 'Heilongjiang', 'Henan',
       'Hubei', 'Hunan', 'Inner Mongolia', 'Jiangsu', 'Jiangxi', 'Jilin',
       'Liaoning', 'Ningxia', 'Qinghai', 'Shaanxi', 'Shandong',
       'Shanghai', 'Shanxi', 'Sichuan', 'Tianjin', 'Tibet', 'Xinjiang',
       'Yunnan', 'Zhejiang', nan, 'Taiwan', 'Seattle, WA', 'Chicago, IL',
       'Tempe, AZ', 'Macau', 'Hong Kong', 'Toronto, ON',
       'British Columbia', 'Orange, CA', 'Los Angeles, CA',
       'New South Wales', 'Victoria', 'Queensland', 'London, ON',
       'Santa Clara, CA', 'South Australia', 'Boston, MA',
       'San Benito, CA', 'Madison, WI', 'Diamond Princess cruise ship',
       'San Diego County, CA', 'San Antonio, TX'], dtype=object)
In [11]:
def label_race(row):
    """Return the population, in millions, for the row's province/city or country."""
    if row['Province/State'] == 'Anhui' :
        return 62.0
    if row['Province/State'] == 'Beijing' :
        return 21.54
    if row['Province/State'] == 'Chongqing':
        return 30.48
    if row['Province/State'] == 'Fujian':
        return 38.56
    if row['Province/State']  == 'Gansu':
        return 25.58
    if row['Province/State']  == 'Guangdong':
        return 113.46 
    if row['Province/State']  == 'Guangxi':
        return 48.38 
    if row['Province/State']  == 'Guizhou':
        return 34.75 
    if row['Province/State']  == 'Hainan':
        return 9.3
    if row['Province/State']  == 'Hebei':
        return 74.7
    if row['Province/State']  == 'Heilongjiang':
        return 38.31
    if row['Province/State']  == 'Henan':
        return 94 
    if row['Province/State']  == 'Hubei':
        return 58.5
    if row['Province/State']  == 'Hunan':
        return 67.37
    if row['Province/State']  == 'Inner Mongolia':
        return 24.71
    if row['Province/State']  == 'Jiangsu':
        return 80.4
    if row['Province/State']  == 'Jiangxi':
        return 45.2
    if row['Province/State']  == 'Jilin':
        return 27.46
    if row['Province/State']  == 'Liaoning':
        return 43.9 
    if row['Province/State']  == 'Ningxia':
        return 6.3
    if row['Province/State']  == 'Qinghai':
        return 5.6 
    if row['Province/State']  == 'Shaanxi':
        return 37.33
    if row['Province/State']  == 'Shandong':
        return 90.0
    if row['Province/State']  == 'Shanghai':
        return 24.24
    if row['Province/State']  == 'Shanxi':
        return 36.5
    if row['Province/State']  == 'Sichuan':
        return 81.1
    if row['Province/State']  == 'Tianjin':
        return 15.0 
    if row['Province/State']  == 'Tibet':
        return 3.2
    if row['Province/State']  == 'Xinjiang':
        return 22.0
    if row['Province/State']  == 'Yunnan':
        return 46.0
    if row['Province/State']  == 'Zhejiang':
        return 57.4
    if row['Province/State']  == 'Taiwan':
        return 23.78
    if row['Province/State']  == 'Seattle, WA':
        return 0.72
    if row['Province/State']  == 'Chicago, IL':
        return 2.7
    if row['Province/State']  == 'Tempe, AZ':
        return 0.185
    if row['Province/State']  == 'Macau':
        return 0.62
    if row['Province/State']  == 'Hong Kong':
        return 7.4
    if row['Province/State']  == 'Toronto, ON':
        return 2.93
    if row['Province/State']  == 'British Columbia':
        return 5.07
    if row['Province/State']  == 'Orange, CA':
        return 0.14
    if row['Province/State']  == 'Los Angeles, CA':
        return 4.0
    if row['Province/State']  == 'New South Wales':
        return 7.54
    if row['Province/State']  == 'Victoria':
        return 6.4
    if row['Province/State']  == 'Queensland':
        return 5.07
    if row['Province/State']  == 'London, ON':
        return 0.4
    if row['Province/State']  == 'Santa Clara, CA':
        return 0.127
    if row['Province/State']  == 'South Australia':
        return 1.677
    if row['Province/State']  == 'Boston, MA':
        return 0.685
    if row['Province/State']  == 'San Benito, CA':
        return 0.06
    if row['Province/State']  == 'Madison, WI':
        return 0.255
    if row['Province/State']  == 'Diamond Princess cruise ship':
        return 4/1000
    
    # Below are countries without going to specific cities
    if row['Country/Region']  == 'Thailand':
        return 69.04
    if row['Country/Region']  == 'Japan':
        return 126.8
    if row['Country/Region']  == 'South Korea':
        return 51.47
    if row['Country/Region']  == 'Singapore':
        return 5.6
    if row['Country/Region']  == 'Vietnam':
        return 95.54
    if row['Country/Region']  == 'France':
        return 67.0
    if row['Country/Region']  == 'Nepal':
        return 29.3
    if row['Country/Region']  == 'Malaysia':
        return 31.62
    if row['Country/Region']  == 'Cambodia':
        return 16.01
    if row['Country/Region']  == 'Sri Lanka':
        return 21.44
    if row['Country/Region']  == 'Germany':
        return 82.79
    if row['Country/Region']  == 'Finland':
        return 5.5
    if row['Country/Region']  == 'United Arab Emirates':
        return 9.4
    if row['Country/Region']  == 'Philippines':
        return 104.9
    if row['Country/Region']  == 'India':
        return 1339
    if row['Country/Region']  == 'Italy':
        return 60.48
    if row['Country/Region']  == 'UK':
        return 66.44
    if row['Country/Region']  == 'Russia':
        return 144.5
    if row['Country/Region']  == 'Sweden':
        return 10.12
    if row['Country/Region']  == 'Spain':
        return 46.66
    if row['Country/Region']  == 'Belgium':
        return 11.4
        
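    # Fallback for rows matched by no rule above (in this dataset: Egypt, Iran,
    # 'San Diego County, CA' and 'San Antonio, TX'). 10 is only a placeholder,
    # so the severity computed for those rows is not meaningful.
    return 10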
In [12]:
full_table.loc[full_table['Province/State'].isna()]['Country/Region'].unique()
Out[12]:
array(['Thailand', 'Japan', 'South Korea', 'Singapore', 'Vietnam',
       'France', 'Nepal', 'Malaysia', 'Cambodia', 'Sri Lanka', 'Germany',
       'Finland', 'United Arab Emirates', 'Philippines', 'India', 'Italy',
       'UK', 'Russia', 'Sweden', 'Spain', 'Belgium', 'Egypt', 'Iran'],
      dtype=object)

Create a new column for the population of the country
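
The if-chain in `label_race` above can also be written as a lookup table. The sketch below is one possible shape, using only a handful of the notebook's values. The names `PROVINCE_POPULATION`, `COUNTRY_POPULATION` and `lookup_population` are hypothetical. Unlike `label_race`, it returns NaN for unmatched rows instead of the placeholder 10, which is an assumption made here so that missing entries (Egypt and Iran in this dataset, for example) stay visible.

```python
import math

# Populations in millions; a small subset of the notebook's values
PROVINCE_POPULATION = {"Anhui": 62.0, "Hubei": 58.5, "Seattle, WA": 0.72}
COUNTRY_POPULATION = {"Singapore": 5.6, "Japan": 126.8}


def lookup_population(province, country):
    """Return the population in millions, or NaN when no entry exists."""
    # Province-level entries take priority, as in the original if-chain
    if province in PROVINCE_POPULATION:
        return PROVINCE_POPULATION[province]
    return COUNTRY_POPULATION.get(country, math.nan)


print(lookup_population("Hubei", "Mainland China"))
print(lookup_population(None, "Singapore"))
print(lookup_population(None, "Egypt"))
```

With a data frame this could be applied row by row in the same way as `label_race`, and rows with a NaN population could then be listed and filled in by hand.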

In [13]:
full_table['population'] = full_table.apply(label_race, axis=1)
In [14]:
# filling missing values with 0 in columns ('Confirmed', 'Deaths', 'Recovered')
full_table[['Confirmed', 'Deaths', 'Recovered']] = full_table[['Confirmed', 'Deaths', 'Recovered']].fillna(0)
full_table[['Province/State']] = full_table[['Province/State']].fillna('NA')

# cases in the Diamond Princess cruise ship
ship = full_table[full_table['Province/State']=='Diamond Princess cruise ship']

# full table
full_table = full_table[full_table['Province/State']!='Diamond Princess cruise ship']
full_table.head()
Out[14]:
Province/State Country/Region Lat Long Date Confirmed Deaths Recovered population
0 Anhui Mainland China 31.82571 117.2264 2020-01-22 1 0 0 62.00
1 Beijing Mainland China 40.18238 116.4142 2020-01-22 14 0 0 21.54
2 Chongqing Mainland China 30.05718 107.8740 2020-01-22 6 0 0 30.48
3 Fujian Mainland China 26.07783 117.9895 2020-01-22 1 0 0 38.56
4 Gansu Mainland China 36.06110 103.8343 2020-01-22 0 0 0 25.58

Severity is computed as No.Confirmed / Population (population is in millions, so this is cases per million)

In [15]:
full_table['Severity'] = full_table['Confirmed']/full_table['population']
In [16]:
# derived dataframes
china = full_table[full_table['Country/Region']=='Mainland China']
row = full_table[full_table['Country/Region']!='Mainland China']

full_latest = full_table[full_table['Date'] == max(full_table['Date'])].reset_index()
china_latest = full_latest[full_latest['Country/Region']=='Mainland China']
row_latest = full_latest[full_latest['Country/Region']!='Mainland China']

# Select several columns after groupby with a list (double brackets)
full_latest_grouped = full_latest.groupby('Country/Region')[['Confirmed', 'Deaths', 'Recovered', 'population']].sum().reset_index()
china_latest_grouped = china_latest.groupby('Province/State')[['Confirmed', 'Deaths', 'Recovered', 'population']].sum().reset_index()
row_latest_grouped = row_latest.groupby('Country/Region')[['Confirmed', 'Deaths', 'Recovered', 'population']].sum().reset_index()
In [17]:
temp = full_latest.groupby(['Country/Region', 'Province/State'])[['Confirmed', 'Deaths', 'Recovered', 'population']].max()
temp['severity'] = temp.Confirmed/temp.population
temp.drop(columns='population', inplace = True)
temp.style.background_gradient(cmap='Pastel1_r')
Out[17]:
Confirmed Deaths Recovered severity
Country/Region Province/State
Australia New South Wales 4 0 4 0.530504
Queensland 5 0 0 0.986193
South Australia 2 0 2 1.19261
Victoria 4 0 4 0.625
Belgium NA 1 0 1 0.0877193
Cambodia NA 1 0 1 0.062461
Canada British Columbia 5 0 0 0.986193
London, ON 1 0 1 2.5
Toronto, ON 2 0 0 0.682594
Egypt NA 1 0 0 0.1
Finland NA 1 0 1 0.181818
France NA 12 1 4 0.179104
Germany NA 16 0 12 0.19326
Hong Kong Hong Kong 63 2 5 8.51351
India NA 3 0 3 0.00224048
Iran NA 2 2 0 0.2
Italy NA 3 0 0 0.0496032
Japan NA 84 1 18 0.662461
Macau Macau 10 0 5 16.129
Mainland China Anhui 986 6 413 15.9032
Beijing 393 4 145 18.2451
Chongqing 560 5 274 18.3727
Fujian 293 0 112 7.59855
Gansu 91 2 65 3.55747
Guangdong 1331 5 606 11.731
Guangxi 244 2 86 5.04341
Guizhou 146 2 70 4.20144
Hainan 168 4 84 18.0645
Hebei 306 4 152 4.09639
Heilongjiang 470 12 120 12.2683
Henan 1262 19 573 13.4255
Hubei 62031 2029 10337 1060.36
Hunan 1008 4 561 14.9621
Inner Mongolia 75 0 10 3.03521
Jiangsu 631 0 318 7.84826
Jiangxi 934 1 362 20.6637
Jilin 90 1 37 3.27749
Liaoning 121 1 55 2.75626
Ningxia 71 0 42 11.2698
Qinghai 18 0 16 3.21429
Shaanxi 242 0 102 6.48272
Shandong 544 3 231 6.04444
Shanghai 333 2 186 13.7376
Shanxi 131 0 68 3.58904
Sichuan 514 3 188 6.33785
Tianjin 130 3 54 8.66667
Tibet 1 0 1 0.3125
Xinjiang 76 1 20 3.45455
Yunnan 172 1 60 3.73913
Zhejiang 1174 0 604 20.453
Malaysia NA 22 0 15 0.695762
Nepal NA 1 0 1 0.0341297
Philippines NA 3 1 1 0.0285987
Russia NA 2 0 2 0.0138408
Singapore NA 84 0 34 15
South Korea NA 31 0 12 0.602293
Spain NA 2 0 2 0.0428633
Sri Lanka NA 1 0 1 0.0466418
Sweden NA 1 0 0 0.0988142
Taiwan Taiwan 23 1 2 0.967199
Thailand NA 35 0 15 0.506952
UK NA 9 0 8 0.135461
US Boston, MA 1 0 0 1.45985
Chicago, IL 2 0 2 0.740741
Los Angeles, CA 1 0 0 0.25
Madison, WI 1 0 0 3.92157
Orange, CA 1 0 0 7.14286
San Antonio, TX 1 0 0 0.1
San Benito, CA 2 0 0 33.3333
San Diego County, CA 2 0 0 0.2
Santa Clara, CA 2 0 0 15.748
Seattle, WA 1 0 1 1.38889
Tempe, AZ 1 0 0 5.40541
United Arab Emirates NA 9 0 4 0.957447
Vietnam NA 16 0 7 0.167469

Visualization

Plot for World-wide Severity

In [18]:
import folium

m = folium.Map(location=[0,0], tiles='cartodbpositron',
               min_zoom=1, max_zoom=4, zoom_start=1)

# Draw one circle per location; the radius grows with severity
for _, rec in full_latest.iterrows():
    folium.Circle(
        location=[rec['Lat'], rec['Long']],
        color='crimson',
        # <b> is the HTML bold tag (<bold> is not a valid element)
        tooltip=('<li><b>Country : </b>' + str(rec['Country/Region']) +
                 '<li><b>Province : </b>' + str(rec['Province/State']) +
                 '<li><b>Confirmed : </b>' + str(rec['Confirmed']) +
                 '<li><b>Deaths : </b>' + str(rec['Deaths']) +
                 '<li><b>Severity : </b>' + str(rec['Severity'])),
        radius=int(rec['Severity'] * 200)
    ).add_to(m)
m
Out[18]:

Progression in China

In [19]:
china_map = china.groupby(['Date', 'Province/State'])[['Confirmed', 'Deaths', 'Severity', 'Lat', 'Long']].max()
china_map = china_map.reset_index()
china_map['size'] = china_map['Severity'].pow(0.5)
china_map['Date'] = pd.to_datetime(china_map['Date'])
china_map['Date'] = china_map['Date'].dt.strftime('%m/%d/%Y')

fig = px.scatter_geo(china_map, lat='Lat', lon='Long', scope='asia',
                     color="size", size='size', hover_name='Province/State', 
                     hover_data=['Confirmed', 'Deaths', 'Severity'],
                     animation_frame="Date", 
                     title='Spread in China over time')
fig.update(layout_coloraxis_showscale=False)
fig.show()

Severity within China

In [20]:
fig = px.treemap(china_latest.sort_values(by='Confirmed', ascending=False).reset_index(drop=True), 
           path=["Province/State"], values="Confirmed", title='Number of Confirmed Cases in Chinese Provinces')
fig.show()
fig = px.treemap(row_latest, path=["Country/Region"], 
                 values="Confirmed", title='Number of Confirmed Cases outside china')
fig.show()

China provinces vs Singapore

In [27]:
df.dtypes
Out[27]:
Sno                 int64
Date               object
Province/State     object
Country            object
Last Update        object
Confirmed         float64
Deaths            float64
Recovered         float64
dtype: object
In [34]:
# Parse the timestamps and drop the time of day, keeping only the date
df['Date'] = pd.to_datetime(df['Date']).dt.normalize()
df['Confirmed'] = df['Confirmed'].astype('int')
df['Deaths'] = df['Deaths'].astype('int')
df['Recovered'] = df['Recovered'].astype('int')

# some Chinese data are labelled as China and some are labelled as 'Mainland China'
df['Country'] = df['Country'].replace('Mainland China', 'China')
In [35]:
df['Country'].unique()
Out[35]:
array(['China', 'US', 'Japan', 'Thailand', 'South Korea', 'Hong Kong',
       'Macau', 'Taiwan', 'Singapore', 'Philippines', 'Malaysia',
       'Vietnam', 'Australia', 'Mexico', 'Brazil', 'France', 'Nepal',
       'Canada', 'Cambodia', 'Sri Lanka', 'Ivory Coast', 'Germany',
       'Finland', 'United Arab Emirates', 'India', 'Italy', 'Sweden',
       'Russia', 'Spain', 'UK', 'Belgium', 'Others', 'Egypt'],
      dtype=object)
In [36]:
df.dtypes
Out[36]:
Sno                        int64
Date              datetime64[ns]
Province/State            object
Country                   object
Last Update               object
Confirmed                  int64
Deaths                     int64
Recovered                  int64
dtype: object
In [37]:
len(df)
Out[37]:
1719
In [57]:
df_China = df.loc[df['Country'] == 'China']
df_China_without_Hubei = df_China.loc[df_China['Province/State'] != 'Hubei']
df_Singapore = df.loc[df['Country'] == 'Singapore']
In [58]:
df_China_grouped = df_China.groupby('Date')['Confirmed'].sum().reset_index()
df_China_without_Hubei_grouped = df_China_without_Hubei.groupby('Date')['Confirmed'].sum().reset_index()
df_Singapore_grouped = df_Singapore.groupby('Date')['Confirmed'].sum().reset_index()
In [61]:
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=df_China_grouped.Date,
        y=df_China_grouped.Confirmed,
        name='China Confirmed cases',
        mode='lines+markers',
        marker_color = 'rgb(55, 83, 109)',
        hovertemplate =
        '<br><b>Date</b>: %{x} <br>' +
        '<b>Confirmed Cases:</b> %{y}<br>'
    )
)
fig.add_trace(
    go.Scatter(
        x=df_China_without_Hubei_grouped.Date,
        y=df_China_without_Hubei_grouped.Confirmed,
        name='China Confirmed cases without Hubei',
        mode='lines+markers',
        marker_color = 'rgb(26, 118, 255)',
        hovertemplate =
        '<br><b>Date</b>: %{x} <br>' +
        '<b>Confirmed Cases:</b> %{y}<br>'
    )
)

fig.update_traces(
    mode='lines+markers',
    marker_line_width=2,
    marker_size=5
)
fig.update_layout(
    title={'text': 'Confirmed cases in China and China without Hubei',
           'y':0.95,
           'x':0.5,
           'xanchor': 'center',
           'yanchor': 'top'},
    yaxis_zeroline=False,
    xaxis_zeroline=False,
    hoverlabel_align = 'left',
)

fig.show()
In [63]:
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=df_Singapore_grouped.Date,
        y=df_Singapore_grouped.Confirmed,
        name='Confirmed cases in SG',
        mode='lines+markers',
        marker_color = 'rgb(55, 83, 109)',
        hovertemplate =
        '<br><b>Date</b>: %{x} <br>' +
        '<b>Confirmed Cases:</b> %{y}<br>'
    )
)

fig.update_traces(
    mode='lines+markers',
    marker_line_width=2,
    marker_size=5
)
fig.update_layout(
    title={'text': 'Confirmed cases in Singapore',
           'y':0.95,
           'x':0.5,
           'xanchor': 'center',
           'yanchor': 'top'},
    yaxis_zeroline=False,
    xaxis_zeroline=False,
    hoverlabel_align = 'left',
)

fig.show()

Time series prediction with Prophet

First, let's try Prophet without tuning any hyper-parameters

In [42]:
m_d = Prophet(
    yearly_seasonality=False,
    weekly_seasonality = False,
    daily_seasonality = False,
    seasonality_mode = 'additive')

df_China_without_Hubei_grouped.columns = ['ds','y']

m_d.fit(df_China_without_Hubei_grouped)
future_China_without_Hubei = m_d.make_future_dataframe(periods=7)
fcst_daily_China_without_Hubei= m_d.predict(future_China_without_Hubei)
INFO:fbprophet:n_changepoints greater than number of observations. Using 20.
In [43]:
# to quantify our prediction performance
def mean_absolute_percentage_error(y_true, y_pred): 
    """Calculates MAPE given y_true and y_pred"""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
In [44]:
def plot_China_without_Hubei():
    trace1 = {
      "fill": None, 
      "mode": "markers", 
      "name": "actual no. of Confirmed", 
      "type": "scatter", 
      "x": df_China_without_Hubei_grouped.ds, 
      "y": df_China_without_Hubei_grouped.y
    }
    trace2 = {
      "fill": "tonexty", 
      "line": {"color": "#57b8ff"}, 
      "mode": "lines", 
      "name": "upper_band", 
      "type": "scatter", 
      "x": fcst_daily_China_without_Hubei.ds, 
      "y": fcst_daily_China_without_Hubei.yhat_upper
    }
    trace3 = {
      "fill": "tonexty", 
      "line": {"color": "#57b8ff"}, 
      "mode": "lines", 
      "name": "lower_band", 
      "type": "scatter", 
      "x": fcst_daily_China_without_Hubei.ds, 
      "y": fcst_daily_China_without_Hubei.yhat_lower
    }
    trace4 = {
      "line": {"color": "#eb0e0e"}, 
      "mode": "lines+markers", 
      "name": "prediction", 
      "type": "scatter", 
      "x": fcst_daily_China_without_Hubei.ds, 
      "y": fcst_daily_China_without_Hubei.yhat
    }
    data = [trace1, trace2, trace3, trace4]
    layout = {
      "title": "Confirmed - Time Series Forecast - Daily Trend", 
      "xaxis": {
        "title": "", 
        "ticklen": 5, 
        "gridcolor": "rgb(255, 255, 255)", 
        "gridwidth": 2, 
        "zerolinewidth": 1
      }, 
      "yaxis": {
        "title": "Confirmed nCoV - China without Hubei", 
        "ticklen": 5, 
        "gridcolor": "rgb(255, 255, 255)", 
        "gridwidth": 2, 
        "zerolinewidth": 1
      }, 
    }
    fig = go.Figure(data=data, layout=layout)
    iplot(fig)
    max_date = df_China_without_Hubei_grouped.ds.max()
    y_true = df_China_without_Hubei_grouped.y.values
    y_pred_daily = fcst_daily_China_without_Hubei.loc[fcst_daily_China_without_Hubei['ds'] <= max_date].yhat.values
    print('MAPE : {}'.format(mean_absolute_percentage_error(y_true,y_pred_daily)))
    
    return
plot_China_without_Hubei()
MAPE : 45.150628872015574
In [45]:
m_d_SG = Prophet(
    yearly_seasonality=False,
    weekly_seasonality = False,
    daily_seasonality = False,
    seasonality_mode = 'additive')

df_Singapore_grouped.columns = ['ds','y']

m_d_SG.fit(df_Singapore_grouped)
future_Singapore_grouped = m_d_SG.make_future_dataframe(periods=7)
fcst_daily_Singapore_grouped= m_d_SG.predict(future_Singapore_grouped)
INFO:fbprophet:n_changepoints greater than number of observations. Using 19.
In [46]:
def plot_Singapore():
    trace1 = {
      "fill": None, 
      "mode": "markers", 
      "name": "actual no. of Confirmed", 
      "type": "scatter", 
      "x": df_Singapore_grouped.ds, 
      "y": df_Singapore_grouped.y
    }
    trace2 = {
      "fill": "tonexty", 
      "line": {"color": "#57b8ff"}, 
      "mode": "lines", 
      "name": "upper_band", 
      "type": "scatter", 
      "x": fcst_daily_Singapore_grouped.ds, 
      "y": fcst_daily_Singapore_grouped.yhat_upper
    }
    trace3 = {
      "fill": "tonexty", 
      "line": {"color": "#57b8ff"}, 
      "mode": "lines", 
      "name": "lower_band", 
      "type": "scatter", 
      "x": fcst_daily_Singapore_grouped.ds, 
      "y": fcst_daily_Singapore_grouped.yhat_lower
    }
    trace4 = {
      "line": {"color": "#eb0e0e"}, 
      "mode": "lines+markers", 
      "name": "prediction", 
      "type": "scatter", 
      "x": fcst_daily_Singapore_grouped.ds, 
      "y": fcst_daily_Singapore_grouped.yhat
    }
    data = [trace1, trace2, trace3, trace4]
    layout = {
      "title": "Confirmed - Time Series Forecast - Daily Trend", 
      "xaxis": {
        "title": "", 
        "ticklen": 5, 
        "gridcolor": "rgb(255, 255, 255)", 
        "gridwidth": 2, 
        "zerolinewidth": 1
      }, 
      "yaxis": {
        "title": "Confirmed nCoV - Singapore", 
        "ticklen": 5, 
        "gridcolor": "rgb(255, 255, 255)", 
        "gridwidth": 2, 
        "zerolinewidth": 1
      }, 
    }
    fig = go.Figure(data=data, layout=layout)
    iplot(fig)
    max_date = df_Singapore_grouped.ds.max()
    y_true = df_Singapore_grouped.y.values
    y_pred_daily = fcst_daily_Singapore_grouped.loc[fcst_daily_Singapore_grouped['ds'] <= max_date].yhat.values

    print('MAPE : {}'.format(mean_absolute_percentage_error(y_true,y_pred_daily)))
    return
    
plot_Singapore()
MAPE : 21.37323239253616

Preliminary analysis

These predictions do not look very good. For the Chinese cities, the prediction does not capture the downward trend in the daily change.

Now let's add Prophet's changepoint parameters for more accurate modeling
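
For context, the changepoint settings act on Prophet's piecewise-linear trend. A compact way to write that trend, with base growth rate $k$, offset $m$, and candidate changepoints at times $s_j$, is:

```latex
g(t) = \bigl(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\bigr)\,t
     + \bigl(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\bigr),
\qquad
a_j(t) = \begin{cases} 1 & t \ge s_j \\ 0 & \text{otherwise} \end{cases},
\qquad
\gamma_j = -s_j\,\delta_j,
\qquad
\delta_j \sim \operatorname{Laplace}(0, \tau)
```

Here `n_changepoints` sets how many candidate times $s_j$ are placed, `changepoint_range` sets the fraction of the history in which they are placed, and `changepoint_prior_scale` is the scale $\tau$ of the prior on the rate changes $\delta_j$. Prophet's defaults are 25, 0.8 and 0.05 respectively, so the prior scales of about 20 used in the next cells allow the slope to change far more freely than the default and let the fitted trend follow the history very closely.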

In [65]:
m_d = Prophet(
    changepoint_range=0.85,
    changepoint_prior_scale=20,
    n_changepoints=19,
    yearly_seasonality=False,
    weekly_seasonality = False,
    daily_seasonality = False,
    seasonality_mode = 'additive')

df_China_without_Hubei_grouped.columns = ['ds','y']

m_d.fit(df_China_without_Hubei_grouped)
future_China_without_Hubei = m_d.make_future_dataframe(periods=7)
fcst_daily_China_without_Hubei= m_d.predict(future_China_without_Hubei)

plot_China_without_Hubei()
MAPE : 0.6625833464120447
In [66]:
m_d_SG = Prophet(
    changepoint_range=0.8,
    changepoint_prior_scale=19,
    n_changepoints=20,
    yearly_seasonality=False,
    weekly_seasonality = False,
    daily_seasonality = False,
    seasonality_mode = 'additive')

df_Singapore_grouped.columns = ['ds','y']

m_d_SG.fit(df_Singapore_grouped)
future_Singapore_grouped = m_d_SG.make_future_dataframe(periods=7)
fcst_daily_Singapore_grouped= m_d_SG.predict(future_Singapore_grouped)

plot_Singapore()
INFO:fbprophet:n_changepoints greater than number of observations. Using 19.
MAPE : 1.2321322095806737

A note on parameter tuning for Prophet

Note that we did not tune the seasonality settings here: yearly, weekly and daily seasonality are all disabled in the models above
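
The MAPE values printed by `plot_China_without_Hubei()` and `plot_Singapore()` are computed on the same dates the models were fitted to, so they describe in-sample fit rather than forecast accuracy. A minimal sketch of a holdout check follows, on made-up counts and with a naive stand-in forecast instead of Prophet. `split_holdout` and this zero-skipping `mape` are hypothetical helpers, not the notebook's functions.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, skipping days where the true value is 0."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score")
    return 100 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


def split_holdout(series, n_holdout):
    """Split a series into a training part and its last `n_holdout` points."""
    if not 0 < n_holdout < len(series):
        raise ValueError("n_holdout must be between 1 and len(series) - 1")
    return series[:-n_holdout], series[-n_holdout:]


# Toy cumulative counts (made up for illustration)
counts = [10, 20, 40, 70, 100, 120, 130]
train, test = split_holdout(counts, 2)

# Stand-in forecast: keep adding the last observed daily increase
step = train[-1] - train[-2]
forecast = [train[-1] + step * (i + 1) for i in range(len(test))]

print(forecast, round(mape(test, forecast), 2))
```

With Prophet, the same idea would mean fitting on `train` only and scoring the predictions for the held-out dates, which gives a fairer basis for comparing hyper-parameter settings than the in-sample MAPE.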
