# 37. Estimation of Corona cases with Python and Pandas

What is the real number of infected peoply by the corona virus. What we learn in the news is just the number of positively tested people. Unfortunately, it is technically not possible to test everybody with possible corona virus symptoms. For example we learned that on the 18th of March we had 12327 known cases in Germany. An extremely important question is "How many unknown cases do we have?".

This is what this page is about. I try to estimate this number or better to calculate some possible boundaries for this number.

I started doing this project for myself. I didn't intend to publish it. Everyday the new data on the corona virus cases rains down on me by radio and TV. Day by day and in many cases only comparisons with the previous day. So I started collecting the data on a day to day basis.

But I am sure that I made you curious. You want to know now this "dark" number, the number of total cases, known plus unknown? What we hear about the severity of the disease, i.e. the death rate is also just a calculation on the known cases. So we have the division of the number of people who died due to the virus and the number of the known cases. The death rate given and based on known cases varies between 2.5 % and 5 %. The higher the number of unknown cases is the smaller the real death rate will be. I assumed that the "real" death rate might be 0.7, like a seasonal flu. Be aware: This is just an assumption, not covered by any scientific data, but anyway quite likely. I further assume that the avarage duration of an infection is about 18 days, the total number of infections in Germany on the 18th of March is about 280000 cases compared to 12327 cases. This means about 23 times higher than the known figure!

If you like to follow my calculations based on Python and Pandas, you can continue reading here:

import pandas as pd

xf = pd.ExcelFile("/data1/coronacases.xlsx")
df = xf.parse("Sheet1",
skiprows=2,
index_col=0,
names=["dates", "cases", "active", "deaths"])

df[["cases", "active", "deaths"]].plot()


### OUTPUT:

<matplotlib.axes._subplots.AxesSubplot at 0x7f75331394d0>

The number of cases we here on the media corresponds to the accumulated day by day newly infected people. The number of "active" cases is the number without those who are healed or unfortunately died. The following number shows the ratio of those. As long as nobody had died, the death ratio was 0 %.

discharged = df["cases"] - df["active"]

if "healed" in df.columns:
df.drop("healed", axis=1, inplace=True)
if "healed" not in df.columns:
df.insert(loc=len(df.columns),
column="healed",
value=(discharged -df["deaths"]))

healed_ratio = df["healed"] * 100 / discharged
healed_ratio
death_ratio = df["deaths"] * 100 / discharged
death_ratio

outcome = {"death_ratio": death_ratio, "healed_ratio": healed_ratio}
outcome_df = pd.DataFrame(outcome)
outcome_df
outcome_df.plot()


### OUTPUT:

<matplotlib.axes._subplots.AxesSubplot at 0x7f752e6a4c90>

In the following code, we do some curse fitting. We fit the value to the function

$$f = a \cdot e^{b \cdot x} + c$$

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def growth_func(x, a, b, c):
return a * np.exp(b * x) + c

Y = df.cases.values
X = np.arange(0, len(Y))
popt, pcov = curve_fit(growth_func, X, Y)

def growth(x):
return growth_func(x, *popt)

plt.figure()
plt.plot(X, Y, 'ko', label="Original Data")
plt.plot(X, growth(X), 'r-', label="Fitted Curve")
plt.legend()
plt.show()

#for i in range(1, 30):
#    print(df.cases.values[i], growth(i))


In the following code, I make some crucial assumption. Both by rule of thumb and not properly based on scientific data, because it is not available. The average duration of an infection 'days_infected_before_outcome' might be less or higher. Furthermore I made a deliberate decision to assume the real death rate, based on known and unknown cases, is 0.7. This number corresponds to the seasonal flu.

days_infected_before_outcome = 14
assumed_real_death_rate = 2.9

def create_inverse_growth_func(a, b, c):
def inverse(x):
return np.log((x - c) / a) / b
return inverse

inverse_growth = create_inverse_growth_func(*popt)

# number of cases 'days_infected_before_outcome':
cases_days_infection_before = df["deaths"][-1] * 100 / assumed_real_death_rate
shift_days = inverse_growth(cases_days_infection_before)
shift_days -= (len(df) - days_infected_before_outcome)
print(shift_days)


### OUTPUT:

5.765157139549551

plt.figure()
X = np.arange(0, 65)
plt.plot(X,
growth_func(X + days_infected_before_outcome, *popt),
'r-',
label="Fitted Curve")
plt.legend()
plt.show()


print(len(df["cases"]))
x = np.arange(0, len(df["cases"]))
print(len(x))
if "real" in df.columns:
df.drop("real", axis=1, inplace=True)
df.insert(loc=len(df.columns),
column="real",
value=growth(x+shift_days).astype(np.int))
df


### OUTPUT:

cases active deaths healed real
dates
2020-02-17 16 9 0 7 3659
2020-02-18 16 7 0 9 3659
2020-02-19 16 7 0 9 3659
2020-02-20 16 3 0 13 3659
2020-02-21 16 2 0 14 3659
2020-02-22 16 2 0 14 3659
2020-02-23 16 2 0 14 3659
2020-02-24 16 2 0 14 3659
2020-02-25 18 3 0 15 3659
2020-02-26 26 11 0 15 3659
2020-02-27 18 32 0 -14 3659
2020-02-28 74 58 0 16 3659
2020-02-29 79 63 0 16 3659
2020-03-01 130 114 0 16 3659
2020-03-02 165 149 0 16 3659
2020-03-03 203 187 0 16 3659
2020-03-04 262 246 0 16 3659
2020-03-05 545 528 0 17 3659
2020-03-06 670 652 0 18 3659
2020-03-07 800 782 0 18 3659
2020-03-08 1040 1022 0 18 3659
2020-03-09 1224 1204 2 18 3659
2020-03-10 1565 1545 2 18 3659
2020-03-11 1966 1938 3 25 3659
2020-03-12 2745 2714 6 25 3659
2020-03-13 3675 3621 8 46 3660
2020-03-14 4599 4544 8 47 3662
2020-03-15 5813 5754 13 46 3666
2020-03-16 7272 7188 17 67 3679
2020-03-17 9367 9274 26 67 3713
2020-03-18 12327 12194 28 105 3805
2020-03-19 15320 15161 44 115 4057
2020-03-20 19848 19668 59 121 4741
2020-03-21 22364 22071 84 209 6600
2020-03-22 24873 24513 94 266 11653
2020-03-23 29056 28480 123 453 25388
2020-03-24 32986 29586 157 3243 62724
2020-03-25 37323 33570 206 3547 164213
2020-03-26 43938 37998 267 5673 440091
2020-03-27 50871 43862 351 6658 1190003
2020-03-28 57695 48781 433 8481 3228476
2020-03-29 62095 52359 525 9211 8769619
2020-03-30 66711 52566 645 13500 23832009
2020-03-31 70985 54479 682 15824 64775829
2020-04-01 77981 58350 931 18700 176072669
2020-04-02 84794 61247 1107 22440 478608844
2020-04-03 91159 65309 1275 24575 1300987426
2020-04-04 96108 68262 1446 26400 3536444161
2020-04-05 100123 69839 1584 28700 9613045529
2020-04-06 103375 72865 1819 28691 26130960462
50 50
df[["cases", "real"]].plot()


### OUTPUT:

<matplotlib.axes._subplots.AxesSubplot at 0x7f752d00e890>

Live Python training

Enjoying this page? We offer live Python training courses covering the content of this site.

Upcoming online Courses

17 Oct 2022 to 21 Oct 2022

19 Oct 2022 to 21 Oct 2022

Enrol here