Assam NobBS Onset date calculation #29

Open
wants to merge 4 commits into
base: master
44 changes: 44 additions & 0 deletions NobBS-Experiments/Experiment1.R
@@ -0,0 +1,44 @@
library(jsonlite)
library(magrittr)
library(dplyr)
library(fGarch)
library(NobBS)

# Get raw patient data
raw_patient_df <- data.frame(fromJSON("https://api.covid19india.org/raw_data.json"))

# Get names of all districts with patients
district_wise_data <- fromJSON("https://api.covid19india.org/state_district_wise.json")
districts_list = list()
for(p in district_wise_data) { districts_list[names(p$districtData)] <- FALSE}
districts_list[["Unknown"]] <- NULL
districts_df = data.frame(names(districts_list))
colnames(districts_df) <- c("district")

# cleaning up patient data to remove tourist related data and retaining only the columns needed
patient_clean_df = merge(x=districts_df, y=raw_patient_df, by.x='district', by.y='raw_data.detecteddistrict')
patient_data <- patient_clean_df %>% select(district, raw_data.dateannounced, raw_data.gender)
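# Compare as Date objects: "dd/mm/yyyy" strings would be compared lexically, not chronologically
patient_data_reliable <- patient_data %>% filter(as.Date(raw_data.dateannounced, format="%d/%m/%Y") >= as.Date("2020-03-01"))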

#### Calculation of onset date based on announcement date
num_patients <- nrow(patient_data_reliable)
# Using Skewed Gaussian with a mean of 14 days and standard deviation 1 to get the incubation time
incubation_times <- floor(rsnorm(num_patients, mean=14, sd=1, xi=-10))
Owner

Suggested change
- incubation_times <- floor(rsnorm(num_patients, mean=14, sd=1, xi=-10))
+ incubation_times <- floor(rsnorm(num_patients, mean=9, sd=5, xi=-10))

I suggest a mean of 9 days and an sd of 5. Also, what does xi mean here?

Author

My idea here was to use a skewed Gaussian instead of a plain Gaussian. My intuition came from the fact that in most cases incubation takes 14 days (I could be wrong here). `xi` is the skewness parameter, and making it negative has made the distribution left-skewed (meaning we expect more people to have an incubation period of 14 days). Attached is the distribution of the incubation periods I have used. Let me know if this makes sense; otherwise, we can stick to a normal curve with mean 9 and standard deviation 5.
[Image: Incubation time graph]
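
As a rough check on the direction of skew that does not depend on reading the plot, the sample skewness of the draws can be computed directly. The sketch below uses base R only. `sample_skewness` is a hypothetical helper, not part of fGarch or NobBS, and the toy vectors merely stand in for real `rsnorm` output. A negative value indicates a longer left tail (left-skewed), a positive value a longer right tail.

```r
# Hypothetical helper (not from fGarch or NobBS): moment-based sample skewness.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  # Third central moment divided by the cubed (population) standard deviation
  mean(centered^3) / (mean(centered^2)^1.5)
}

# Toy data standing in for simulated incubation times (illustration only)
right_tailed <- c(1, 2, 2, 3, 3, 3, 12)  # one large value: long right tail
left_tailed  <- 15 - right_tailed         # mirror image: long left tail

skew_right <- sample_skewness(right_tailed)
skew_left  <- sample_skewness(left_tailed)
print(c(right = skew_right, left = skew_left))
```

Applied to the `incubation_times` vector from the script, a negative result would support the left-skew reading of the plot.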

Owner

While the skewed Gaussian assumption makes sense, the 14-day figure is debatable.
Here is my recommendation: a mean of 9 days with sd=5, with a skew towards 14 days.

Why this recommendation?

The incubation period (IncubPeriod) has a best-estimate mean of 5 days, with a range of 3.8 to 9 days.

IIRC, NobBS is looking for the disease onset time, i.e. the point where incubation ends and symptoms begin. There is an unknown gap between incubation and symptom onset; we'll assume it to be zero, since incubation time is often measured up to the first symptoms.

This is where things get interesting. The definition of symptoms varies from country to country. In France, loss of smell and taste is considered a symptom, while in S. Korea even a mild fever alone is a symptom.

What India would consider a symptom would typically be a severe-infection symptom in Wuhan/S. Korea. The relevant variable, DurMildInf, has a mean/median best estimate of 6 days, with a range of 5 to 12 days.

Based on the above, I'm simply making a guesstimate of the sd, and I'm okay with any value larger than 3 days. I'll let you make an informed guess for xi. Consider checking out the Korea CDC papers mentioned in the link below.

Source for numbers above: https://bit.ly/COVID19_Params

Author

Makes sense. I changed the mean and standard deviation to 9 and 4, adjusted xi, and removed the negative values that came up in the distribution. The distribution now looks like this:
[Image: Incubation time graph]
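
The comment does not say how the negative values were removed. One possible approach, sketched below purely as an assumption, is to redraw any negative draws until none remain. `draw_non_negative` and `normal_sampler` are hypothetical names, and a plain `rnorm` stands in for the skewed sampler so the sketch needs only base R. Note that redrawing truncates the distribution at zero, which shifts the mean slightly upward.

```r
# Hypothetical helper: redraw any negative values until none remain.
# `sampler` is any function that takes n and returns n numeric draws.
draw_non_negative <- function(n, sampler, max_rounds = 100) {
  draws <- sampler(n)
  for (round in seq_len(max_rounds)) {
    negative <- draws < 0
    if (!any(negative)) break
    draws[negative] <- sampler(sum(negative))
  }
  if (any(draws < 0)) stop("negative values remain after max_rounds redraws")
  draws
}

set.seed(42)
# Stand-in sampler: a plain normal with the mean and sd agreed in this thread.
# Assumption: the PR itself uses fGarch::rsnorm with a tuned xi instead.
normal_sampler <- function(n) rnorm(n, mean = 9, sd = 4)
incubation_days <- floor(draw_non_negative(1000, normal_sampler))
print(summary(incubation_days))
```

A function wrapping the skewed sampler could be passed as `sampler` in the same way. Simply dropping negative rows would also work, but it would change the number of patients, which the redraw avoids.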

# Using a uniform random sample on [1, 4), floored to 1, 2 or 3 days, to get the testing time
testing_time <- floor(runif(num_patients, min=1, max=4))
delay <- incubation_times + testing_time
onset_date <- as.Date(patient_data_reliable$raw_data.dateannounced, format ="%d/%m/%Y") - delay

# Create dataframe for all districts
report_date <- as.Date(patient_data_reliable$raw_data.dateannounced, format="%d/%m/%Y")
district <- patient_data_reliable$district
all_districts_data <- data.frame(district, onset_date, report_date)
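
A small worked example of the onset-date arithmetic for a single hypothetical patient may help when reviewing the lines above. The values are made up for illustration. It also shows why dates in `dd/mm/yyyy` form should be parsed with `as.Date` before being compared.

```r
# Hypothetical single patient; values chosen for illustration only
date_announced  <- "20/03/2020"  # dd/mm/yyyy, as in raw_data.dateannounced
incubation_time <- 9             # days
testing_time    <- 2             # days

report_date <- as.Date(date_announced, format = "%d/%m/%Y")
onset_date  <- report_date - (incubation_time + testing_time)
print(onset_date)

# Pitfall: "dd/mm/yyyy" strings compare character by character.
# 15 February 2020 is earlier than 1 March 2020, yet the string test disagrees.
lexical_result <- "15/02/2020" >= "01/03/2020"
parsed_result  <- as.Date("15/02/2020", format = "%d/%m/%Y") >=
  as.Date("01/03/2020", format = "%d/%m/%Y")
print(c(lexical = lexical_result, parsed = parsed_result))
```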

NirantK marked this conversation as resolved.