
Wrong application of Bayesian Inference #33

Open
rufus210 opened this issue Apr 22, 2020 · 9 comments

Comments

@rufus210

First disclaimer: I'm not a statistician or epidemiologist, I'm an engineer with a good feel for system modeling.

Second disclaimer: you're doing some great work and trying to get real-time estimates for Rt is extremely important.

I think you are making one assumption that is causing the statistics to behave badly: that you can take an approach that models a static R and, with a small tweak, make it model a time-varying Rt. Specifically, the assumption that "we use yesterday's prior $P(R_{t-1})$ to estimate today's prior $P(R_t)$." This assumption only works for a static R, which is why your initial approach converged to a value of R = 1. The initial approach of windowing and the new approach of adding Gaussian noise both try to work around this issue without fixing it.

The underlying equation we're trying to estimate is: $\lambda = k_{t-1}\,e^{\gamma(R_0 - \mathrm{NPI}_t - 1)}$

where $R_t = R_0 - \mathrm{NPI}_t$ and the value of $\mathrm{NPI}_t$ changes every day.
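To make the Gaussian-noise approach being discussed concrete, here is a minimal grid-Bayes sketch of a single update step. It assumes a Poisson observation model with the rate above, a 7-day serial interval, and a noise scale of 0.25; all of these are illustrative choices, not the notebook's actual values.

```python
import numpy as np
from scipy import stats

GAMMA = 1 / 7.0                  # 1/serial interval in days; an assumption
r_grid = np.linspace(0, 8, 401)  # grid of candidate R_t values

def update(posterior_prev, k_prev, k_today, sigma=0.25):
    """One grid-Bayes update: diffuse yesterday's posterior with Gaussian
    noise to form today's prior, then weight by the Poisson likelihood."""
    # transition[i, j] = P(R_t = r_j | R_{t-1} = r_i), a Gaussian step
    transition = stats.norm(r_grid[:, None], sigma).pdf(r_grid[None, :])
    prior = transition.T @ posterior_prev
    prior /= prior.sum()
    # Poisson rate: lambda = k_{t-1} * exp(gamma * (R_t - 1))
    lam = k_prev * np.exp(GAMMA * (r_grid - 1))
    posterior = stats.poisson(lam).pmf(k_today) * prior
    return posterior / posterior.sum()
```

Starting from a flat prior with 100 cases yesterday and 110 today, the posterior mode lands near $R_t = 1 + \ln(1.1)/\gamma \approx 1.67$, which is what the rate equation predicts.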

One traditional way to estimate a time-varying value like this would be a Kalman filter, which you mention in issue #19. However, a Kalman filter can't be directly applied, since it depends on matrices expressing linear dynamics, while our equation of interest has an exponential. I'm thinking maybe a particle filter / sequential Monte Carlo simulation is appropriate here? I'm going to try working something up.
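Here is a minimal bootstrap particle filter (sequential Monte Carlo) sketch of that idea, assuming a Gaussian random walk on $R_t$ and the Poisson observation model above. The particle count, random-walk step size, and serial interval are illustrative assumptions, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 1 / 7.0          # 1/serial interval in days; an assumption
N_PARTICLES = 5000
SIGMA_RW = 0.1           # random-walk step for R_t; an assumption

def particle_filter(cases):
    """Bootstrap particle filter for a time-varying R_t: propagate,
    weight by the Poisson likelihood, resample. A sketch, not a vetted
    implementation."""
    particles = rng.uniform(0.5, 4.0, N_PARTICLES)   # initial R_t particles
    estimates = []
    for k_prev, k_today in zip(cases[:-1], cases[1:]):
        # propagate: R_t follows a Gaussian random walk
        particles = np.clip(particles + rng.normal(0, SIGMA_RW, N_PARTICLES), 0, None)
        # weight by the Poisson likelihood, lambda = k_{t-1} e^{gamma (R - 1)}
        lam = k_prev * np.exp(GAMMA * (particles - 1))
        logw = k_today * np.log(lam) - lam    # log Poisson pmf up to a constant
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))
        # multinomial resampling to avoid weight degeneracy
        particles = rng.choice(particles, N_PARTICLES, p=w)
    return estimates
```

On synthetic data with constant 10% daily growth, the filter settles near the implied $R_t = 1 + \ln(1.1)/\gamma \approx 1.67$.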

I think this, in addition to using 7-day smoothed data, is why the 95% confidence bands are so narrow when there's still a lot of obvious statistical noise.

@kmedved

kmedved commented Apr 22, 2020

Would an unscented Kalman filter perhaps be appropriate given the non-linear nature of the problem, @rufus210? Or is that going to have the same issue?

@dsfulf

dsfulf commented Apr 22, 2020

@rufus210 The reason the posterior converges is simply that there is more data: more observations lead to a narrower posterior. It has nothing to do with any assumptions made; it's a consequence of the central limit theorem (more or less).

As a proof, here's the same basic calculation of $R_t$ applied to a parametric model. Note that $R_t$ reaches unity at the same date that the peak occurs. This indicates to me that the calculation itself is valid.

I do think there is an issue with the posterior being too narrow due to including all previous days in the likelihood function. Is it reasonable to say that days 1-10 really inform days 40-50? Most likely not.
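The "do early days really inform late days" point can be made concrete with a toy grid posterior over a constant R. This is a sketch assuming Poisson counts with the same $\lambda = k_{t-1}e^{\gamma(R-1)}$ rate; the serial-interval constant and grid are assumptions. Including every day in the likelihood always produces a narrower 95% band than a trailing window.

```python
import numpy as np
from scipy import stats

GAMMA = 1 / 7.0                  # 1/serial interval in days; an assumption
r_grid = np.linspace(0, 8, 801)

def posterior_width(cases, window=None):
    """95% credible-interval width of a grid posterior over a constant R.
    If `window` is set, only the last `window` day-pairs enter the
    likelihood; otherwise all days do."""
    pairs = list(zip(cases[:-1], cases[1:]))
    if window:
        pairs = pairs[-window:]
    log_post = np.zeros_like(r_grid)       # flat prior in log space
    for k_prev, k_today in pairs:
        lam = k_prev * np.exp(GAMMA * (r_grid - 1))
        log_post += stats.poisson(lam).logpmf(k_today)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    cdf = np.cumsum(post)
    lo = r_grid[np.searchsorted(cdf, 0.025)]
    hi = r_grid[np.searchsorted(cdf, 0.975)]
    return hi - lo
```

Running this on a smoothly growing series shows the all-days posterior is strictly narrower than a 7-day-window posterior, which is exactly the over-confidence being described.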

@cayleytorgeson

How does the parametric model work for the largest states (NY, CA, MI), or other countries with different testing protocols?

@Nectarineimp

There is no static R. R contains a lot of unobserved variables. One, of course, is the ability of the virus to infect under contact circumstances: some viruses are hard to become infected by, and some are easy. This one is easy. However, R is not based only on that. A virus in a petri dish, in a bio-containment freezer, has an R of 0.00000 unless it can teleport. Human behavior accounts for the rest of the R value. The signal in the data is human behavior. I'm using this to compare the actions of populations that have had Liberty Protests against lockdown. I do like your idea of modeling lambda, however. Can you create a notebook with that? I'd be very curious to see how well it works.

@dsfulf

dsfulf commented Apr 22, 2020

@cayleytorgeson Can you be more specific with your question?

@cayleytorgeson

@dsfulf what happens if you run your parametric model for NY, CA, MI, FL, KY, RI, MA, South Korea, UK, Sweden, Italy, etc.?

Seeing those plots would be very interesting in understanding how robust they are to different testing practices.

@dsfulf

dsfulf commented Apr 22, 2020

@cayleytorgeson I haven't done that; it's not an unsupervised model, because it's parametric. Priors are required for total cases and CFR. I used a range of estimates from various epidemiological models for the priors, which aren't common to find for subsets of the total population. If you'd like to run it for those entities, you can get the notebook here.

@Neon22

Neon22 commented Apr 23, 2020

The data has been run against countries here (non-parametric):
https://github.com/sanzgiri/covid-19/blob/master/Realtime_R0_by_country.ipynb

@montecel-di

I disagree; I think this is the perfect example of Bayesian inference. The requirement isn't that R0(t) converges to 1; it's that the mean of R0(t) converges over the duration of the pandemic. R0 is not a property solely of the virus but of the virus and how people react, and this method lets you observe that.

The question I have is about the 7-day moving average: first, it will be delayed. I wrote an implementation of this method in R, but I used a LOESS smoother instead; reading the comments, a Kalman filter might be better (thanks for the suggestion). Second, my objective was targeted at the county level, which presents its own challenges, since there is less data. The maximum likelihood estimator was used to find the hyperparameters at the state level. There is far too much diversity at the county level to assign a single sigma that would work sufficiently well across all the counties, so I run a non-linear optimization on sigma to optimize the log-likelihood. I was originally optimizing over both the span of the LOESS and sigma, but found they are basically independent, and since there are 3.5K counties, running them in series speeds this up. It still takes 5 hours with parallel processing, though. If anyone sees this, let me know your thoughts.
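The sigma-tuning step described above can be sketched as maximizing the filter's one-step-ahead predictive likelihood over sigma. This sketch is in Python rather than R (to match the rest of the thread), and the grid, serial interval, and optimization bounds are all assumptions, not the commenter's actual settings.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

GAMMA = 1 / 7.0                  # 1/serial interval in days; an assumption
r_grid = np.linspace(0, 8, 201)

def neg_log_likelihood(sigma, cases):
    """Negative one-step-ahead predictive log-likelihood of a grid filter
    whose prior at each step is yesterday's posterior diffused by Gaussian
    noise of scale sigma."""
    posterior = np.full(r_grid.size, 1 / r_grid.size)   # flat start
    transition = stats.norm(r_grid[:, None], sigma).pdf(r_grid[None, :])
    ll = 0.0
    for k_prev, k_today in zip(cases[:-1], cases[1:]):
        prior = transition.T @ posterior
        prior /= prior.sum()
        lam = k_prev * np.exp(GAMMA * (r_grid - 1))
        like = stats.poisson(lam).pmf(k_today)
        evidence = float(like @ prior)    # P(k_today | past), the predictive term
        ll += np.log(evidence + 1e-300)   # guard against underflow
        posterior = like * prior
        posterior /= posterior.sum()
    return -ll

def fit_sigma(cases):
    """Maximum-likelihood sigma via bounded scalar optimization."""
    res = minimize_scalar(neg_log_likelihood, bounds=(0.01, 1.0),
                          args=(cases,), method="bounded")
    return res.x
```

Running the counties in parallel would then just be a map of `fit_sigma` over per-county case series; whether the county-level data is rich enough to identify sigma is exactly the concern raised above.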

7 participants