AI4I-COVID-Python

Artificial Intelligence for Industry project on the Italian COVID-19 dataset.

In this project we explored the potential and limitations of Bayesian melding, a statistical technique that calibrates the input parameters of a deterministic model against stochastic (noisy) observations.

The general idea behind the method is to merge different "opinions" about an observed phenomenon via statistical pooling:

- A prior distribution on the model outputs ("what may be reasonable to happen")
- An induced distribution on the outputs, computed by propagating a prior distribution on the inputs through the deterministic model ("what we expect to observe according to the model")
- A likelihood on the inputs ("what we know has happened")
- A likelihood on the outputs ("what we actually observe").
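In Bayesian melding the pooling is logarithmic: over an output value $\phi$, the pooled distribution is proportional to $q_1^*(\phi)^{\alpha} q_2(\phi)^{1-\alpha}$. This is also where the exponent $1-\alpha$ in the sampling weights described below comes from, since samples drawn from the input prior already produce outputs distributed as $q_1^*$, and multiplying by $(q_2/q_1^*)^{1-\alpha}$ leaves $q_1^{*\alpha} q_2^{1-\alpha}$. The minimal sketch below evaluates the pool on a grid for two assumed Gaussian densities; the helper names and densities are illustrative, not taken from the repository's notebooks.

```python
import numpy as np


def log_pool(q1_star, q2, alpha=0.5, dx=1.0):
    """Logarithmic pooling of two densities evaluated on the same grid.

    Returns q1_star**alpha * q2**(1 - alpha), renormalised so that it
    integrates to 1 over the grid (spacing dx).
    """
    pooled = np.power(q1_star, alpha) * np.power(q2, 1.0 - alpha)
    return pooled / (pooled.sum() * dx)


def gauss(x, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


x = np.linspace(-10, 12, 4401)
dx = x[1] - x[0]
induced = gauss(x, 0.0, 1.0)    # hypothetical induced output distribution q1*
prior_out = gauss(x, 2.0, 1.0)  # hypothetical output prior q2
pooled = log_pool(induced, prior_out, alpha=0.5, dx=dx)
mean = (x * pooled).sum() * dx
print(round(mean, 3))  # 1.0
```

With $\alpha = 0.5$ and two unit-variance Gaussians centred at 0 and 2, the pool is again a unit-variance Gaussian, centred halfway at 1.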

Applying the pooling operation exactly requires inverting the model, which is seldom possible. Pooling is therefore approximated with the SIR algorithm (sampling importance resampling, not to be confused with the susceptible-infected-removed model, also used in this repository):

1. Draw a large number of random samples $\Theta_i$ from the input prior distribution $q_1$.
2. Weight each sample $\Theta_i$ according to $w_i = \left(\frac{q_2(M(\Theta_i))}{q_1^*(M(\Theta_i))}\right)^{1-\alpha} L_1(\Theta_i) L_2(M(\Theta_i))$, where:
   - $M(\Theta_i)$ is the output of the model applied to $\Theta_i$
   - $\alpha$ is the pooling factor (usually 0.5)
   - $q_2(M(\Theta_i))$ is the output prior
   - $q_1^*(M(\Theta_i))$ is the induced output prior, i.e. the distribution of the outputs obtained by propagating the input prior through the model; it can be estimated by applying the model to each sample and then performing a kernel density estimation with a Gaussian kernel
   - $L_1(\Theta_i)$ is the input likelihood
   - $L_2(M(\Theta_i))$ is the output likelihood
3. Resample a much smaller set of samples (with replacement), this time drawing with probabilities proportional to the computed weights instead of from the prior distribution.

The resampled values approximate the melded (posterior) distribution of the inputs, and the usual operations can be performed on them (e.g. take the mean to fit the model to the data and the variance to determine confidence).
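The procedure above can be sketched in Python with NumPy and SciPy. Everything problem-specific is an assumption made for illustration: the two-parameter toy `model`, the uniform input prior, the Gaussian output prior `q2`, the flat input likelihood `L1`, and a Gaussian output likelihood `L2` around a single made-up observation. As suggested above, the induced prior $q_1^*$ is estimated with a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`).

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)


def model(theta):
    """Hypothetical deterministic model M: two inputs -> one scalar output."""
    return theta[:, 0] * theta[:, 1]


def bayesian_melding_sir(n_samples=5_000, n_resample=500, alpha=0.5):
    # Step 1: draw many samples from the input prior (assumed uniform on [0.5, 2]^2).
    theta = rng.uniform(low=[0.5, 0.5], high=[2.0, 2.0], size=(n_samples, 2))
    phi = model(theta)  # M(Theta_i) for every sample

    # Induced output prior q1*: Gaussian KDE of the model outputs.
    q1_star = gaussian_kde(phi)(phi)
    # Output prior q2 (assumed Gaussian around 1.5).
    q2 = norm.pdf(phi, loc=1.5, scale=0.5)
    # Input likelihood L1 (assumed flat) and output likelihood L2
    # (assumed Gaussian noise around a made-up observation of 2.0).
    L1 = np.ones(n_samples)
    L2 = norm.pdf(phi, loc=2.0, scale=0.2)

    # Step 2: importance weights w_i, normalised to sum to 1.
    w = (q2 / q1_star) ** (1.0 - alpha) * L1 * L2
    w /= w.sum()

    # Step 3: resample with replacement, proportionally to the weights.
    idx = rng.choice(n_samples, size=n_resample, replace=True, p=w)
    return theta[idx], phi[idx]


post_theta, post_phi = bayesian_melding_sir()
# Posterior summaries: mean as point estimate, spread as uncertainty.
print("theta mean:", post_theta.mean(axis=0), "std:", post_theta.std(axis=0))
print("output mean:", post_phi.mean())
```

Evaluating `gaussian_kde` on its own samples costs time proportional to the square of the sample count, which is one reason the sketch keeps `n_samples` modest.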

Bayesian melding was applied to three different epidemiological models:

- SIR: Susceptible-infected-removed
- SIRD: Susceptible-infected-recovered-deceased
- SEIRD: Susceptible-exposed-infected-recovered-deceased, extended with a hidden (unobserved) exposed compartment E and a reinfection rate.
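For reference, the SEIRD variant can be written as a system of ODEs and integrated with `scipy.integrate.solve_ivp`. The exact equations used in the notebooks are not given here, so this is one common formulation, assumed for illustration: reinfection is modelled as recovered individuals returning to S at a hypothetical rate `xi`, and all parameter values and the initial state are illustrative rather than fitted. Removing E (new infections enter I directly) and setting `xi = 0` gives SIRD; merging R and D into a single removed compartment gives SIR.

```python
import numpy as np
from scipy.integrate import solve_ivp


def seird_rhs(t, y, beta, sigma, gamma, mu, xi):
    """Right-hand side of an SEIRD model with reinfection (assumed formulation).

    beta: transmission rate, sigma: E -> I rate (1 / incubation period),
    gamma: I -> R recovery rate, mu: I -> D death rate,
    xi: R -> S reinfection (loss-of-immunity) rate.
    """
    S, E, I, R, D = y
    N = S + E + I + R  # living population (assumption: deceased do not mix)
    new_infections = beta * S * I / N
    dS = -new_infections + xi * R
    dE = new_infections - sigma * E
    dI = sigma * E - (gamma + mu) * I
    dR = gamma * I - xi * R
    dD = mu * I
    return [dS, dE, dI, dR, dD]


# Illustrative (not fitted) initial state and parameters; time unit is days.
y0 = [59_999_000.0, 500.0, 500.0, 0.0, 0.0]
params = (0.3, 1 / 5, 1 / 10, 0.002, 1 / 180)
sol = solve_ivp(seird_rhs, (0, 200), y0, args=params,
                t_eval=np.arange(0, 201), rtol=1e-8, atol=1e-6)
S, E, I, R, D = sol.y
print("peak infected:", I.max(), "on day", sol.t[I.argmax()])
```

Every outflow from one compartment is an inflow to another, so the total S + E + I + R + D stays constant; this is a cheap sanity check on any implementation.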

Because step 1 is very slow and suffers from the curse of dimensionality (especially for SEIRD), we also tried deterministic seeding to reduce the search space, with limited success.
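The curse of dimensionality shows up directly in the importance weights: as the number of parameters grows, fewer prior samples land where the likelihood is non-negligible, and the weight mass concentrates on a handful of samples. A common diagnostic is the Kish effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$. The sketch below assumes a uniform prior on $[0,1]^d$ and a narrow Gaussian likelihood (both hypothetical, not the project's actual setup) and shows the effective sample size collapsing as $d$ grows.

```python
import numpy as np


def effective_sample_size(w):
    """Kish effective sample size of a set of (unnormalised) importance weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


rng = np.random.default_rng(0)
n = 10_000
ess = {}
for dim in (1, 3, 5, 10):
    # Assumed setup: uniform prior on [0, 1]^dim, narrow Gaussian likelihood
    # (std 0.05) centred at 0.5 in every coordinate.
    theta = rng.uniform(size=(n, dim))
    log_w = -0.5 * np.sum(((theta - 0.5) / 0.05) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())  # shift in log space to avoid underflow
    ess[dim] = effective_sample_size(w)
    print(f"dim={dim:2d}  ESS={ess[dim]:8.1f} out of {n}")
```

With the same number of draws, the effective sample size drops by orders of magnitude between one and ten dimensions, which is why narrowing the search space before sampling is attractive.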

The slides' Beamer template was forked from the UniBO Beamer template and adapted for the AI course at DISI.

Authors: G. Tsiotas, L.S. Lorello.

We also maintain a public dataset of Italian regions' colors at: https://github.com/tsiotas/covid-19-zone.

This dataset is updated daily and contains the color of each region starting from November 6th, 2020, the first day on which the Government's color-based scheme was applied.
