-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
188 lines (127 loc) · 7.33 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# conmat
<!-- badges: start -->
[![R-CMD-check](https://github.com/idem-lab/conmat/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/idem-lab/conmat/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/idem-lab/conmat/branch/master/graph/badge.svg)](https://codecov.io/gh/idem-lab/conmat?branch=master)
<!-- badges: end -->
The goal of conmat is to make it easy to generate synthetic contact matrices for a given age population.
**What is a contact matrix?**
Contact matrices describe the degree of contact between individuals of given age groups.
For example, this matrix describes the number of contacts between individuals
```{r}
#| label: show-contact
#| echo: FALSE
name_vec <- c("0-4", "5-9", "10-14")
cmat <- matrix(
data = rep(NA, 9),
nrow = 3,
ncol = 3,
byrow = TRUE,
dimnames = list(
name_vec,
name_vec
)
)
diag(cmat) <- c(10, 11, 13)
cmat[upper.tri(cmat)] <- 3:5
cmat[lower.tri(cmat)] <- 3:5
cmat
```
The rows and columns represent the age groups of the people. On the main diagonal we see that we have a higher number of contacts - showing that people of similar ages tend to interact more with one another.
We can use the information in these matrices to model how diseases such as COVID-19 spread in a population through social contact.
**Why do we need _synthetic_ contact matrices?**
Contact matrices are produced from empirical data resulting from a contact survey, which requires individuals to diary the amount and manner of contact a person has in a day.
However, these surveys are highly time-consuming and expensive to run, meaning that only a handful of these empirical datasets exist globally.
We can use statistical methods to create _synthetic contact matrices_, which are new contact matrices that have been generalised to new countries based on existing surveys.
**Why do we need `conmat`?**
Existing methods only provide outputs of the contact matrices for each country, or at best, for urban and rural areas for a given country.
We need methods that allow for flexibly creating synthetic contact matrices for a specified age population, as the age population distribution of many countries (e.g., Australia), are quite heterogeneous, and assuming it is homogeneous would result in inaccurate representation of community infection in many regions.
## Installation
You can install the development version with:
```r
install.packages("conmat", repos = "https://idem-lab.r-universe.dev")
```
Or alternatively you can use `remotes`
``` r
# install.packages("remotes")
remotes::install_github("idem-lab/conmat")
```
## Example
First we want to fit the model to the POLYMOD data, which contains various survey and population data.
```{r get-polymod}
library(conmat)
polymod_contact_data <- get_polymod_contact_data(setting = "work")
polymod_survey_data <- get_polymod_population()
```
The contact data is a data frame containing the age from and to, and the number of contacts for each of the specified settings, "home", "work", "school", "other", or "all" as well as the number of participants. By default, `polymod_contact_data` contains data from "all", but we're going to use the "work" set of data, as it produces an interesting looking dataset. Each row contains survey information of the number of contacts. Specifically, the number of contacts from one age group to another age group, and then the number of participants in that age group.
The survey data, `polymod_survey_data` contains the lower age limit and the population in that age group.
```{r show-polymod}
polymod_survey_data
```
## Predicting the contact rate
We can create a model of the contact *rate* with the function `fit_single_contact_model`
```{r fit-polymod}
set.seed(2022 - 09 - 06)
contact_model <- fit_single_contact_model(
contact_data = polymod_contact_data,
population = polymod_survey_data
)
```
This fits a generalised additive model (GAM), predicting the contact rate, based on a series of prediction terms that describe various features of the contact rates.
```{r show-conctact-model}
contact_model
```
We can use this contact model to then predict the contact rate in a new population.
As a demonstration, let's take an age population from a given LGA in Australia (this was the initial motivation for the package, so there are some helper functions for Australian specific data).
```{r fairfield}
fairfield <- abs_age_lga("Fairfield (C)")
fairfield
```
We can then pass the contact model through to `predict_contacts`, along with the fairfield age population data, and some age breaks that we want to predict to.
```{r predict-contacts}
set.seed(2022 - 09 - 06)
synthetic_contact_fairfield <- predict_contacts(
model = contact_model,
population = fairfield,
age_breaks = c(seq(0, 85, by = 5), Inf)
)
synthetic_contact_fairfield
```
## Plotting
Let's visualise the matrix to get a sense of the predictions with `autoplot`. First we need to transform the predictions to a matrix:
```{r plot-matrix-differents}
synthetic_contact_fairfield %>%
predictions_to_matrix() %>%
autoplot()
```
## Applying the model across all settings.
You can also fit a model for all of the settings all at once with a series of functions, `fit_setting_contacts`, and `predict_setting_contacts`. This means we can do the above, but for each setting, "home", "work", "school", "other", and "all". We would recommend this when using conmat, as it is a pretty common use case. However for demonstration purposes we wanted to show how it works for a single matrix here first. We also provide details on how to fit the model to each of these settings in parallel. For more details on that workflow, see the "getting started" vignette.
## Data sources
This package provides data and helper functions for the data, for use in calculating contact matrices. The data sources are from the Australian Bureau of Statistics (ABS), as we were using these a lot when we created the package. In the future we might wrap these data sources and helpers into another package, but for the time being they are here. Below are a couple of examples of data provided, see the "data sources" vignette and helpful at the website for full details.
You can extract the age population structure for the LGA, Brisbane, like so:
```{r abs-age-lga}
abs_age_lga("Brisbane (C)")
```
Note that you need to use the exact LGA name - you can look up LGA names in the data set `abs_lga_lookup`:
```{r abs-lga-lookup}
abs_lga_lookup
```
Or get the information for states like so:
```{r abs-age-state}
abs_age_state(state_name = "QLD")
```
## Note
The contact matrices created using this package are transposed when compared to the contact matrices discussed by [Prem](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005697) and [Mossong](https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0050074). That is, the rows are "age group to", and the columns are "age group from".
## Code of Conduct
Please note that the conmat project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.