-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
232 lines (176 loc) · 10.2 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# mlwdata: a repository of datasets for MLW population health in Malawi
<!-- badges: start -->
<!-- badges: end -->
`mlwdata` contains a set of annotated datasets that have been collected as part of research studies conducted at the Malawi-Liverpool-Wellcome Trust Clinical Research Programme in Blantyre, Malawi.
The main aims are to:
- Ensure consistent use of data between and within studies over time
- Facilitate sharing of data to reduce duplication of efforts
## Installation
You can install the from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("petermacp/mlwdata")
```
<br>
## Datasets
Currently, the following datasets are available:
**scale_72_clusters**
<br>
A `sf` MULTIPOLYGON object, containing polygon boundaries and cluster IDs for the SCALE Study, with variables:
- `cluster`: unique identifier for each of the 72 study clusters
- `geometery`: a list column, containing polygons for each cluster boundary.
<br>
**blantyre_clinics**
<br>
A `sf` POINT object, containing coordinates for Blantyre clinics/hospitals, and clinic IDs:
- `clinic`: unique name of each of the 18 clinics/hospitals
- `geometry`: a list column, containing points for each clinic/hospital.
<br>
**blantyre_census_2008_2018**
<br>
A `tibble`, containing age, sex and district-stratfied (Blantyre City, Blantyre Rural) population estimates from the 2008 and 2018 Malawi National Census. These data were provided by the Malwai National Statistics Office in October 2019.
- `District`: either Blantyre City, or Blantyre Rural (as classified in the Census)
- `Year`: census year (2008 or 2018)
- `Age`: age group of population estimates
- `Total`: total population
- `Male`: Male population
- `Female`: Female population
<br>
**blantyre_census_by_q**
<br>
A `tibble`, containing age, sex and district-stratfied (Blantyre City, Blantyre Rural) population, with linear interpolation by quarter, from the 2008 and 2018 Malawi National Census. These data were provided by the Malwai National Statistics Office in October 2019.
- `district`: either Blantyre City, or Blantyre Rural (as classified in the Census)
- `year`: census year (2008 or 2018)
- `age`: age group of population estimates
- `quarter`: quarter of the year
- `year_q`: concatenated year and quarter
- `sex`: male or female
- `population`: population estimate
<br>
**hiv_pops_blantyre_city**
<br>
A `tibble`, containing estimates of HIV prevalence for Blantyre City, stratified by age, sex and quarter between 2008 and 2018. Population estimates are from Malawi National Census 2008 and 2018 estimates. HIV prevalence estimates are from an HIV-prevalence survey conducted in 2014-15 in North West Blantyre. Age and Sex specific HIV-prevalence estimates were multiplied by population demoninators to obtain numbers of HIV-positive people per age-sex-quarter strata. (Note, estimates for adults [16+] and Blantyre City only, and *not* Blantyre Rural are provided).
- `year_q`: concatenated year and quarter
- `year`: census year (2008 or 2018)
- `quarter`: quarter of the year
- `sex`: male or female
- `age`: age group of population estimates
- `hiv_prev`: HIV prevalence in age-sex strata
- `population`: number of HIV-positive people in age-sex-quarter strata
<br>
**blantyre_tb_cases_2009_2018**
<br>
A `tibble`, containing numbers of TB cases notified in Blantyre TB registration centres between Q1 2009 and Q4 2018 by quarter, and stratified by active case finding area of the city (ACF vs. non-ACF), and microbiological status of cases
- `year_q`: Annual quarter
- `acf`: Area of Blantyre City (`ACF` = received active case finding intervention; `non-ACF` = didn't receive active case finding intervention)
- `tbcases`: Classification of TB diagnosis (`Smr/Xpert-positive` cases = cases that were either smear or xpert positive in testing by the routine clinic programme, or smear-positive by the research TB lab; `All cases` = all cases started on treatment, regardless of microbiological status)
- `n`: number of cases in category
<br>
**blantyre_tb_cases_2009_2018**
<br>
A `tibble`, containing anonymised individual-level TB cases notified in Blantyre TB registration centres between Q1 2011 and Q4 2018
- `unique_id`: Anonymised unique case ID
- `fac_code`: TB registration centre
- `reg_date`: Date on which TB case was registered for treatment
- `year`: Year of registration for TB treatment
- `quarter`: Quarter of registration for TB treatment
- `year_q`: Year and quarter of TB treatment registration
- `period`: Active case finding intervention period (`pre-ACF` = before ACF implemented; `ACF` = during ACF intervention; `post-ACF` = after ACF intervention implemented)
- `sex`: Sex of TB case (male or female)
- `age`: Age of TB case on day of treatment registration
- `agegp`: Age group of TB case (`1`= 0-4 years, `2`= 5-14 years, `3`= 15+ years)
- `acf`: Whether TB case's household was located in the ACF intervention area of Blantyre City (`ACF`), or the non-ACF area of Blantyre City `Non-ACF`
- `hiv`: HIV status of the TB case at TB treatment registration
- `art`: Was patient taking antiretroviral therapy for the treatment of HIV at the start of TB treatment?
- `smr_clinic`: Sputum smear status of TB case at TB registration, from sample collected and tested by the routine health system
- `smr_lab`: Sputum smear status of TB case at TB registration, from sample collected at treatment registration, and tested in the research TB lab at the College of Medicine, University of Malawi
- `xpert_clinic`: Sputum Xpert status of TB case at TB registration, from sample collected and tested by the routine health system. Note that Xpert was only reliably introduced into the programme from Q2 2015 onwards
- `tbtype`: Classified as either `Pulmonary TB` or `Extrapulmonary TB`
- `cgh_dur`: Duration of cough (in weeks) prior to TB treatment registration
- `tb_cat`: Category of TB patient (`New`, `Relapse`, `Retreatment after default`, `Retreatment after failure`, `Other`)
- `smr_any`: Whether a positive sputum smear result was obtained from either the routine health system or the study research lab sample
- `smr_xpert_any`: Whether a positive sputum smear result was obtained from either the routine health system or the study research lab sample, or a positive Xpert result was obtained from the routine health system.
<br>
**acf_cnrs_overall**
<br>
A `tibble`, containing TB case notification rates (all cases) per 100,000 population between Q1 2009 and q4 2018, stratifed by active case finding intervention area
- `year_q`: Annual quarter
- `acf`: Whether TB case's household was located in the ACF intervention area of Blantyre City (`ACF`), or the non-ACF area of Blantyre City `Non-ACF`
- `cases`: Number of registered TB cases per category
- `tbcases`: Classification of TB cases (microbiologically-confirmed or all cases)
- `population`: Total population per strata
- `q_population`: Total population per strata/4 (for CNR estimates)
- `cnr`: TB case notification rate, per 100,000 per strata
- `conf.low`: Lower bound of CNR 95% confidence interval
- `conf.high`: Upper bound of CNR 95% confidence interval
- `period`: Active case finding intervention period (`pre-ACF` = before ACF implemented; `ACF` = during ACF intervention; `post-ACF` = after ACF intervention implemented)
<br>
**acf_smrpos_cnrs_overall**
<br>
A `tibble`, containing TB case notification rates (smear or xpert-positive cases) per 100,000 population between Q1 2009 and q4 2018, stratifed by active case finding intervention area. Note cases are classified as Smear/Xpert positive if a positive sputum smear result was obtained from either the routine health system or the study research lab sample, or a positive Xpert result was obtained from the routine health system.
- `year_q`: Annual quarter
- `acf`: Whether TB case's household was located in the ACF intervention area of Blantyre City (`ACF`), or the non-ACF area of Blantyre City `Non-ACF`
- `cases`: Number of registered TB cases per category
- `tbcases`: Classification of TB cases (microbiologically-confirmed or all cases)
- `population`: Total population per strata
- `q_population`: Total population per strata/4 (for CNR estimates)
- `cnr`: TB case notification rate, per 100,000 per strata
- `conf.low`: Lower bound of CNR 95% confidence interval
- `conf.high`: Upper bound of CNR 95% confidence interval
- `period`: Active case finding intervention period (`pre-ACF` = before ACF implemented; `ACF` = during ACF intervention; `post-ACF` = after ACF intervention implemented)
<br>
## Examples
The SCALE Study defined 72 geographical cluster boundaries in urban Blantyre using GPS waypaths. These clusters can be loaded and plotted:
```{r example, message=FALSE, warning=FALSE, fig.width=11, fig.height=11}
library(mlwdata)
library(tidyverse)
glimpse(scale_72_clusters)
ggplot(scale_72_clusters) +
geom_sf() +
geom_sf_text(aes(label=cluster), colour="blue") +
theme_minimal() +
labs(title="SCALE Study",
subtitle="72 clusters",
x="",
y="",
caption = "Corbett, MacPherson et al.")
```
We could add on the location of the clinics in Blantyre.
```{r example2, message=FALSE, warning=FALSE, fig.width=11, fig.height=18}
library(sf)
library(ggrepel)
qech <- blantyre_clinics %>%
#filter to show QECH only for interest
filter(clinic=="Queen Elizabeth hospital") %>%
#split out the x and y coordinates to allow use of ggrepel
st_coordinates() %>%
as_tibble() %>%
mutate(clinic="QECH")
ggplot() +
geom_sf(data = scale_72_clusters) +
geom_sf_text(data = scale_72_clusters, aes(label=cluster), colour="blue") +
geom_sf(data=blantyre_clinics, shape=17, colour="#22211d") +
geom_label_repel(data = qech,
aes(x=X, y=Y, label=clinic),
fontface = "bold",
min.segment.length = 0) +
theme_minimal() +
labs(title="SCALE Study",
subtitle="72 clusters. Clinics/hospitals labelled with triangles",
x="",
y="",
caption = "Corbett, MacPherson et al.") +
coord_sf()
```