Commit 1b10a05

Tidy readme
1 parent 0930c5d commit 1b10a05

1 file changed

+111
-82
lines changed

README.md

+111-82
@@ -1,9 +1,116 @@
11
# Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic review and modeling study
22

3-
This is the code associated with the paper
4-
*Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic review and modeling study*,
5-
(https://www.medrxiv.org/content/10.1101/2022.09.08.22279731v3),
6-
currently in press at Eurosurveillance.
3+
This repository contains the code and data required to reproduce the results
4+
of the paper "Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic
5+
review and modelling study", Euro Surveill. 2023;28(21).
6+
(https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.21.2200809).
7+
The paper analyzes how the sensitivity of SARS-CoV-2 serological assays
8+
changes over time since infection, and how this change is influenced by
9+
assay characteristics.
10+
11+
## Reproducing the analysis
12+
13+
In the **code** directory, there are scripts that preprocess the data,
14+
fit the models, and analyze the results. The code is written in R
15+
and the Bayesian models are written in Stan.
16+
17+
The scripts in the **code** directory are prefixed with a number,
18+
indicating the order in which they should be run to fully
19+
reproduce the analysis. Scripts 01 to 03 preprocess the
20+
data, scripts 04 and 05 fit the main models in the
21+
paper (sensitivity vs time), and scripts 06 to
22+
11 are controls and additional analyses in the paper.
23+
24+
It is not necessary to run all scripts to reproduce the analysis,
25+
as the data files in data/processed_data allow you to jump
26+
straight into running scripts 04 and 05.
27+
28+
The main outputs of scripts 04 and 05 are also included
29+
in the data/analysis_results directory, so that the
30+
figures and statistics in the paper can be reproduced
31+
without running the analysis. The main figures of the
32+
paper can be reproduced by running the script
33+
**code/plotting_tabulating_scripts/plot_sensitivity_profiles.R**.
34+
35+
36+
## Analysis code
37+
38+
Each script includes a description at the beginning.
39+
The scripts are numbered so as to be run in order.
40+
Paths are built so that each script is run from the directory where
41+
it is located.
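
The two conventions described above (numeric prefixes give the run order, and each script is run from its own directory) can be sketched as a small driver. This is only an illustration, not part of the repository: the helper names `ordered_scripts` and `run_pipeline` are hypothetical, `Rscript` is assumed to be on the PATH, and placing each `_bis_` variant right after its base script is an assumption about the intended order.

```python
import re
import subprocess
from pathlib import Path

# Matches names such as "04_average_sensitivity_analysis.R" and
# "04_bis_average_sensitivity_analysis_CV.R" (hypothetical pattern).
NUMBERED = re.compile(r"^(\d+)_(bis_)?.+\.R$")


def ordered_scripts(names):
    """Sort numbered script names by prefix, with "bis" variants after their base."""
    keyed = []
    for name in names:
        match = NUMBERED.match(name)
        if match:  # helpers such as functions_auxiliary.R do not match
            keyed.append((int(match.group(1)), match.group(2) is not None, name))
    return [name for _, _, name in sorted(keyed)]


def run_pipeline(code_dir, first=1, last=11, dry_run=True):
    """List (and, unless dry_run, execute) scripts numbered first..last."""
    code_dir = Path(code_dir)
    names = [p.name for p in code_dir.glob("*.R")]
    selected = [n for n in ordered_scripts(names)
                if first <= int(n.split("_")[0]) <= last]
    for name in selected:
        if not dry_run:
            # cwd=code_dir mirrors running the script from its own directory.
            subprocess.run(["Rscript", name], cwd=code_dir, check=True)
    return selected
```

For example, `run_pipeline("code", first=4, last=5)` would list scripts 04 and 05 together with their `_bis_` cross-validation variants. Passing `dry_run=False` would execute them in that order.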
42+
43+
Summary of the scripts in the **code** directory:
44+
45+
**01_death_dynamics_table.R**: (Preprocessing) Builds the
46+
data/raw_data/death_dynamics.csv file by putting together different
47+
case and death data sources.
48+
49+
**02_estimate_seroreversion_delays.R**: (Preprocessing) Uses the data in
50+
data/raw_data/death_dynamics.csv to estimate the delays between
51+
diagnosis and serology testing for the data in **PCR_to_serotest_unknown.csv**.
52+
53+
**03_organize_seroreversion_data.R**: (Preprocessing) Tidies the
54+
data and does some extra pre-processing. The output file of this
55+
script is **PCR_to_serotest_all.csv**, the input to the
56+
statistical analysis.
57+
58+
**04_average_sensitivity_analysis.R**: (Analysis) Fits a hierarchical
59+
Bayesian regression to the data in PCR_to_serotest_all.csv, without
60+
taking into account assay characteristics. Outputs files to
61+
data/analysis_results with descriptions of the fitted model.
62+
63+
**04_bis_average_sensitivity_analysis_CV.R**: (Analysis) Fits the
64+
same model as the previous file, but for doing cross-validation.
65+
Outputs the cross-validation results, but no model summary.
66+
67+
**05_characteristics_sensitivity_analysis.R**: (Analysis) Fits a hierarchical
68+
Bayesian regression to the data in PCR_to_serotest_all.csv. Unlike script
69+
04, it includes an effect for the different test characteristics.
70+
71+
**05_bis_characteristics_sensitivity_analysis_CV.R**: (Analysis) Fits a
72+
hierarchical Bayesian model like the script above, but for
73+
doing cross-validation. Only outputs the CV results, and not a
74+
model summary.
75+
76+
**06_positive_slope_analysis.R**: (Analysis) Fits a model with
77+
two slopes, an early slope and a later slope. It does so on
78+
a small set of tests that show positive slopes in the main analysis.
79+
80+
**07_manufacturer_comparison.R**: (Analysis) Compares the results from
81+
previous model fittings to manufacturer reported sensitivities.
82+
83+
**08_characteristics_analysis_known_times.R**: (Analysis) Does the
84+
same as script 05, but excluding data points where we
85+
estimated the time from diagnosis to testing.
86+
87+
**09_organize_specificity_data.R**: (Preprocessing) Prepares the
88+
specificity data to analyze how it changes across assays.
89+
90+
**10_analyze_specificity_data.R**: (Analysis) Fits a Bayesian
91+
model to the specificity data, to find effects of assay
92+
characteristics on specificity.
93+
94+
**11_serotracker_analysis.R**: (Analysis) Computes how many
95+
Unity-aligned data points in SeroTracker use
96+
assays at high risk of seroreversion.
97+
98+
**functions_auxiliary.R**: Contains miscellaneous functions for small tasks.
99+
100+
**functions_seroreversion_fit_analysis.R**: Contains functions related to
101+
the Bayesian analysis fit. For example, preparing the initialization
102+
values, extracting the posterior samples in a tidy format, etc.
103+
104+
Directory **plotting_tabulating_scripts** has scripts
105+
that generate the figures for the paper. Each script includes
106+
a description of what Figures it generates. Like the analysis
107+
scripts, paths are built so that these scripts are run
108+
from the directory where they are located.
109+
110+
Directory **stan_models** has the .stan files that implement
111+
the Bayesian models that are fit in the main analysis scripts
112+
described above.
113+
7114

8115
## Data files
9116

@@ -96,84 +203,6 @@ The most important variables of this dataset are:
96203
| **midpointDate** | Median date of sample collection |
97204

98205

99-
## Analysis code
100-
101-
Each script includes a description at the beginning.
102-
The scripts are numbered so as to be run in order.
103-
Paths are built so as to be ran from the directory where
104-
a script file is located.
105-
106-
Description of the scripts in the **code** directory:
107-
108-
**01_death_dynamics_table.R**: (Preprocessing) Builds the
109-
data/raw_data/death_dynamics.csv file by putting together different
110-
case and death data sources.
111-
112-
**02_estimate_seroreversion_delays.R**: (Preprocessing) Uses the data in
113-
data/raw_data/death_dynamics.csv to estimate the delays between
114-
diagnosis and serology testing for the data in **PCR_to_serotest_unknown.csv**
115-
116-
**03_organize_seroreversion_data.R**: (Preprocessing) Tidies the
117-
data and does some extra pre-processing. The output file of this
118-
script is **PCR_to_serotest_all.csv**, the input to the
119-
statistical analysis.
120-
121-
**04_average_sensitivity_analysis.R**: (Analysis) Fits a hierarchical
122-
Bayesian regression to the data in PCR_to_serotest_all.csv, without
123-
taking into account assay characteristics. Outputs files to
124-
data/analysis_results with descriptions of the fitted model
125-
126-
**04_bis_average_sensitivity_analysis_CV.R**: (Analysis) Fits the
127-
same model as the previous file, but for doing cross-validation.
128-
Outputs the cross-validation results, but no model summary.
129-
130-
**05_characteristics_sensitivity_analysis.R**: (Analysis) Fits a hierarchical
131-
Bayesian regression to the data in PCR_to_serotest_all.csv. Unlike script
132-
04, it includes an effect for the different test characteristics.
133-
134-
**05_bis_characteristics_sensitivity_analysis_CV.R**: (Analysis) Fits a
135-
hierarchical Bayesian model like the script above, but for
136-
doing cross-validation. Only outputs the CV results, and not a
137-
model summary.
138-
139-
**06_positive_slope_analysis.R**: (Analysis) Fits a model with
140-
two slopes, an early slope and a later slope. It does so on
141-
a small set of tests that show positive slopes in the main analysis.
142-
143-
**07_manufacturer_comparison.R**: (Analysis) Compares the results from
144-
previous model fittings to manufacturer reported sensitivities.
145-
146-
**08_characteristics_analysis_known_times.R**: (Analysis) Does the
147-
same as script 05, but for excluding data points where we
148-
estimated the time from diagnosis to testing.
149-
150-
**09_organize_specificity_data.R**: (Preprocessing) Prepare the
151-
specificity data to analyze how it changes across assays.
152-
153-
**10_analyze_specificity_data.R**: (Analysis) Fit a Bayesian
154-
model to the specificity data, to find effects of assay
155-
characteristics on specificity.
156-
157-
**11_serotracker_analysis.R**: (Analysis) Computes how many
158-
data points in SeroTracker, that are Unity-aligned, use
159-
assays at high-risk of seroreversion.
160-
161-
**functions_auxiliary.R**: Contains miscellaneous functions for small tasks.
162-
163-
**functions_seroreversion_fit_analysis.R**: Contains functions related to
164-
the Bayesian analysis fit. For example, preparing the initialization
165-
values, extracting the posterior samples in a tidy format, etc.
166-
167-
Directory **plotting_tabulating_scripts** has scripts
168-
that generate the figures for the paper. Each script includes
169-
a description of what Figures it generates. Like the analysis
170-
scripts, paths are made so as to have these scripts ran
171-
from the directory where they are located.
172-
173-
Directory **stan_models** has the .stan files that implement
174-
the Bayesian models that are fit in the main analysis scripts
175-
described above.
176-
177206
## Meta-analysis summary
178207

179208
In directory **data/systematic_review_summary/**, several
