|
1 | 1 | # Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic review and modeling study
|
2 | 2 |
|
3 |
| -This is the code associated with the paper |
4 |
| -*Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic review and modeling study*, |
5 |
| -(https://www.medrxiv.org/content/10.1101/2022.09.08.22279731v3), |
6 |
| -currently in press at Eurosurveillance. |
| 3 | +This repository contains the code and data required to reproduce the results |
| 4 | +of the paper "Dynamics of SARS-CoV-2 seroassay sensitivity: a systematic |
| 5 | +review and modelling study", Euro Surveill. 2023;28(21). |
| 6 | +(https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.21.2200809). |
| 7 | +The paper analyzes how the sensitivity of SARS-CoV-2 serological assays |
| 8 | +changes over time since infection, and how this change is influenced by |
| 9 | +assay characteristics. |
| 10 | + |
| 11 | +## Reproducing the analysis |
| 12 | + |
| 13 | +In the **code** directory, there are scripts that preprocess the data, |
| 14 | +fit the models, and analyze the results. The code is written in R |
| 15 | +and the Bayesian models are written in Stan. |
| 16 | + |
| 17 | +The scripts in the **code** directory are prefixed with a number,
| 18 | +indicating the order in which they should be run to fully |
| 19 | +reproduce the analysis. Scripts 01 to 03 preprocess the |
| 20 | +data, scripts 04 and 05 fit the main models in the |
| 21 | +paper (sensitivity vs time), and scripts 06 to |
| 22 | +11 are controls and additional analyses in the paper. |
| 23 | + |
| 24 | +It is not necessary to run all scripts to reproduce the analysis, |
| 25 | +as the data files in data/processed_data make it possible to jump
| 26 | +straight into running scripts 04 and 05. |
| 27 | + |
| 28 | +The main outputs of scripts 04 and 05 are also included |
| 29 | +in the data/analysis_results directory, so that the |
| 30 | +figures and statistics in the paper can be reproduced |
| 31 | +without running the analysis. The main figures of the |
| 32 | +paper can be reproduced by running the script |
| 33 | +**code/plotting_tabulating_scripts/plot_sensitivity_profiles.R**. |
| 34 | + |
| 35 | + |
| 36 | +## Analysis code |
| 37 | + |
| 38 | +Each script includes a description at the beginning.
| 39 | +The scripts are numbered so as to be run in order.
| 40 | +Paths are built so that each script is run from the directory where
| 41 | +the script file is located.
| 42 | + |
| 43 | +Summary of the scripts in the **code** directory:
| 44 | + |
| 45 | +**01_death_dynamics_table.R**: (Preprocessing) Builds the |
| 46 | +data/raw_data/death_dynamics.csv file by putting together different |
| 47 | +case and death data sources. |
| 48 | + |
| 49 | +**02_estimate_seroreversion_delays.R**: (Preprocessing) Uses the data in |
| 50 | +data/raw_data/death_dynamics.csv to estimate the delays between |
| 51 | +diagnosis and serology testing for the data in **PCR_to_serotest_unknown.csv** |
| 52 | + |
| 53 | +**03_organize_seroreversion_data.R**: (Preprocessing) Tidies the |
| 54 | +data and does some extra pre-processing. The output file of this
| 55 | +script is **PCR_to_serotest_all.csv**, the input to the |
| 56 | +statistical analysis. |
| 57 | + |
| 58 | +**04_average_sensitivity_analysis.R**: (Analysis) Fits a hierarchical |
| 59 | +Bayesian regression to the data in PCR_to_serotest_all.csv, without |
| 60 | +taking into account assay characteristics. Outputs files to |
| 61 | +data/analysis_results with descriptions of the fitted model |
| 62 | + |
| 63 | +**04_bis_average_sensitivity_analysis_CV.R**: (Analysis) Fits the |
| 64 | +same model as the previous file, but for doing cross-validation. |
| 65 | +Outputs the cross-validation results, but no model summary. |
| 66 | + |
| 67 | +**05_characteristics_sensitivity_analysis.R**: (Analysis) Fits a hierarchical |
| 68 | +Bayesian regression to the data in PCR_to_serotest_all.csv. Unlike script |
| 69 | +04, it includes an effect for the different test characteristics. |
| 70 | + |
| 71 | +**05_bis_characteristics_sensitivity_analysis_CV.R**: (Analysis) Fits a |
| 72 | +hierarchical Bayesian model like the script above, but for |
| 73 | +doing cross-validation. Only outputs the CV results, and not a |
| 74 | +model summary. |
| 75 | + |
| 76 | +**06_positive_slope_analysis.R**: (Analysis) Fits a model with |
| 77 | +two slopes, an early slope and a later slope. It does so on |
| 78 | +a small set of tests that show positive slopes in the main analysis. |
| 79 | + |
| 80 | +**07_manufacturer_comparison.R**: (Analysis) Compares the results from |
| 81 | +previous model fittings to manufacturer-reported sensitivities.
| 82 | + |
| 83 | +**08_characteristics_analysis_known_times.R**: (Analysis) Does the |
| 84 | +same as script 05, but excluding data points where we
| 85 | +estimated the time from diagnosis to testing. |
| 86 | + |
| 87 | +**09_organize_specificity_data.R**: (Preprocessing) Prepare the |
| 88 | +specificity data to analyze how it changes across assays. |
| 89 | + |
| 90 | +**10_analyze_specificity_data.R**: (Analysis) Fit a Bayesian |
| 91 | +model to the specificity data, to find effects of assay |
| 92 | +characteristics on specificity. |
| 93 | + |
| 94 | +**11_serotracker_analysis.R**: (Analysis) Computes how many |
| 95 | +data points in SeroTracker, that are Unity-aligned, use |
| 96 | +assays at high risk of seroreversion.
| 97 | + |
| 98 | +**functions_auxiliary.R**: Contains miscellaneous functions for small tasks. |
| 99 | + |
| 100 | +**functions_seroreversion_fit_analysis.R**: Contains functions related to |
| 101 | +the Bayesian analysis fit. For example, preparing the initialization |
| 102 | +values, extracting the posterior samples in a tidy format, etc. |
| 103 | + |
| 104 | +Directory **plotting_tabulating_scripts** has scripts |
| 105 | +that generate the figures for the paper. Each script includes |
| 106 | +a description of what Figures it generates. Like the analysis |
| 107 | +scripts, paths are made so as to have these scripts run
| 108 | +from the directory where they are located. |
| 109 | + |
| 110 | +Directory **stan_models** has the .stan files that implement |
| 111 | +the Bayesian models that are fit in the main analysis scripts |
| 112 | +described above. |
| 113 | + |
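Because the scripts build their paths relative to their own location, one way to run a subset of them in order is sketched below. This is only an illustration, not part of the repository: the `CODE_DIR` and `RUNNER` variables and the `run_scripts` helper are hypothetical names, and it assumes `Rscript` is on the PATH and that the command is issued from the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run the given scripts,
# in the order listed, from inside the code directory so that their
# relative paths resolve as the README describes.
CODE_DIR="${CODE_DIR:-code}"    # assumed location of the analysis scripts
RUNNER="${RUNNER:-Rscript}"     # assumed R command-line front end

run_scripts() {
    # A subshell keeps the caller's working directory unchanged.
    (
        cd "$CODE_DIR" || exit 1
        for script in "$@"; do
            if [ ! -f "$script" ]; then
                echo "missing script: $script" >&2
                exit 1
            fi
            # Stop at the first script that fails.
            "$RUNNER" "$script" || exit 1
        done
    )
}

# Example call (left commented out): the two main model fits, which can
# start from the files already provided in data/processed_data.
# run_scripts 04_average_sensitivity_analysis.R 05_characteristics_sensitivity_analysis.R
```

Listing the script names explicitly, rather than globbing over the directory, avoids running the function files and the cross-validation variants unintentionally.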
7 | 114 |
|
8 | 115 | ## Data files
|
9 | 116 |
|
@@ -96,84 +203,6 @@ The most important variables of this dataset are:
|
96 | 203 | | **midpointDate** | Median date of sample collection |
|
97 | 204 |
|
98 | 205 |
|
99 |
| -## Analysis code |
100 |
| - |
101 |
| -Each script includes a description at the beggining. |
102 |
| -The scripts are numbered so as to be run in order. |
103 |
| -Paths are built so as to be ran from the directory where |
104 |
| -a script file is located. |
105 |
| - |
106 |
| -Description of the scripts in the ****code**** directory: |
107 |
| - |
108 |
| -**01_death_dynamics_table.R**: (Preprocessing) Builds the |
109 |
| -data/raw_data/death_dynamics.csv file by putting together different |
110 |
| -case and death data sources. |
111 |
| - |
112 |
| -**02_estimate_seroreversion_delays.R**: (Preprocessing) Uses the data in |
113 |
| -data/raw_data/death_dynamics.csv to estimate the delays between |
114 |
| -diagnosis and serology testing for the data in **PCR_to_serotest_unknown.csv** |
115 |
| - |
116 |
| -**03_organize_seroreversion_data.R**: (Preprocessing) Tidies the |
117 |
| -data and does som extra pre-processing. The output file of this |
118 |
| -script is **PCR_to_serotest_all.csv**, the input to the |
119 |
| -statistical analysis. |
120 |
| - |
121 |
| -**04_average_sensitivity_analysis.R**: (Analysis) Fits a hierarchical |
122 |
| -Bayesian regression to the data in PCR_to_serotest_all.csv, without |
123 |
| -taking into account assay characteristics. Outputs files to |
124 |
| -data/analysis_results with descriptions of the fitted model |
125 |
| - |
126 |
| -**04_bis_average_sensitivity_analysis_CV.R**: (Analysis) Fits the |
127 |
| -same model as the previous file, but for doing cross-validation. |
128 |
| -Outputs the cross-validation results, but no model summary. |
129 |
| - |
130 |
| -**05_characteristics_sensitivity_analysis.R**: (Analysis) Fits a hierarchical |
131 |
| -Bayesian regression to the data in PCR_to_serotest_all.csv. Unlike script |
132 |
| -04, it includes an effect for the different test characteristics. |
133 |
| - |
134 |
| -**05_bis_characteristics_sensitivity_analysis_CV.R**: (Analysis) Fits a |
135 |
| -hierarchical Bayesian model like the script above, but for |
136 |
| -doing cross-validation. Only outputs the CV results, and not a |
137 |
| -model summary. |
138 |
| - |
139 |
| -**06_positive_slope_analysis.R**: (Analysis) Fits a model with |
140 |
| -two slopes, an early slope and a later slope. It does so on |
141 |
| -a small set of tests that show positive slopes in the main analysis. |
142 |
| - |
143 |
| -**07_manufacturer_comparison.R**: (Analysis) Compares the results from |
144 |
| -previous model fittings to manufacturer reported sensitivities. |
145 |
| - |
146 |
| -**08_characteristics_analysis_known_times.R**: (Analysis) Does the |
147 |
| -same as script 05, but for excluding data points where we |
148 |
| -estimated the time from diagnosis to testing. |
149 |
| - |
150 |
| -**09_organize_specificity_data.R**: (Preprocessing) Prepare the |
151 |
| -specificity data to analyze how it changes across assays. |
152 |
| - |
153 |
| -**10_analyze_specificity_data.R**: (Analysis) Fit a Bayesian |
154 |
| -model to the specificity data, to find effects of assay |
155 |
| -characteristics on specificity. |
156 |
| - |
157 |
| -**11_serotracker_analysis.R**: (Analysis) Computes how many |
158 |
| -data points in SeroTracker, that are Unity-aligned, use |
159 |
| -assays at high-risk of seroreversion. |
160 |
| - |
161 |
| -**functions_auxiliary.R**: Contains miscellaneous functions for small tasks. |
162 |
| - |
163 |
| -**functions_seroreversion_fit_analysis.R**: Contains functions related to |
164 |
| -the Bayesian analysis fit. For example, preparing the initialization |
165 |
| -values, extracting the posterior samples in a tidy format, etc. |
166 |
| - |
167 |
| -Directory **plotting_tabulating_scripts** has scripts |
168 |
| -that generate the figures for the paper. Each script includes |
169 |
| -a description of what Figures it generates. Like the analysis |
170 |
| -scripts, paths are made so as to have these scripts ran |
171 |
| -from the directory where they are located. |
172 |
| - |
173 |
| -Directory **stan_models** has the .stan files that implement |
174 |
| -the Bayesian models that are fit in the main analysis scripts |
175 |
| -described above. |
176 |
| - |
177 | 206 | ## Meta-analysis summary
|
178 | 207 |
|
179 | 208 | In directory **data/systematic_review_summary/**, several
|
|