-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path09-guide-code.Rmd
293 lines (206 loc) · 19 KB
/
09-guide-code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
# GitHub
For reading more easily, we recommend opening the file *Data supplements.Rproj* with R-Studio, which lets the user have the correct working path set up automatically.
Other files can be opened directly by double-clicking on them, or via the `Files` tab in R-Studio.\
Most of the code can be folded/collapsed into sections easily by clicking in the menu `Edit`->`Collapse All Output`.\
Unfolding can be done by clicking on the arrows to the left of the folded sections, or in the menu `Edit`->`Expand All Output`.
## Directory structure {#directory-structure}
For readability, folders are displayed with a dash at the end.
The folder code will be at the section *[code/](#code)*.
### code/ {#code}
Files required to compute the main results of the article: acceleration factors and coverage probabilities.
Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code
### code/linreg.cpp {#linreg.cpp}
Specifies the function computing the negative log-likelihood of a linear regression in `C++`.
Heavily inspired by https://github.com/kaskr/adcomp/blob/master/tmb_examples/linreg.cpp.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg.cpp
<!-- ### code/linreg.dll -->
<!-- Generated automatically by compiling *[code/linreg.cpp](#linreg.cpp)*. -->
<!-- Library that contains code and data that can be used by more than one program at the same time. -->
<!-- Used to compute negative log-likelihoods in `R` with `TMB`. -->
<!-- File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg.dll \ -->
<!-- File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg.dll -->
<!-- ### code/linreg.o {#linreg.o} -->
<!-- Generated by compiling *[code/linreg.cpp](#linreg.cpp)*. -->
<!-- Used to compute negative log-likelihoods in `R` with `TMB`. -->
<!-- File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg.o \ -->
<!-- File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg.o -->
### code/linreg_extended.cpp {#linreg_extended.cpp}
Similar to *[code/linreg.cpp](#linreg.cpp)*, with some more complex code added to serve as an example.
Used to compute negative log-likelihoods in `R` with `TMB`.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg_extended.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg_extended.cpp
<!-- ### code/linreg_extended.dll -->
<!-- Generated automatically by compiling *[code/linreg_extended.cpp](#linreg_extended.cpp)*. -->
<!-- Similar to *[code/linreg.cpp](#linreg.cpp)*. -->
<!-- Used to compute negative log-likelihoods in `R` with `TMB`. -->
<!-- File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg_extended.dll \ -->
<!-- File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg_extended.dll -->
<!-- ### code/linreg_extended.o -->
<!-- Generated by compiling *[code/linreg.cpp](#linreg.cpp)*. -->
<!-- Similar to *[code/linreg.o](#linreg.o)*. -->
<!-- Used to compute negative log-likelihoods in `R` with `TMB`. -->
<!-- File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg_extended.o \ -->
<!-- File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg_extended.o -->
### code/main.R {#main.R}
Used to run procedures related to timing and comparisons presented in the article.
By default, it will load results instead.
The first chunk of code it runs can be found in the file *[code/setup_parameters.R](#setup_parameters.R)*.\
Afterwards, either computations are run and their results are stored in data/, or results are loaded to the current environment.
The computations are written in *[code/poi_hmm_tinn.R](#poi_hmm_tinn.R)*, *[code/poi_hmm_lamb.R](#poi_hmm_lamb.R)*, *[code/poi_hmm_simu1.R](#poi_hmm_simu1.R)*, *[code/poi_hmm_simu2.R](#poi_hmm_simu2.R)*, *[code/poi_hmm_simu3.R](#poi_hmm_simu3.R)*, and *[code/poi_hmm_simu4.R](#poi_hmm_simu4.R)*.
**NOTE**: Our scripts were tested on a workstation with 4 Intel(R) Xeon(R) Gold 6134 processors (3.7 GHz) running under the Linux distribution Ubuntu 18.04.6 LTS (Bionic Beaver) with 384 GB, and took about a week of computing time.
More details are available at individual files.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/main.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/main.R
### code/mvnorm_hmm.cpp {#mvnorm_hmm.cpp}
Specifies the function computing the negative log-likelihood of a m-state multivariate Gaussian Hidden Markov Model in `C++`.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/mvnorm_hmm.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/mvnorm_hmm.cpp
### code/norm_hmm.cpp {#norm_hmm.cpp}
Specifies the function computing the negative log-likelihood of a m-state univariate Gaussian Hidden Markov Model in `C++`.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/norm_hmm.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/norm_hmm.cpp
### code/packages.R
Automatically install/load necessary packages for running *[code/main.R](#main.R)* with all simulations.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/packages.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/packages.R
### code/poi_hmm.cpp {#poi_hmm.cpp}
Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in `C++`.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm.cpp
### code/poi_hmm_smoothing.cpp {#poi_hmm_smoothing.cpp}
Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in `C++`.
Additionally, returns smoothing probabilities.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_smoothing.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_smoothing.cpp
### code/poi_hmm_smoothing_truncated.cpp {#poi_hmm_smoothing_truncated.cpp}
Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in `C++`.
Additionally, returns (optionally truncated) smoothing probabilities.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_smoothing_truncated.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_smoothing_truncated.cpp
### code/poi_hmm_lamb.R {#poi_hmm_lamb.R}
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_lamb.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_lamb.R
On our machine, this file took approximately 2h to compute.
### code/poi_hmm_simu1.R {#poi_hmm_simu1.R}
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu1.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu1.R
On our machine, this file took approximately 3h30 to compute.
### code/poi_hmm_simu2.R {#poi_hmm_simu2.R}
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu2.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu2.R
On our machine, this file took approximately 23h to compute.
### code/poi_hmm_simu3.R {#poi_hmm_simu3.R}
Unlike in the other simulations, the coverage probabilities computation code does not ensure that profile CIs converge.
This is solved in the fourth simulation with a larger dataset size, and shows that care must be taken when dealing with profile CIs because they can fail to produce results. \
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu3.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu3.R
On our machine, this file took approximately 26h30 to compute.
### code/poi_hmm_simu4.R {#poi_hmm_simu4.R}
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu3.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu3.R
On our machine, this file took approximately 63h to compute.
### code/poi_hmm_tinn.R {#poi_hmm_tinn.R}
Most of their code is common:
* `Parameters and covariates` defines parameters and covariates in natural form. The simulation files generate data there.
* `Parameters & covariates for TMB` transforms parameters and covariates to their working form.
* `Estimation` computes estimates with and without TMB for further comparison. The estimates are stored in the data-frames `conf_int_*`.
* `Creating variables for the CIs` creates necessary variables for to compute confidence intervals (CIs).
* `Benchmarks` uses the `microbenchmark` [@R-microbenchmark] package to accurately time the $TMB_0$ $TMB_G$ $TMB_H$ and $TMB_{GH}$ procedures. Their times are stored in the data-frames `estim_benchmarks_df_*`
* `Profiling the likelihood` uses the `TMB` package's `tmbprofile` function to determine a profile of the likelihood, then determines a CI based on it for the corresponding working parameter. Eventually, CIs are derived when possible for the natural parameters $\hat{\bs{\lambda}}$ and $\hat{\bs{\Gamma}}$. The CIs are stored in the data-frames `conf_int_*`.
* `Bootstrap` derives a parametric bootstrap CI from the dataset, while checking that the generated data can be estimated without errors, after which we apply our label switching function to ensure that estimates of a parameter are only aggregated with estimates of the same parameter. The CIs are stored in the data-frames `conf_int_*`.
* `TMB confidence intervals` computes CIs based on `TMB`-derived standard errors. The CIs are stored in the data-frames `conf_int_*`.
* `Coverage probabilities of the 3 CI methods` simulates coverage samples based on true values when available or on estimates otherwise, then derives CIs using the three CI generation methods described above. The coverage probabilities are stored in the data-frames `conf_int_*`.
* `Fixes` uses the label-switching algorithm on the estimates in the `conf_int_*` variable to ensure they are correctly ordered, and then reorders $\hat{\bs{\Gamma}}$ in the `conf_int_*` variable to a row-wise order instead of the default column-wise order. In short this applies two fixes on the `conf_int_*` variable to make it easier to read.\
At the end, we reorder the profile likelihood based CIs in case they are not in ascending order.
A unique randomness seed is set in each of these four files before all time-consuming tasks involving some randomness.
*[code/setup_parameters.R](#setup_parameters.R)* defines the random number generator's version.
*[code/poi_hmm_tinn.R](#poi_hmm_tinn.R)* contains an extra section: `Benchmark the same dataset many times to check the benchmark durations has low variance`.
As written in its title, its role is to time estimation with and without `TMB`.
However, unlike in the section `Benchmarks`, estimation is done on the same dataset everytime.
If everything goes well, the times should have a negligible variance.
This allows us to check if normal background activity on the computer (e.g. the user opening a window or moving the mouse) affects estimation time in any noticeable way.
If it affects estimation noticeably, then we would have to apply much stricter control on background processes in order to reliably compare estimation speeds with different optimizers.
Luckily, a low variance was observed, making the acceleration evidenced in this project significant.
**NOTE**: to be faster, some code is parallelized:
* In *[code/poi_hmm_simu1.R](#poi_hmm_simu1.R)*, bootstrapping is parallelized both in `Bootstrap` (accelerated from 83 seconds per sample to 29 seconds) and in `Coverage probabilities of the 3 CI methods` (likelihood profiling was slower with parallelization, likely due to loading `TMB` each time and the procedure being already fairly quick: from 2.8 seconds in total without to 3.5 seconds in total with).
* In *[code/poi_hmm_simu2.R](#poi_hmm_simu2.R)*, bootstrapping and likelihood profiling are both parallelized in `Bootstrap` (accelerated from 64 seconds per bootstrap sample to 21), `Profiling the likelihood`, and in `Coverage probabilities of the 3 CI methods`.
* *[code/poi_hmm_simu3.R](#poi_hmm_simu3.R)* and *[code/poi_hmm_simu4.R](#poi_hmm_simu4.R)* benefit from the same parallelization as *[code/poi_hmm_simu2.R](#poi_hmm_simu2.R)*.
Parallelization is handled with the `foreach` [@R-foreach] package and the `doParallel` [@R-doParallel] package as a backend.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_tinn.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_tinn.R
### code/setup_parameters.R {#setup_parameters.R}
Sets up multiple variables either to define some global constants or to store useful results for easier maintenance, and compiles *[code/linreg.cpp](#linreg.cpp)* and *[code/poi_hmm.cpp](#poi_hmm.cpp)*.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/setup_parameters.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/setup_parameters.R
### data/ {#data}
Files containing both datasets and the computational results.
*[code/main.R](#main.R)* stores important results in this folder, and only loads them by default.
Speed results are available after some post-processing that we leave to the reader.
Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data
### data/fetal-lamb.RData {#fetal-lamb.RData}
Contains `lamb`: an integer vector of with 240 data from @leroux.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/fetal-lamb.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/fetal-lamb.RData
### data/results_lamb.RData
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_lamb.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_lamb.RData
### data/results_simu1.RData
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_simu1.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_simu1.RData
### data/results_simu2.RData
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_simu2.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_simu2.RData
### data/results_simu3.RData
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_simu3.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_simu3.RData
### data/results_simu4.RData
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_simu4.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_simu4.RData
### data/results_tinn.RData
Contains results from computations and timings run in *[code/main.R](#main.R)*.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/results_tinn.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/results_tinn.RData
### data/tinnitus.RData {#tinnitus.RData}
Tinnitus data collected with the "Track Your Tinnitus" (TYT) mobile application on 87 successive days, provided by the University of Regensburg and European School for Interdisciplinary Tinnitus Research (ESIT), of which a detailed description is presented in @pryss and @pryssa.
A plot is available in Figure \@ref(fig:4tinnitus-fig).
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/tinnitus.RData \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/tinnitus.RData
### functions/ {#functions}
Utility functions used both in `C++` files and in `R` files.
Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions
### functions/mvnorm_utils.cpp {#mvnorm_utils.cpp}
Utility functions used in [Multivariate Gaussian HMM] to transform working parameters to their natural form, and to compute the hidden Markov chain's stationary distribution when needed.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/mvnorm_utils.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/mvnorm_utils.cpp
### functions/mvnorm_utils.R {#mvnorm_utils.R}
Useful functions to estimate multivariate Gaussian HMMs with and without `TMB` and perform various pre-processing and post-processing.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/mvnorm_utils.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/mvnorm_utils.R
### functions/norm_utils.cpp {#norm_utils.cpp}
Utility functions used in [Gaussian HMM] to transform working parameters to their natural form, and to compute the hidden Markov chain's stationary distribution when needed.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/norm_utils.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/norm_utils.cpp
### functions/norm_utils.R {#norm_utils.R}
Useful functions to estimate Gaussian HMMs with and without `TMB` and perform various pre-processing and post-processing.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/norm_utils.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/norm_utils.R
### functions/utils.cpp {#utils.cpp}
Utility functions used in *[code/poi_hmm.cpp](#poi_hmm.cpp)* to transform working parameters to their natural form, and to compute the hidden Markov chain's stationary distribution when needed.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.cpp
### functions/utils.R {#utils.R}
Useful functions to estimate Poisson HMMs with and without `TMB` and perform various pre-processing and post-processing.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.R
### functions/utils.R-explanation.R {#utils.R-explanation.R}
Description of the function parameters and return objects defined in *[functions/utils.R](#utils.R)*. \
The other utility files are adapted to the univariate and multivariate Gaussian setting but otherwise have identical functions. \
Note that the article focuses on Poisson HMMs.
Hence, some utility functions in R present for the Poisson HMM are not adapted to the Gaussian HMMs.\
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.R-explanation.R \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.R-explanation.R
### functions/utils_linreg_extended.cpp {#utils_linreg_extended.cpp}
Contains the useless example function.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils_linreg_extended.cpp \
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils_linreg_extended.cpp