-
Notifications
You must be signed in to change notification settings - Fork 1
/
06-application.Rmd
196 lines (164 loc) · 6.68 KB
/
06-application.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
# Application to different data sets
## TYT data
We detail here the code used to estimate a two-state Poisson HMM based on the tinnitus dataset available in *[data/tinnitus.RData](#tinnitus.RData)*.
- Set a seed for randomness, and load files
```{r 6tyt-init}
set.seed(123)
library(TMB)
TMB::compile("code/poi_hmm.cpp")
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/tinnitus.RData")
```
- Set initial parameters
```{r 6tyt-init-params}
# Parameters and covariates
m <- 2
gamma <- matrix(c(0.8, 0.2,
0.2, 0.8), nrow = m, ncol = m)
lambda <- seq(quantile(tinn_data, 0.1), quantile(tinn_data, 0.9), length.out = m)
delta <- stat.dist(gamma)
```
- Transform them into working parameters
```{r 6tyt-pn2pw}
working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = tinn_data, m = m)
```
- Estimate the parameters via a function
```{r 6tyt-estim}
# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
DLL = "poi_hmm", silent = TRUE)
# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
mod_tmb$convergence == 0
# Results
summary(sdreport(obj_tmb), "report")
```
For the code used to generate coverage probabilities and acceleration results, please take a look at *[code/poi_hmm_tinn.R](#poi_hmm_tinn.R)*.
## Lamb data
We detail here the code used to estimate a two-state Poisson HMM based on the lamb dataset available in *[data/fetal-lamb.RData](#fetal-lamb.RData)*
- Set a seed for randomness, and load files
```{r 6lamb-init}
set.seed(123)
library(TMB)
TMB::compile("code/poi_hmm.cpp")
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/fetal-lamb.RData")
lamb_data <- lamb
rm(lamb)
```
- Set initial parameters
```{r 6lamb-init-params}
# Parameters and covariates
m <- 2
gamma <- matrix(c(0.8, 0.2,
0.2, 0.8), nrow = m, ncol = m)
lambda <- seq(0.3, 4, length.out = m)
delta <- stat.dist(gamma)
```
- Transform them into working parameters
```{r 6lamb-pn2pw}
working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = lamb_data, m = m)
```
- Estimate the parameters via a function
```{r 6lamb-estim}
# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
DLL = "poi_hmm", silent = TRUE)
# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
mod_tmb$convergence == 0
# Results
summary(sdreport(obj_tmb), "report")
```
For the code used to generate coverage probabilities and acceleration results, please take a look at *[code/poi_hmm_lamb.R](#poi_hmm_lamb.R)*.
On a minor note, when comparing our estimation results to those reported by @leroux, some non-negligible differences can be noted.
The reasons for this are difficult to determine, but some likely explanations are given in the following.
First, differences in the parameter estimates may result e.g. from the optimizing algorithms used and related setting (e.g. convergence criterion, number of steps, optimization routines used in 1992, etc).
Moreover, @leroux seem to base their calculations on an altered likelihood, which is reduced by removing the constant term $\sum_{i=1}^{T} \log(x_{i}!)$ from the log-likelihood.
This modification may also possess an impact on the behavior of the optimization algorithm, as e.g. relative convergence criteria and step size could be affected.
The altered likelihood becomes apparent when computing it on a one-state HMM.
A one-state Poisson HMM is a Poisson regression model, for which the log-likelihood has the expression
\begin{align*}
l(\lambda) &= \log \left(\prod_{i=1}^{T} \frac{\lambda^{x_i} e^{-\lambda}}{x_{i}!} \right)\\
&= - T \lambda + \log(\lambda) \left( \sum_{i=1}^{T} x_i \right) - \sum_{i=1}^{T} \log(x_{i}!).
\end{align*}
The authors find a ML estimate $\lambda = 0.3583$ and a log-likelihood of -174.26.
In contrast, calculating the log-likelihood explicitly shows a different result.
```{r 6leroux-likelihood-calculation}
x <- lamb_data
# We use n instead of T in R code
n <- length(x)
l <- 0.3583
- n * l + log(l) * sum(x) - sum(log(factorial(x)))
```
The log-likelihood is different, but when the constant $- \sum_{i=1}^{T} \log(x_{i}!)$ is removed, it matches our result.
```{r 6leroux-likelihood}
- n * l + log(l) * sum(x)
```
## Simulated data
We detail here the code used to simulate two datasets from two-states Poisson HMMs, one of size `r DATA_SIZE_SIMU1` and one of size `r DATA_SIZE_SIMU2`.
Then, using the same procedure as above, we estimate a model using different initial parameters.
- Set initial parameters (data size and HMM parameters)
```{r 6simu-gen-init-params}
DATA_SIZE_SIMU <- 2000
m <- 2
# Generate parameters
lambda <- seq(10, 14, length.out = m)
# Create the transition probability matrix with 0.8 on its diagonal
gamma <- matrix(c(0.8, 0.2,
0.2, 0.8), nrow = m, ncol = m)
delta <- stat.dist(gamma)
```
The `stat.dist` function computes the stationary distribution.
- Generate data with one of the functions defined in [Generating data](#generating-data)
```{r 6simu-gen-data}
simu_data <- pois.HMM.generate.sample(ns = DATA_SIZE_SIMU,
mod = list(m = m,
lambda = lambda,
gamma = gamma,
delta = delta))$data
```
- Set initial parameters
```{r 6simu-init-params}
# Parameters and covariates
m <- 2
gamma <- matrix(c(0.6, 0.4,
0.4, 0.6), nrow = m, ncol = m)
lambda <- seq(quantile(simu_data, 0.1), quantile(simu_data, 0.9), length.out = m)
delta <- stat.dist(gamma)
# Display Poisson means
lambda
```
- Transform them into working parameters
```{r 6simu-pn2pw}
working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = simu_data, m = m)
```
- Estimate the parameters via a function
```{r 6simu-estim}
# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
DLL = "poi_hmm", silent = TRUE)
# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
mod_tmb$convergence == 0
# Results
summary(sdreport(obj_tmb), "report")
```
For the code used to generate coverage probabilities and acceleration results, please take a look at *[code/poi_hmm_simu1.R](#poi_hmm_simu1.R)*, *[code/poi_hmm_simu2.R](#poi_hmm_simu2.R)*, *[code/poi_hmm_simu3.R](#poi_hmm_simu3.R)* and *[code/poi_hmm_simu4.R](#poi_hmm_simu4.R)*.