11-comparing-models.Rmd (0 additions & 2 deletions)
@@ -257,13 +257,11 @@ where the residuals $\epsilon_{ij}$ are assumed to be independent and follow a G
A Bayesian linear model makes additional assumptions. In addition to specifying a distribution for the residuals, we require _prior distribution_ specifications for the model parameters ( $\beta_j$ and $\sigma$ ). These are distributions for the parameters that the model assumes before being exposed to the observed data. For example, a simple set of prior distributions for our model might be:
\begin{align}
\epsilon_{ij} &\sim N(0, \sigma) \notag \\
\beta_j &\sim N(0, 10) \notag \\
\sigma &\sim \text{exponential}(1) \notag
\end{align}
These priors set the possible/probable ranges of the model parameters and contain no unknown parameters themselves. For example, the prior on $\sigma$ indicates that values must be larger than zero, that the distribution is very right-skewed, and that values are usually less than 3 or 4.
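As an illustration of how priors like these could be declared in code, here is a sketch using the rstanarm package; the formula, predictor names, and fitted-object name are placeholders, not the chapter's own model.

```r
library(rstanarm)

# Bayesian linear model using the priors listed above.
# `outcome`, `predictor_1`, `predictor_2`, and `bayes_fit` are placeholder names.
bayes_fit <- stan_glm(
  outcome ~ predictor_1 + predictor_2,
  data = training_set,
  family = gaussian(),
  prior_intercept = normal(location = 0, scale = 10),  # intercept ~ N(0, 10)
  prior = normal(location = 0, scale = 10),            # beta_j ~ N(0, 10)
  prior_aux = exponential(rate = 1),                    # sigma ~ exponential(1)
  seed = 1234
)
```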
Each of these models results in linear class boundaries. Which one should we use? Since, for these data, the number of model parameters does not vary, the statistical approach is to compute the (log) likelihood for each model and determine the model with the largest value. Traditionally, the likelihood is computed using the same data that were used to estimate the parameters, not using approaches like data splitting or resampling from Chapters \@ref(splitting) and \@ref(resampling).
For a data frame `training_set`, let's create a function to compute the different models and extract the likelihood statistics for the training set (using `broom::glance()`):
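A sketch of what such a helper might look like is below; the binary outcome `class`, the formula, and the candidate link functions are illustrative assumptions rather than the chapter's own code.

```r
library(broom)
library(purrr)
library(dplyr)

# Fit one binomial GLM per candidate link function and collect the
# training-set likelihood statistics reported by broom::glance().
fit_and_glance <- function(link, data = training_set) {
  fit <- glm(class ~ ., data = data, family = binomial(link = link))
  glance(fit) %>% mutate(link = link, .before = 1)
}

map_dfr(c("logit", "probit", "cloglog"), fit_and_glance)
```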
In this output, the p-values correspond to separate hypothesis tests for each parameter:
\begin{align}
H_0&: \beta_j = 0 \notag \\
H_a&: \beta_j \ne 0 \notag
\end{align}
for each of the model parameters. Looking at these results, `phd` (the prestige of their department) may not have any relationship with the outcome.
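As a quick way to see these per-parameter Wald results, the coefficient table can be tidied and sorted by p-value; this is only a sketch, assuming `log_lin_fit` (referenced later in the chapter) is the fitted model object.

```r
library(broom)
library(dplyr)

# Wald test results for each model parameter, smallest p-values first.
tidy(log_lin_fit) %>%
  arrange(p.value)
```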
@@ -238,12 +236,10 @@ glm_boot %>%
Determining which predictors to include in the model is a difficult problem. One approach is to conduct likelihood ratio tests (LRT) [@McCullaghNelder89] between nested models. Based on the confidence intervals, we have evidence that a simpler model without `phd` may be sufficient. Let's fit a smaller model, then conduct a statistical test:
\begin{align}
H_0&: \beta_{phd} = 0 \notag \\
H_a&: \beta_{phd} \ne 0 \notag
\end{align}
This hypothesis was previously tested when we showed the tidied results for `log_lin_fit`. That particular approach used results from a single model fit via a Wald statistic (i.e., the parameter estimate divided by its standard error). For that approach, the p-value was `r tidy(log_lin_fit) %>% filter(term == "phd") %>% pluck("p.value") %>% format.pval()`. We can tidy the results for the LRT to get the p-value:
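A sketch of one way to run this comparison, assuming `log_lin_fit` is an ordinary Poisson `glm()` object and using placeholder names for the outcome and the reduced fit:

```r
library(broom)
library(dplyr)

# Refit without `phd`, then compare the nested models with a likelihood
# ratio test; `outcome` and `log_lin_reduced` are placeholder names.
log_lin_reduced <- glm(outcome ~ . - phd, data = training_set, family = poisson)

anova(log_lin_reduced, log_lin_fit, test = "LRT") %>%
  tidy()
```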
where the $x$ covariates affect the non-zero count values and the $z$ covariates influence the probability of a zero count. The two sets of predictors do not need to be mutually exclusive.
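As a sketch of how such a model is often specified, the pscl package uses a two-part formula in which terms before `|` model the counts and terms after `|` model the probability of a zero; the variable names below are placeholders, and the chapter's own fitting code may differ.

```r
library(pscl)

# Zero-inflated Poisson regression: x1 and x2 drive the non-zero counts,
# z1 drives the probability of a zero. All names are placeholders.
zip_fit <- zeroinfl(outcome ~ x1 + x2 | z1,
                    data = training_set,
                    dist = "poisson")
summary(zip_fit)
```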
@@ -295,13 +289,10 @@ zero_inflated_fit
Since the coefficients for this model are also estimated using maximum likelihood, let's try to use another likelihood ratio test to understand if the new model terms are helpful. We will _simultaneously_ test whether the new model terms are needed.
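If we denote the coefficients that the zero-probability part of the model adds for the $z$ covariates as $\gamma_1, \ldots, \gamma_q$ (a notational assumption for this sketch), the joint hypothesis can be written as:

\begin{align}
H_0&: \gamma_1 = \gamma_2 = \cdots = \gamma_q = 0 \notag \\
H_a&: \text{at least one } \gamma_j \ne 0 \notag
\end{align}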