(Generalized) Linear Mixed Model Tutorial #1518
Thanks for the inquiry. Linear mixed models in a Bayesian setting are simply multi-level models. In a frequentist setting, an LMM with a random intercept requires specifying a distribution not only for the error terms but for the random intercepts as well. In addition, other model parameters, e.g. a global intercept, are assumed to be fixed effects. The implication is that an LMM, which usually has both fixed and random effects, requires special considerations for estimating the effects; whereas in a model with only fixed effects, you can estimate parameters in the "usual" way using, for example, optimization algorithms like gradient descent. (Or, in the case of linear regression, the best linear unbiased estimator is available in closed form.) In a Bayesian setting, you can specify a multi-level model with intercepts for each group like this (in Turing):

using Turing
using Distributions
import Random
@model multilevel_with_random_intercept(y, X, z) = begin
# number of unique groups
num_random_intercepts = length(unique(z))
# number of predictors
num_predictors = size(X, 2)
### NOTE: Carefully chosen priors should be used for a particular application. ###
# Prior for standard deviation for errors.
sigma ~ LogNormal()
# Prior for coefficients for predictors.
beta ~ filldist(Normal(), num_predictors)
# Prior for intercept.
intercept ~ Normal()
# Prior for variance of random intercepts. Usually requires thoughtful specification.
s2 ~ InverseGamma(3, 2)
s = sqrt(s2)
# Prior for random intercepts.
random_intercepts ~ filldist(Normal(0, s), num_random_intercepts)
# likelihood.
y .~ Normal.(intercept .+ X * beta + random_intercepts[z], sigma)
end
# Generate data.
Random.seed!(0)
N = 200
X = randn(N, 3)
z = [fill(1, 50); fill(2, 50); fill(3, 50); fill(4, 50)]
beta = [-2, 0, 2]
intercept = 1
random_intercepts = [-1, 1, -1, 1]
sigma = .1
y = intercept .+ X * beta + random_intercepts[z] + randn(N) * sigma
# Sample via NUTS.
chain = sample(multilevel_with_random_intercept(y, X, z), NUTS(), 2000)
# Print the posterior mean of model parameters.
mean(chain)
# Mean
# parameters mean
# Symbol Float64
#
# beta[1] -1.9875
# beta[2] -0.0015
# beta[3] 1.9902
# intercept 0.7918
# random_intercepts[1] -0.7694
# random_intercepts[2] 1.2085
# random_intercepts[3] -0.7999
# random_intercepts[4] 1.2024
# s2 1.1104
# sigma 0.0997
Note that above, the priors usually require at least some degree of consideration and should not be too vague. Also note that in a Bayesian model, all parameters are random (i.e. the global intercept would not be termed a "fixed effect"). You can do something similar for a model with slopes for each group ("random slopes"). Let me know if you need more clarification. A GLMM would replace the Gaussian likelihood with something else (e.g. Bernoulli, Binomial, Poisson, etc.) for when the response does not have support on the entire real line. |
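(To make the last point concrete: a minimal, untested sketch of a GLMM version of the model above, with a Bernoulli likelihood and a logit link. The model name, priors, and the use of StatsFuns' logistic are illustrative choices, and y is assumed to be a 0/1 vector.)

using Turing
using StatsFuns: logistic

# Sketch of a GLMM: the Gaussian likelihood is replaced by a Bernoulli
# likelihood with a logit link. Priors are illustrative only.
@model function logistic_random_intercept(y, X, z)
    num_random_intercepts = length(unique(z))
    num_predictors = size(X, 2)
    beta ~ filldist(Normal(), num_predictors)
    intercept ~ Normal()
    s2 ~ InverseGamma(3, 2)
    random_intercepts ~ filldist(Normal(0, sqrt(s2)), num_random_intercepts)
    # Success probabilities via the logistic link.
    p = logistic.(intercept .+ X * beta + random_intercepts[z])
    y .~ Bernoulli.(p)
end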
Thanks! This helps a lot. I am having a bit of trouble extending it to the random-slope case, mainly since it's hard to debug dimensions. I modified the above by adding this to the end, but I get a DimensionMismatch error.
Dimension-wise, I need to add it to beta before the left multiplication with X, but I guess the issue is that beta is p x 1 and I need random_slopes[z] to be p x 1 as well, which means the original random_slopes should be a vector of vectors, I think, but I don't know how to write that. |
Let me write this model a different way: y = X β + Z u + ϵ, where y is the response vector, X is the fixed-effects design matrix with coefficients β, Z is the random-effects design matrix with random effects u, and ϵ is the error term.
If you want a random-intercept-only model, then Z would be a matrix of ones and zeros. If there are only 3 groups and N=12, with the first 4 observations being in the first group, the next 4 being in the second group, and the last 4 being in the third group, then Z would be the 12 x 3 indicator matrix

Z = [1 0 0
     1 0 0
     1 0 0
     1 0 0
     0 1 0
     0 1 0
     0 1 0
     0 1 0
     0 0 1
     0 0 1
     0 0 1
     0 0 1]

If you want random slopes and intercepts, and you only have one covariate (so K=2, with the first column being 1's and the second column being the values of the predictor), then Z could be

Z = [1 x_{1,1} 0 0       0 0
     1 x_{1,2} 0 0       0 0
     1 x_{1,3} 0 0       0 0
     1 x_{1,4} 0 0       0 0
     0 0       1 x_{2,1} 0 0
     0 0       1 x_{2,2} 0 0
     0 0       1 x_{2,3} 0 0
     0 0       1 x_{2,4} 0 0
     0 0       0 0       1 x_{3,1}
     0 0       0 0       1 x_{3,2}
     0 0       0 0       1 x_{3,3}
     0 0       0 0       1 x_{3,4}]

where x_{i,j} is the j-th measurement of the predictor for group i. If you have more covariates, you just add more columns to Z (i.e. another three columns to Z for another predictor). The predictors that appear in Z don't have to be the same ones that appear in X. Note that if you include a global intercept, X should contain (exactly) one column of ones. Under this form, you can avoid the index (z) used in the previous model and just supply X, y, and Z as data. I haven't tested the following code, but something like this should work:

@model function multilevel_model(y, X, Z)
# number of predictors
num_predictors_X = size(X, 2) # including intercept (column of 1s) if desired
num_predictors_Z = size(Z, 2)
### NOTE: Carefully chosen priors should be used for the particular application. ###
# Prior for standard deviation for errors.
sigma ~ LogNormal()
# Prior for coefficients for predictors. The supplied prior can be a little vague.
beta ~ filldist(Normal(0, 10), num_predictors_X)
# Prior for variance of group effects. Usually requires thoughtful specification.
s2 ~ InverseGamma(3, 2)
s = sqrt(s2)
# Prior for group effects.
u ~ filldist(Normal(0, s), num_predictors_Z)
# likelihood.
y .~ Normal.(X * beta + Z * u, sigma)
end |
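(As a concrete illustration of the Z described above: a small untested helper, here called make_Z, that builds Z from a group index z and a single predictor x, using one possible column ordering.)

# Sketch: build Z with a random intercept and a random slope per group.
# Columns 2i-1 and 2i belong to group i: an indicator column and the
# predictor values for that group (zeros elsewhere).
function make_Z(z, x)
    groups = sort(unique(z))
    Z = zeros(length(z), 2 * length(groups))
    for (i, g) in enumerate(groups)
        rows = findall(==(g), z)
        Z[rows, 2i - 1] .= 1.0      # random-intercept column for group g
        Z[rows, 2i] .= x[rows]      # random-slope column for group g
    end
    return Z
end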
Thanks so much! I'm sure this will also help a lot of others looking to do Bayesian multilevel models, and the syntax here looks intuitive. This matrix representation helps since it is easier to generalize. I just checked on my data and it works (which is just a simple 1D case of Y vs. X, but with 4 groups for the random effects). Just one more thing: in my case, I want the fixed-effects intercept and slope to have different prior values. I tried this:
but it didn't seem to work. However, when I did:
It worked. So I wonder, is there a way to broadcast the Normal prior here all in one go when you want a different prior for each coefficient? Or do you have to manually specify it like this even if it's from the same distribution? |
Try this:

beta ~ arraydist(Normal.([0.5, 100], [0.2, 20]))

The documentation on Turing is constantly changing, so see the performance tips in the current docs: https://turing.ml/dev/docs/using-turing/performancetips |
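(For reference, arraydist also accepts a plain vector of distributions, so the per-coefficient priors above can be written either way; the numbers are the illustrative values from the suggestion, and only one of the two lines would be used inside a model.)

# Either line gives each coefficient its own Normal prior:
beta ~ arraydist(Normal.([0.5, 100], [0.2, 20]))
beta ~ arraydist([Normal(0.5, 0.2), Normal(100, 20)])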
Thanks, using arraydist seems to work. Now I am experimenting with something a bit more advanced. I noticed that in the above model you specified that the random effects all have the same variance (and presumably the random intercept/slope are independent of each other). What I mean is, for example, in a frequentist LMM in R's lme4, using VarCorr() you get the variance of the RE intercept, the RE slope, and their covariance. How do I get those variance components here? I assume it would involve a covariance-matrix prior. I tried:
Basically I am trying to use a Wishart as the prior for the random-effect covariance matrix (since I think it generates positive-semidefinite matrices?) with a (1/40) * identity scale matrix (I got the 1/40 from the 1/n suggestion here: https://en.wikipedia.org/wiki/Wishart_distribution#Use_in_Bayesian_statistics). The covariance matrix should have num_predictors_Z dimensions, one for each of the random intercept and slope coefficients, I think? I get the error

PosDefException: matrix is not Hermitian. Cholesky factorization failed.

But I thought the Wishart distribution generates positive-definite matrices, so this shouldn't be an issue? Am I doing something wrong? How can I get the variance components? |
Can you try enforcing symmetry with

using LinearAlgebra
MultivariateNormal(fill(0, num_predictors_Z), Symmetric(Σ)) |
I get the same error. It seems to be occurring (based on the Atom IDE's red highlighting) in the step before that, where Σ is defined. So I am not sure if it's a numerical issue (which Symmetric() should theoretically prevent, to my knowledge) or something else with the syntax. |
I'm not very experienced with multilevel / mixed effects models. But for mixed effects models, specifying a covariance for the random effects is discussed in this paper: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/198-30.pdf

My guess would be that adding small values to the diagonal of Σ might help with the Cholesky failure.

If that doesn't work, could you provide a minimal example to reproduce the errors? |
I'll have to think about the Wishart prior specifications. |
The Stan documentation is a good resource on multilevel models. They argue in favour of an LKJ prior over the Wishart. Using the LKJ doesn't solve the numerical precision problem, though. |
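(For concreteness, an untested sketch of the scale-plus-correlation approach with an LKJ prior in place of the Wishart; the model name and hyperparameters are illustrative assumptions, not something given in this thread.)

using Turing
using LinearAlgebra: Diagonal, Symmetric

# Sketch: per-component scales plus an LKJ prior on the correlation matrix
# of the random effects, instead of a Wishart prior on the covariance.
@model function multilevel_lkj(y, X, Z)
    K = size(Z, 2)
    sigma ~ LogNormal()
    beta ~ filldist(Normal(0, 10), size(X, 2))
    tau ~ filldist(Truncated(Cauchy(0, 2), 0, Inf), K)   # random-effect scales
    Omega ~ LKJ(K, 2.0)                                   # correlation matrix
    Sigma = Symmetric(Diagonal(tau) * Omega * Diagonal(tau))
    u ~ MvNormal(zeros(K), Sigma)                         # correlated random effects
    y .~ Normal.(X * beta + Z * u, sigma)
end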
Thanks for providing examples of multilevel models. I have been trying to scale up your example to a more common amount of data. Unfortunately, it is running very slowly and the progress meter is not displaying an estimated time. It's not clear to me whether memoization is applicable in this case. Do you have any recommendations?

using Turing, ReverseDiff
using Distributions
import Random
Turing.setadbackend(:reversediff)
@model multilevel_with_random_intercept(y, X, z) = begin
# number of unique groups
num_random_intercepts = length(unique(z))
# number of predictors
num_predictors = size(X, 2)
### NOTE: Carefully chosen priors should be used for a particular application. ###
# Prior for standard deviation for errors.
sigma ~ LogNormal()
# Prior for coefficients for predictors.
beta ~ filldist(Normal(), num_predictors)
# Prior for intercept.
intercept ~ Normal()
# Prior for variance of random intercepts. Usually requires thoughtful specification.
s2 ~ InverseGamma(3, 2)
s = sqrt(s2)
# Prior for random intercepts.
random_intercepts ~ filldist(Normal(0, s), num_random_intercepts)
# likelihood.
y .~ Normal.(intercept .+ X * beta + random_intercepts[z], sigma)
end
# Generate data.
Random.seed!(0)
N = 30
n = 50
X = randn(N*n, 3)
z = repeat(1:N, inner=n)
beta = [-2, 0, 2]
intercept = 1
random_intercepts = rand(Normal(0,1), N)
sigma = .1
y = intercept .+ X * beta + random_intercepts[z] + randn(N*n) * sigma
# Sample via NUTS.
chain = sample(multilevel_with_random_intercept(y, X, z), NUTS(), 2000, progress=true)
# Print the posterior mean of model parameters.
mean(chain) |
My guess is that NUTS is using a large tree depth. Could you try replacing |
Thanks for getting back to me. I added |
Try a combination of gradually increasing the |
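(For reference, sampler settings such as the maximum tree depth and the target acceptance rate can be passed to NUTS; the values below are illustrative, and the keyword name follows recent Turing releases.)

# Sketch: 1_000 adaptation steps, 0.65 target acceptance, and a smaller max_depth.
chain = sample(multilevel_with_random_intercept(y, X, z),
               NUTS(1_000, 0.65; max_depth=5),
               2_000, progress=true)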
The hyperprior for the intercept SD is way too low:

# Prior for variance of random intercepts. Usually requires thoughtful specification.
s2 ~ InverseGamma(3, 2)
s = sqrt(s2)

Try this:

s ~ Truncated(Cauchy(0, 2), 0, Inf)
random_intercepts ~ filldist(Normal(0, s), num_random_intercepts)

and |
My go-to varying-intercept model looks like this:

# Model
@model varying_intercept(X, idx, y) = begin
n_gr = length(unique(idx))
predictors = size(X, 2)
# priors
μ ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
σ ~ Exponential(1 / std(y)) # residual SD
# Coefficients Student-t(ν = 3)
β ~ filldist(TDist(3), size(X, 2))
# Prior for variance of random intercepts. Usually requires thoughtful specification.
σᵢ ~ Truncated(Cauchy(0, 2), 0, Inf)
# s = sqrt(s2)
μᵢ ~ filldist(Normal(0, σᵢ), n_gr) # group-level intercepts
# likelihood
ŷ = μ .+ X * β .+ μᵢ[idx]
y ~ MvNormal(ŷ, σ)
end |
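(As a usage sketch: the model above could be fitted to data like the simulated example earlier in the thread, with z as the group-index vector; names are placeholders.)

# Sketch: fit the varying-intercept model and inspect posterior means.
chain = sample(varying_intercept(X, z, y), NUTS(), 2_000)
mean(chain)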
Usually it should be more efficient to compute quantities such as n_gr = length(unique(idx)) outside the model body (e.g. as a default keyword argument), so they are not recomputed in every model evaluation. |
Are CategoricalArrays supported by Turing? I am just trying to use one in a Bernoulli model.

using Turing, RDatasets
# We need a logistic function, which is provided by StatsFuns.
using StatsFuns:logistic
default = RDatasets.dataset("ISLR", "Default")
X = Matrix(select(default, [:Student, :Balance, :Income]))
y = default[:, :Default]
# original formula: default ~ student + balance + income, family
@model logreg(X, y) = begin
d = size(X, 2)
μ ~ Normal(0, 2.5 * std(y))
β ~ filldist(TDist(3), d)
v = logistic.(μ .+ X * β)
y .~ Bernoulli.(v)
end
model = logreg(X, y)
chn = sample(model, NUTS(1_000, 0.65), MCMCThreads(), 2_000, 4)

The problem is that I get:

ERROR: LoadError: TaskFailedException:
MethodError: no method matching /(::CategoricalValue{String,UInt8}, ::Int64) |
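(The MethodError suggests arithmetic is being attempted on the categorical response, e.g. in the std(y) call of the prior. A common workaround, shown here as an untested assumption rather than something confirmed in this thread, is to recode the response as 0/1 before building the model.)

# Sketch: recode the categorical "Default" column as a 0/1 vector
# (assumes the positive level is "Yes").
y = Int.(default[:, :Default] .== "Yes")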
@devmotion you are right! Thanks! Going from 26 s to 25 s (a 1 s improvement, but that's something). The model now is:

# Model
@model varying_intercept(X, idx, y; n_gr = length(unique(idx))) = begin
predictors = size(X, 2)
# priors
μ ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
σ ~ Exponential(1 / std(y)) # residual SD
# Coefficients Student-t(ν = 3)
β ~ filldist(TDist(3), predictors)
# Prior for variance of random intercepts. Usually requires thoughtful specification.
σᵢ ~ Truncated(Cauchy(0, 2), 0, Inf)
# s = sqrt(s2)
μᵢ ~ filldist(Normal(0, σᵢ), n_gr) # group-level intercepts
# likelihood
ŷ = μ .+ X * β .+ μᵢ[idx]
y ~ MvNormal(ŷ, σ)
end

Or, even more efficient:

# Model
@model varying_intercept(X, idx, y; n_gr = length(unique(idx)), predictors = size(X, 2)) = begin
# priors
μ ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
σ ~ Exponential(1 / std(y)) # residual SD
# Coefficients Student-t(ν = 3)
β ~ filldist(TDist(3), predictors)
# Prior for variance of random intercepts. Usually requires thoughtful specification.
σᵢ ~ Truncated(Cauchy(0, 2), 0, Inf)
# s = sqrt(s2)
μᵢ ~ filldist(Normal(0, σᵢ), n_gr) # group-level intercepts
# likelihood
ŷ = μ .+ X * β .+ μᵢ[idx]
y ~ MvNormal(ŷ, σ)
end |
Also, while we are at it, could you help me out? I'm trying to implement a multivariate varying-slope model that uses a Cholesky decomposition, but I keep getting bad ESS and Rhat values:

using Turing, RDatasets
using Random:seed!
using Statistics: mean, std
using LinearAlgebra: cholesky
seed!(1)
mtcars = RDatasets.dataset("datasets", "mtcars")
# Data prep
y = mtcars[:, :MPG]
idx = mtcars[:, :Cyl] # vector of group indices
idx = map(idx) do i
i == 4 ? 1 :
i == 6 ? 2 :
i == 8 ? 3 : missing
end
X = Matrix(select(mtcars, [:HP, :WT])) # the model matrix
# Model
@model varying_intercept_multi(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2),
L=cholesky(cor(X)).L, mus=transpose(mean(X, dims=1))) = begin
# priors
μ ~ Normal(mean(y), 2.5 * std(y)) # population-level intercept
# Prior for variance of random intercepts. Usually requires thoughtful specification.
σᵢ ~ Truncated(Cauchy(0, 2), 0, Inf)
μᵢ ~ filldist(Normal(0, σᵢ), n_gr) # group-level intercepts
# Non-Centered Parameterization
β_raw ~ filldist(Normal(), predictors) # NCP hyperprior for slopes
σⱼ ~ filldist(Truncated(Cauchy(0, 5), 0, Inf), predictors) # SD hyperprior for slopes
α ~ filldist(Normal(), predictors) # mean hyperprior for slopes
β = mus + σⱼ .* (L * α)
σ ~ Exponential(1 / std(y)) # residual SD
# likelihood
ŷ = vec(μ .+ X * β .+ μᵢ[idx])
y ~ MvNormal(ŷ, σ)
# generated quantities
return β
end |
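(For reference only, not the poster's model: the textbook non-centered parameterization for correlated coefficients maps independent standard-normal draws through per-coefficient scales and a Cholesky factor of a correlation matrix, roughly as in this model-body fragment; mu_beta and L_corr are assumed to be defined elsewhere.)

# Sketch of the usual non-centered construction:
# beta_raw are iid standard normals, sigma_j are per-coefficient scales,
# and L_corr is a lower Cholesky factor of a correlation matrix.
beta_raw ~ filldist(Normal(), predictors)
sigma_j ~ filldist(Truncated(Cauchy(0, 5), 0, Inf), predictors)
beta = mu_beta .+ sigma_j .* (L_corr * beta_raw)   # mu_beta: coefficient means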
Sorry, my bad, yes it is defined; it was a mistake while copying and pasting.
If it helps, I have some code in this issue that you can adapt. I'm trying to work out which AD backends are better in which circumstances. Using reverse-mode AD can save a lot of time with bigger models. Using the Hsb82 dataset, for example, Zygote (when it worked) let me run the model in 15 min or so. With ForwardDiff, it took something like 6 hours. |
By default, Turing runs on ForwardDiff? |
Yeah, that's correct. |
Please also see https://github.com/TuringLang/TuringGLM.jl |
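(TuringGLM specifies multilevel models with a formula; a small untested sketch of that interface, where the DataFrame df and its columns are placeholders and the API may have changed since this thread.)

using TuringGLM, Turing

# Sketch: a varying-intercept model via a formula. Assumes a DataFrame `df`
# with a response y, a predictor x, and a grouping column g.
fm = @formula(y ~ (1 | g) + x)
model = turing_model(fm, df)
chain = sample(model, NUTS(), 2_000)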
Hello,
I am relatively new to Turing.jl and Bayesian modelling in general. I was wondering how I could go about implementing an LMM/GLMM with a random intercept/slope; I couldn't find a tutorial for this in the documentation here: https://turing.ml/dev/tutorials/
I know there is some Stan material out there, but I have never used that either, so it's been a bit difficult to translate. Turing.jl is actually the first time I am trying a probabilistic programming language. The syntax I have seen so far is really nice, but I found nothing for multilevel models. Since multilevel models are a big use case for Bayesian methods, it would be nice to have a tutorial on them. Then users could refer to that for their own use cases.
Thanks!