Questions regarding parallel implementation of BVAR #76

Open

thestockman27 opened this issue May 16, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

thestockman27 commented May 16, 2022

Hi there,

I am wondering whether any work has been done on a function that, rather than using parallelization to construct several BVAR models, distributes the tasks of a single model across multiple clusters -- that is, distributing the MCMC sampling for each variable or hyperparameter. Is this feasible?

I am running several large back-tests and have found that the MCMCs are simply too time-consuming for my current modelling effort, even at just 10k draws. I have even built out a connection to Spark, but it was still too time-consuming and became quite costly. (Yes, I recognize that this may more properly serve as an indictment of the scale of my back-test process. Duly noted! I am considering refinement.)

Alternatively, would it be possible to bypass the MCMC process and instead derive the posterior density analytically? I believe SAS's PROC VARMAX does just this.

Just spit-balling here...Thank you for all your work on this package!

nk027 (Owner) commented May 17, 2022

Hey @thestockman27,
Great to hear that you're working with the package!

First, if you want the full posterior (of this type of VAR), you will need some kind of sampling approach. Of course, there are other natural-conjugate models for which the full posterior is available in closed form (let me know if that's what you need). If you want point estimators, there are many approximate approaches that could help (e.g. INLA, variational Bayes), but they're out of scope for the package. However, if point estimates of the AR coefficients are what you need, the starting draws may be enough for you, since the initial hyper-parameters are essentially set via maximum likelihood.
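
In case it helps, here is a rough sketch of pulling those maximum-likelihood starting values out of a short run. The element names "optim" and "hyper" are assumptions about the returned object, so check str() against your installed version of BVAR:

library("BVAR")

set.seed(42)
Y <- matrix(rnorm(200 * 3), 200, 3)   # placeholder data -- substitute your own series
colnames(Y) <- paste0("y", 1:3)

# A short run is enough if only the optimisation step is of interest
run <- bvar(Y, lags = 2, n_draw = 1000L, n_burn = 200L, verbose = FALSE)

run$optim$par   # hyper-parameter values from the quasi-Newton optimisation (assumed element)
head(run$hyper) # sampled hyper-parameter draws, for comparison (assumed element)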

Regarding the parallelisation -- there already is some in the linear algebra operations (if the BLAS supports it), but MCMC is a fundamentally sequential algorithm where step i (for all variables) depends on step i-1 and the variables depend on each other, so there's little room for parallelisation. However, if you have many (under-used) cores, you can run parallel chains and aggregate the draws (see the sketch after the list below)! In that case, you can keep the number of burn-ins to a minimum (that shouldn't be too bad -- a high number of burns is just very conservative). Two small notes to myself on that -- it would be nice to have:

  • a function to merge multiple draws from a parallel run
  • the option of setting other starting values for the hyper-parameters (it would be nice to spawn parallel processes after the chain has converged)
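
Something along these lines could already work today. This is an untested sketch -- again, the "hyper" element name is an assumption, and mclapply() forks, so use parLapply() with a cluster on Windows:

library("BVAR")
library("parallel")

set.seed(1)
Y <- matrix(rnorm(200 * 3), 200, 3)   # placeholder data -- substitute your own series
colnames(Y) <- paste0("y", 1:3)

# Fit one chain with its own seed; burn-ins can stay low since the
# optimisation already starts the chain near the mode
fit_chain <- function(seed, data) {
  set.seed(seed)
  bvar(data, lags = 2, n_draw = 5000L, n_burn = 500L, verbose = FALSE)
}

chains <- mclapply(1:4, fit_chain, data = Y, mc.cores = 4L)

# Pool the hyper-parameter draws by stacking the per-chain draw matrices
hyper_pooled <- do.call(rbind, lapply(chains, `[[`, "hyper"))
dim(hyper_pooled)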

Best,

@nk027 nk027 added the enhancement New feature or request label May 17, 2022
thestockman27 (Author) commented

Thanks for the quick and thorough response.

Yes, I think point estimates for the AR coefficients would satisfy my immediate use case. I believe this occurs within bvar() via bv_ml() and calc_logml(), and that the initial values of the hyperparameters are generated via rmvn_proposal(). I'll admit that much of this calculation is beyond my current level of comprehension...

How would I then override the sampling, perhaps by setting ml_store and hyper_store equal to the initial draws of ml_draw() and hyper_draw()?

Merging chains run in parallel sounds like a neat idea.

nk027 (Owner) commented May 18, 2022

So the most straightforward way would be to just use a standard Normal Inverse-Wishart prior, like so

N <- 100
P <- 4

# Coefficients ---
beta <- matrix(rgamma(P^2, 1, 1), P)
# Scale them so the VAR is stationary (spectral radius below one)
beta <- beta * 0.95 / max(abs(eigen(beta)$values))

# VAR data ---
D <- matrix(NA_real_, N, P)
D[1, ] <- rnorm(P)
for(i in seq(2, N)) {
  D[i, ] <- D[i - 1, ] %*% beta + rnorm(P)
}

X <- cbind(1, BVAR:::lag_var(D, lags = 1L)[-1, ])
Y <- D[-1, ]
plot.ts(Y)

# Priors ---
beta_p0 <- diag(P + 1) * 1e-8
beta_mu0 <- matrix(0, P + 1, P)
sigma_d0 <- 0.1

# Beta ---
# Posterior precision (inverse of variance)
beta_p1 <- crossprod(X) + beta_p0
# Posterior mean -- weighted combination of data and prior mean
beta_mu1 <- solve(beta_p1, (crossprod(X, Y) + beta_p0 %*% beta_mu0))

# Sigma ---
# Posterior shape
sigma_c1 <- 1 + N / 2
# Posterior rate
res <- Y - X %*% beta_mu1
sigma_d1 <- sigma_d0 + crossprod(res) / 2
# Posterior mean of sigma (inverse-gamma mean: rate / (shape - 1))
sigma_d1 / (sigma_c1 - 1)
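
Just to illustrate how the point estimate would be used -- a one-step-ahead forecast from the posterior mean above (a quick sketch with the objects already defined, nothing package-specific):

# Forecast the next period using the posterior mean of the coefficients
x_last <- c(1, D[nrow(D), ])   # constant plus the most recent observation
y_hat <- x_last %*% beta_mu1   # 1 x P vector of point forecasts
y_hat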

If you absolutely want to use the GLP model in an empirical Bayes fashion, you can just follow the steps up to the optimisation (line #241) and then use the output of bv_ml() with those hyper-parameters -- that should contain all you need to calculate the point estimates.

Best,

thestockman27 (Author) commented

Hi there, hope you have had a nice summer.

I was able to implement your suggestion for point estimates using the initial draws calculated by optim. Seen here.

I have noticed that adjusting the seed via set.seed() significantly impacts the results of my back-testing. It would seem that the quasi-Newton optimization does not reach consistent estimates of the ML.

I am considering following the methodology of SAS's VARMAX procedure, which uses the Kalman filter to calculate the exact ML.

I suppose I just wanted to run this by you -- any thoughts? I would appreciate any suggestions or criticisms that could better guide me.
