Parallel Adaptive Nuts #3033

Merged
merged 67 commits into from
Jul 26, 2021

Conversation

SteveBronder
Collaborator

@SteveBronder SteveBronder commented Mar 24, 2021

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

Okay, so this PR has the cmdstan-facing version of hmc_nuts_diag_e_adapt() (and I plan to also do dense in this PR). The signature differs from the original hmc_nuts_diag_e_adapt() in that init, init_inv_metric, sample_writer, and diagnostic_writer are std::vector<>s of the types used in the normal signature.

My suggestion is that for cmdstan we should have all of these be std::vectors when they are created. Then, for the algorithms that are not parallel yet, we pass vec_argument[0] as the input argument for the algorithm. That will let us get diag/dense metric adaptive NUTS into cmdstan and then start adding parallel versions of all the other algorithms. Does that work for everyone?
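
A minimal sketch of the "create everything as a vector, pass element 0 to serial algorithms" idea described above. `Writer`, `run_serial_algorithm`, and `dispatch_serial` are hypothetical names used only for illustration; they are not the Stan services API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-chain output writer.
struct Writer {
  std::vector<std::string> lines;
  void operator()(const std::string& msg) { lines.push_back(msg); }
};

// Hypothetical algorithm with no parallel version yet: it takes one writer.
inline int run_serial_algorithm(Writer& sample_writer) {
  sample_writer("draws for the single chain");
  return 0;
}

// Caller side: arguments are created as vectors, one element per chain.
// An algorithm that is not parallel yet simply receives element 0.
inline int dispatch_serial(std::vector<Writer>& sample_writers) {
  assert(!sample_writers.empty());
  return run_serial_algorithm(sample_writers[0]);
}
```

With this layout the caller builds its arguments the same way for every algorithm, and only the dispatch step differs between serial and parallel versions.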

Intended Effect

Allow multiple chains to be invoked for adaptive nuts with diag/dense metrics

How to Verify

Test added for parallel diag e adapt

STAN_NUM_THREADS=4 ./runTests.py ./src/test/unit/services/sample/hmc_nuts_diag_e_adapt_parallel_test.cpp

Side Effects

Documentation

Still need to add docs

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Steve Bronder

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@SteveBronder SteveBronder mentioned this pull request Mar 24, 2021
3 tasks
@SteveBronder SteveBronder changed the title Feature/parallel nuts [WIP] Parallel Adaptive Nuts Mar 24, 2021
@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.33 | 3.43 | 0.97 | -2.83% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.0% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.12 | 0.98 | -1.96% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -1.82% slower |
| irt_2pl/irt_2pl.stan | 5.44 | 5.4 | 1.01 | 0.68% faster |
| performance.compilation | 91.22 | 89.08 | 1.02 | 2.35% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.6 | 8.6 | 1.0 | -0.03% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.02 | 30.8 | 0.97 | -2.6% slower |
| sir/sir.stan | 122.21 | 123.24 | 0.99 | -0.84% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | -0.03% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.24 | 3.02 | 1.07 | 6.86% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.39 | 0.96 | -3.7% slower |
| arK/arK.stan | 2.02 | 1.87 | 1.08 | 7.7% faster |
| arma/arma.stan | 0.95 | 0.65 | 1.47 | 31.86% faster |
| garch/garch.stan | 0.53 | 0.58 | 0.9 | -10.89% slower |
Mean result: 1.02817362089

Jenkins Console Log
Blue Ocean
Commit hash: 5055e21


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Member

@bbbales2 bbbales2 left a comment

Comments. I don't like having the 1 chain/n-chain code paths be separate.

- Model& model, const stan::io::var_context& init,
- const stan::io::var_context& init_inv_metric, unsigned int random_seed,
+ Model& model, const InitContext& init,
+ const InitMetricContext& init_inv_metric, unsigned int random_seed,
Member

Are the types changing here? I don't see an advantage to templating this. Now the template hides even the base polymorphic type.

Collaborator Author

@SteveBronder SteveBronder Mar 25, 2021

Ah, they are! Notice that now, when an initial inverse metric is not given, we actually return a stan::io::dump (or a vector of them). This is nice for the compiler because it now knows the real type coming in, so devirtualization happens a lot more easily.

Member

I don't like it. It's so hard to read code when the type information disappears like here. I doubt the virtual function calls here are killing our performance. Both of these things are used once at initialization and that is it.

Collaborator Author

Is a var_context more informative? It's just an abstract base class, so you still need to go look at the callee to see what's happening when this object's member functions are called. What if I included in the docs that this is either going to be a stan::io::dump or stan::io::json_context etc.?

Member

> Is a var_context more informative?

Well it's still easier to search for (you can find that it's an abstract base class, and then you can find what inherits from it and whatnot) and this is how everything is kinda done in the rest of services.

I think the reason we'd go templating here is if we were doing something we couldn't just solve with the existing polymorphism, or it's just easier to code the templated thing.

If that's the case, let's do templating. If that's not the case, let's still do templating if you really want to. Not the biggest deal, but I prefer the existing stuff.
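
For readers following this thread, here is a small sketch of the two signature styles being debated. `VarContextBase`, `DumpContext`, and both `describe_*` functions are hypothetical stand-ins, not the real `stan::io` classes; the point is only to contrast a base-class reference with a template parameter.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract base, playing the role of a var_context-like interface.
struct VarContextBase {
  virtual ~VarContextBase() = default;
  virtual std::string name() const = 0;
};

// Hypothetical concrete context; `final` tells the compiler nothing derives from it.
struct DumpContext final : VarContextBase {
  std::string name() const override { return "dump"; }
};

// Option A: base-class reference. The signature names the interface, and the
// call is virtual unless the compiler can prove the dynamic type.
inline std::string describe_polymorphic(const VarContextBase& init) {
  return init.name();
}

// Option B: template parameter. The concrete type is visible at the call
// site, so the compiler may resolve the call statically, but the signature
// no longer names the base interface.
template <typename InitContext>
inline std::string describe_templated(const InitContext& init) {
  return init.name();
}
```

Both functions accept a `DumpContext` and return the same result; the trade-off discussed above is readability of the signature versus what the compiler can see.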

} catch (const std::domain_error& e) {
return error_codes::CONFIG;
}
util::run_adaptive_sampler(samplers, model, cont_vectors, num_warmup,
Member

Similarly to the other code, I don't like how the 1 chain and N chain implementations are separate. It seems like they could be the same code and the 1 chain thing would just have 1 chain and that's fine.

Collaborator Author

We have two right now mostly because of cmdstan. Once we add the parallel version to cmdstan, we can remove the 1 chain version.

@SteveBronder SteveBronder changed the base branch from feature/parallel-adaptive to develop March 25, 2021 22:30
@SteveBronder SteveBronder marked this pull request as ready for review March 25, 2021 22:30
@SteveBronder
Collaborator Author

@bbbales2 I think this PR should have everything in it that we need to do the cmdstan stuff. Once we think the hmc_nuts_diag_e_adapt version looks good I can pretty easily update the other samplers and add tests for them

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.13 | 3.16 | 0.99 | -0.91% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -3.02% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.01 | 0.96% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.97 | -2.92% slower |
| irt_2pl/irt_2pl.stan | 5.89 | 5.96 | 0.99 | -1.06% slower |
| performance.compilation | 87.48 | 86.99 | 1.01 | 0.57% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.62 | 8.53 | 1.01 | 1.04% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.23 | 30.63 | 0.95 | -4.81% slower |
| sir/sir.stan | 128.64 | 143.65 | 0.9 | -11.67% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.01 | 0.81% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.02 | 2.98 | 1.01 | 1.43% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.4 | 0.99 | -0.76% slower |
| arK/arK.stan | 2.56 | 1.87 | 1.37 | 26.85% faster |
| arma/arma.stan | 0.64 | 0.93 | 0.69 | -43.98% slower |
| garch/garch.stan | 0.64 | 0.64 | 1.0 | 0.47% faster |
Mean result: 0.991987978995

Jenkins Console Log
Blue Ocean
Commit hash: c50cf1a


@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.09 | 3.09 | 1.0 | 0.02% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | -0.45% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.0 | 0.13% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 0.66% faster |
| irt_2pl/irt_2pl.stan | 5.93 | 5.87 | 1.01 | 1.06% faster |
| performance.compilation | 89.16 | 86.73 | 1.03 | 2.73% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.06 | 8.56 | 1.06 | 5.51% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.01 | 29.83 | 1.01 | 0.59% faster |
| sir/sir.stan | 128.31 | 135.96 | 0.94 | -5.96% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.98 | -1.7% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.0 | 3.01 | 1.0 | -0.35% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.39 | 1.01 | 0.5% faster |
| arK/arK.stan | 2.53 | 1.88 | 1.35 | 25.89% faster |
| arma/arma.stan | 0.64 | 0.92 | 0.69 | -44.29% slower |
| garch/garch.stan | 0.63 | 0.63 | 1.0 | -0.1% slower |
Mean result: 1.00509786129

Jenkins Console Log
Blue Ocean
Commit hash: 91d51e1


@SteveBronder SteveBronder dismissed bbbales2’s stale review July 2, 2021 23:33

Wds15 taking over

@wds15
Contributor

wds15 commented Jul 6, 2021

The ball is in your court, right?

@SteveBronder
Collaborator Author

SteveBronder commented Jul 6, 2021

Lol oh I was waiting on you, is there something in the review I missed?

@wds15
Contributor

wds15 commented Jul 6, 2021

Sorry.. no, you did not miss things... I missed your update. The last thing to sort out is the chain id output labeling. See above.

Once we have that, I need to check that pre-compilation of the services is still doing its job. That is quick to do by just timing the bernoulli example compile time.

Then we are good from my view.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.35 | 3.13 | 1.07 | 6.64% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -4.34% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.04 | 3.52% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -1.91% slower |
| irt_2pl/irt_2pl.stan | 5.87 | 5.87 | 1.0 | -0.03% slower |
| performance.compilation | 89.6 | 86.93 | 1.03 | 2.98% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.58 | 8.55 | 1.0 | 0.33% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.39 | 30.23 | 0.97 | -2.88% slower |
| sir/sir.stan | 129.33 | 130.63 | 0.99 | -1.0% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.01 | 0.6% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.99 | 3.01 | 0.99 | -0.63% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.39 | 1.02 | 2.09% faster |
| arK/arK.stan | 2.55 | 1.9 | 1.34 | 25.49% faster |
| arma/arma.stan | 0.65 | 0.94 | 0.69 | -45.63% slower |
| garch/garch.stan | 0.64 | 0.64 | 1.0 | -0.04% slower |
Mean result: 1.00617287845

Jenkins Console Log
Blue Ocean
Commit hash: 569df85


@SteveBronder
Collaborator Author

Alrighty @wds15 I also updated stan-dev/cmdstan#987 with this branch + the change for the init being set to 1.

I moved the chains argument to be under sample as for now that's the only one that uses it. I think the only other one that would use it would be variational and we can make chains an argument for variational once we have a version of VB that uses multiple chains. So calling is now like

examples/diamonds/diamonds sample num_samples=150 num_warmup=150 chains=8 \
 data file=examples/diamonds/diamonds2.json threads=8

@wds15
Contributor

wds15 commented Jul 12, 2021

So one of my last tests is to compare the compile time. With 2.27.0 I get:

[21:35:41][sebi@sebastians-macbook-pro-1:~/work/cmdstan-2.27.0]$ time make examples/bernoulli/bernoulli

--- Translating Stan model to C++ code ---
bin/stanc  --o=examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.stan

--- Compiling, linking C++ code ---
clang++ -DSTAN_THREADS -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.75.0 -I stan/lib/stan_math/lib/sundials_5.7.0/include    -DBOOST_DISABLE_ASSERTS         -c -include-pch stan/src/stan/model/model_header.hpp.gch -x c++ -o examples/bernoulli/bernoulli.o examples/bernoulli/bernoulli.hpp
clang++ -DSTAN_THREADS -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.75.0 -I stan/lib/stan_math/lib/sundials_5.7.0/include    -DBOOST_DISABLE_ASSERTS               -Wl,-L,"/Users/sebi/work/cmdstan-2.27.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/sebi/work/cmdstan-2.27.0/stan/lib/stan_math/lib/tbb"      examples/bernoulli/bernoulli.o src/cmdstan/main.o        -Wl,-L,"/Users/sebi/work/cmdstan-2.27.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/sebi/work/cmdstan-2.27.0/stan/lib/stan_math/lib/tbb"   stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc_proxy.dylib -o examples/bernoulli/bernoulli
rm -f examples/bernoulli/bernoulli.o

real	0m6.449s
user	0m5.979s
sys	0m0.449s

and with this branch I got

--- Translating Stan model to C++ code ---
bin/stanc  --o=examples/bernoulli/bernoulli-2.hpp examples/bernoulli/bernoulli-2.stan

--- Compiling, linking C++ code ---
clang++ -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2020.3/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.75.0 -I stan/lib/stan_math/lib/sundials_5.7.0/include    -DBOOST_DISABLE_ASSERTS         -c -include-pch stan/src/stan/model/model_header_threads.hpp.gch -x c++ -o examples/bernoulli/bernoulli-2.o examples/bernoulli/bernoulli-2.hpp
clang++ -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2020.3/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.75.0 -I stan/lib/stan_math/lib/sundials_5.7.0/include    -DBOOST_DISABLE_ASSERTS               -Wl,-L,"/Users/sebi/work/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/sebi/work/cmdstan/stan/lib/stan_math/lib/tbb"      examples/bernoulli/bernoulli-2.o src/cmdstan/main_threads.o        -Wl,-L,"/Users/sebi/work/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/sebi/work/cmdstan/stan/lib/stan_math/lib/tbb"   stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_5.7.0/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc_proxy.dylib -o examples/bernoulli/bernoulli-2
rm -f examples/bernoulli/bernoulli-2.o

real	0m6.941s
user	0m6.432s
sys	0m0.458s

So possibly this branch slows things down a tiny bit with respect to compilation (on the second compile, to be clear), but I'd assume this is within noise.
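
As a quick check on the size of that difference, the two `real` timings above (6.449 s for 2.27.0 and 6.941 s for this branch) can be put through the same "1 - new / old" convention the benchmark bot uses. The helper name `performance_change` is made up for this sketch. With these two numbers it gives roughly -0.076, i.e. about 7.6% slower on a single run.

```cpp
#include <cassert>
#include <cmath>

// Relative change using the "1 - new / old" convention from the benchmark
// tables: positive means faster, negative means slower.
inline double performance_change(double old_time, double new_time) {
  assert(old_time > 0.0);
  return 1.0 - new_time / old_time;
}
```

A single timing of each build does not separate a real slowdown from run-to-run variation, which is consistent with treating the result as within noise.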

@SteveBronder
Collaborator Author

Yeah imo I'd expect possibly a tiny slowdown in compilation, but overall that seems to be fine.

You think this is ready to merge?

@wds15
Contributor

wds15 commented Jul 13, 2021

From my memory this is good now. Let me do one last round over this.

This is my first bigger PR review for Stan, so that's why I take a bit more time. I'd guess we are good.

Contributor

@wds15 wds15 left a comment

Sorry, did not get to finish the review. Leaving these comments for now on the doc.

* @param[in] random_seed random seed for the random number generator
* @param[in] init_chain_id first chain id. The pseudo random number generator
will advance by for each chain by an integer sequence from `init_chain_id` to
`num_chains`
Contributor

the last chain id won't be num_chains, but init_chain_id+num_chains-1
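
To make the corrected range concrete, here is a small sketch that lists the ids for `num_chains` chains starting at `init_chain_id`. The function name `chain_ids` is hypothetical; only the two parameter names come from the doc comment under review.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Ids for num_chains chains starting at init_chain_id:
// init_chain_id, init_chain_id + 1, ..., init_chain_id + num_chains - 1.
inline std::vector<unsigned int> chain_ids(unsigned int init_chain_id,
                                           std::size_t num_chains) {
  std::vector<unsigned int> ids(num_chains);
  std::iota(ids.begin(), ids.end(), init_chain_id);
  return ids;
}
```

For example, `chain_ids(3, 4)` yields 3, 4, 5, 6, so the last id is 3 + 4 - 1 = 6 rather than 4.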

@@ -19,14 +19,17 @@ namespace util {
* duplicated.
*
* @param[in] seed the random seed
- * @param[in] chain the chain id
+ * @param[in] init_chain_id the chain id
+ * @param[in] chain_num For multi-chain, the ch
Contributor

But the doc "For multi-chain, the ch" is not meaningful to me. What about

@param[in] init_chain_id start of chain ids
 @param[in] chain_num chain id offset such that chain_id is init_chain_id+chain_num

Also... reading the comment here suggests to me that we should actually not allow a chain id of 0, right? (I mean the comment on the function.)

Contributor

@wds15 wds15 left a comment

Small laundry items, not more. Then we are good. Will look at the cmdstan num_thread thing now.

stepsize, stepsize_jitter, max_depth, delta, gamma, kappa, t0,
init_buffer, term_buffer, window, interrupt, logger, init_writer[0],
sample_writer[0], diagnostic_writer[0]);
} else {
Contributor

The else branch is not needed. A simple if for the num_chains=1 case is sufficient (same for the other functions).
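
A minimal sketch of the shape being suggested, using made-up functions (`run_single_chain`, `run_many_chains`, `run`) rather than the actual service code: the single-chain case returns early, so the multi-chain path needs no `else` and no extra indentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-chain and multi-chain implementations.
inline int run_single_chain(int& out) {
  out += 1;
  return 0;
}

inline int run_many_chains(std::vector<int>& outs) {
  for (int& o : outs) {
    o += 1;
  }
  return 0;
}

inline int run(std::size_t num_chains, std::vector<int>& outs) {
  assert(num_chains >= 1 && outs.size() == num_chains);
  if (num_chains == 1) {
    // Early return: everything after this block is the multi-chain path.
    return run_single_chain(outs[0]);
  }
  return run_many_chains(outs);
}
```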

interrupt, logger, init_writer[0], sample_writer[0],
diagnostic_writer[0]);
} else {
std::vector<std::unique_ptr<stan::io::dump>> unit_e_metrics;
Contributor

else is not needed.

* @param[in] random_seed random seed for the random number generator
* @param[in] init_chain_id first chain id. The pseudo random number generator
will advance by for each chain by an integer sequence from `init_chain_id` to
`num_chains`
Contributor

doc needs correction as before. The range is init_chain_id to init_chain_id+num_chains-1

* @param[in,out] sample_writer std vector of Writers for draws of each chain.
* @param[in,out] diagnostic_writer std vector of Writers for diagnostic
information of each chain.
* @param[in] num_chains The number of chains to run in parallel. `init`,
Contributor

num_chains doc string should appear after Model so that it matches the order of the arguments.

init_buffer, term_buffer, window, interrupt, logger, init_writer[0],
sample_writer[0], diagnostic_writer[0]);
} else {
using sample_t = stan::mcmc::adapt_diag_e_nuts<Model, boost::ecuyer1988>;
Contributor

else not needed

@@ -19,14 +19,17 @@ namespace util {
* duplicated.
*
* @param[in] seed the random seed
- * @param[in] chain the chain id
+ * @param[in] init_chain_id the chain id
+ * @param[in] chain_num For multi-chain, the ch
Contributor

BTW, do we actually need create_rng to take an init_chain_id and a num_chains? Why don't we leave the function as is and just pre-compute the chain ids in the parallel sample functions? That seems simpler to me.
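
A rough sketch of that alternative, under loud assumptions: `create_rng` below is a simplified stand-in built on `std::mt19937_64` with a small, arbitrary stride, not the real Stan utility, and `create_rngs` is a hypothetical caller. It only illustrates computing each chain id in the parallel caller and passing it to an unchanged two-argument `create_rng`.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Simplified stand-in for an unchanged create_rng(seed, chain): seed once,
// then skip ahead by a per-chain stride. The generator type and the small
// stride are illustration-only assumptions.
inline std::mt19937_64 create_rng(unsigned int seed, unsigned int chain) {
  constexpr unsigned long long kStride = 1000;  // assumed; kept small to run fast
  std::mt19937_64 rng(seed);
  rng.discard(kStride * chain);
  return rng;
}

// Hypothetical parallel caller: compute each chain id up front, then call
// create_rng as is, once per chain.
inline std::vector<std::mt19937_64> create_rngs(unsigned int seed,
                                                unsigned int init_chain_id,
                                                std::size_t num_chains) {
  std::vector<std::mt19937_64> rngs;
  rngs.reserve(num_chains);
  for (std::size_t i = 0; i < num_chains; ++i) {
    const unsigned int chain_id = init_chain_id + static_cast<unsigned int>(i);
    rngs.push_back(create_rng(seed, chain_id));
  }
  return rngs;
}
```

The design point is that the id arithmetic lives in one place (the caller), while the single-chain function keeps its existing signature.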

interrupt, logger, init_writer[0], sample_writer[0],
diagnostic_writer[0]);
} else {
std::vector<std::unique_ptr<stan::io::dump>> unit_e_metrics;
Contributor

no else here needed

@@ -66,7 +71,7 @@ void run_adaptive_sampler(Sampler& sampler, Model& model,
auto start_warm = std::chrono::steady_clock::now();
util::generate_transitions(sampler, num_warmup, 0, num_warmup + num_samples,
num_thin, refresh, save_warmup, true, writer, s,
- model, rng, interrupt, logger);
+ model, rng, interrupt, logger, chain_id, n_chain);
Contributor

Sorry for the late catch...but why is it called n_chain here and not num_chain? num_chain looks more consistent with all the other variable conventions...would need to be addressed in the entire PR.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.08 | 3.11 | 0.99 | -0.95% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -1.65% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.04 | 3.84% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.0 | 0.01% faster |
| irt_2pl/irt_2pl.stan | 5.95 | 5.9 | 1.01 | 0.79% faster |
| performance.compilation | 88.93 | 86.8 | 1.02 | 2.39% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.57 | 8.55 | 1.0 | 0.25% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.99 | 30.02 | 1.0 | -0.09% slower |
| sir/sir.stan | 128.5 | 128.07 | 1.0 | 0.33% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.01 | 1.46% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.02 | 3.04 | 0.99 | -0.74% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.38 | 1.04 | 3.67% faster |
| arK/arK.stan | 2.55 | 1.89 | 1.35 | 26.01% faster |
| arma/arma.stan | 0.65 | 0.92 | 0.7 | -42.0% slower |
| garch/garch.stan | 0.64 | 0.63 | 1.01 | 0.86% faster |
Mean result: 1.01078599896

Jenkins Console Log
Blue Ocean
Commit hash: a11a48a


@wds15
Contributor

wds15 commented Jul 20, 2021

It's on me again?

@SteveBronder
Collaborator Author

Yep go for it!

@SteveBronder
Collaborator Author

@wds15 bump!

@wds15
Contributor

wds15 commented Jul 23, 2021

Monday…sorry…

@SteveBronder
Collaborator Author

All good!

@wds15
Contributor

wds15 commented Jul 26, 2021

LGTM!

5 participants