Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate Intel TBB #1376

Merged
merged 43 commits into from
Oct 10, 2019
Merged

integrate Intel TBB #1376

merged 43 commits into from
Oct 10, 2019

Conversation

wds15
Copy link
Contributor

@wds15 wds15 commented Oct 1, 2019

Summary

This PR integrate the Intel TBB with stan-math. Specifically this PR ensures the TBB's threadpool initialization and it switches map_rect C++11 threads implementation over to use the tbb::parallel_for (which has scheduling and a more efficient threadpool).

In order to keep the Intel TBB optional in the next release, the TBB code is only used whenever STAN_THREADS is set. This makes the TBB only mandatory whenever threads are being used. The Intel TBB will become mandatory in the near future regardless of the STAN_THREADS use (which has licensing implications for GPL-2 projects using Stan-math).

The initialisation of the TBB happens in two places:

  • General initialisation of the maximal concurrency level in the TBB threadpool. This is handled by the new init_threadpool_tbb function defined in stan/math/prim/core/init_threadpool_tbb.hpp. The number of threads used is defined in the STAN_NUM_THREADS environment variable.

  • For all TBB worker threads, the AD tape is initialised through an observer which is registered with the Intel TBB. Every time a worker thread enters the threadpool, the observer is called and ensures proper initialisation of the AD tape resource automatically. See stan/math/rev/core/init_chainablestack_tbb.hpp.

Note: This PR is a branch from the feature/intel-tbb-lib such that it includes these changes in addition to the actual changes introduced in addition. As reference branch the feature/intel-tbb-lib has been selected to make it easier to follow the actual changes introduced relative to the base branch.

Performance Tests

All reported times are in seconds.

threads Ubuntu i7 develop Ubuntu i7 TBB Ubuntu i7 speedup Ubuntu AMD TBB Ubuntu AMD develop Ubuntu AMD speedup
1 166 156 0.060 166.768 148.131 -0.125
2 91 84 0.077 90.459 155.651 0.418
4 80 45 0.438 49.331 93.949 0.475
8 76 42 0.448 28.109 80.378 0.650
12 22.028 72.776 0.697
14 22.421 72.742 0.697
16 23.166 66.685 0.653
18 23.056 69.001 0.666
24 23.525 80.779 0.709
32 23.047 87.403 0.736
threads Mac i9 develop Mac i9 TBB Mac i9 speedup
1 253 184 0.272
2 145 103 0.290
4 91 64 0.297
6 81 55 0.321
threads Win i7 develop Win i7 TBB Win i7 speedup
1 452 464 -0.027
2 222 230 -0.036
4 148 132 0.108
6 146 116 0.205

Tests

  • test/unit/math/prim/core/init_threadpool_tbb_test.cpp tests initialisation of the threadpool

  • test/unit/math/prim/core/init_threadpool_tbb_late_test.cpp tests initialisation of the threadpool in case a c++ user has already initialized the scheduler (in which case nothing happens)

  • test/unit/math/rev/mat/functor/gradient_test.cpp tests if the threaded gradient code runs fine when using the Intel TBB as backend. This tests if the effects of AD tape initialisation have taken place.

Side Effects

The initialisation of the TBB threadpool is only optional. That is, should client code not call the init_threadpool_tbb function, then the first execution of any TBB code will trigger default initialisation of the TBB threadpool which is the default behaviour of the TBB. In case the client code initialises first the TBB through explicit instantiation of the tbb::task_scheduler_init interface, then a subsequent call to the init_threadpool_tbb has no effect - this is again the default behaviour of the TBB (only the very first call is honoured).

For interfaces it is recommended to make a call to stan::math::init_threadpool_tbb() prior to using Stan-math. This will ensure the proper initialisation of the threadpool and ensure that no more STAN_NUM_THREADS threads are running in the threadpool.

Todo

  • agreement on design
  • a few more tests for init_threadpool_tbb
  • agree on the fate of stan::math::get_num_threads (moved to stan/math/prim/scal/fun/get_num_threads.hpp), see comments within

Checklist

  • Math issue #(issue number)

  • Copyright holder: Sebastian Weber

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@@ -78,6 +79,7 @@
#include <stan/math/prim/scal/fun/identity_free.hpp>
#include <stan/math/prim/scal/fun/if_else.hpp>
#include <stan/math/prim/scal/fun/inc_beta.hpp>
#include <stan/math/prim/scal/fun/init_threadpool_tbb.hpp>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these sit in prim/scal/fun? Maybe we should have a backend folder where we put both TBB and OpenCL stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open to suggestions. Another option could be stan/math/core which is motivated by rev/core - but I am really open to where to stick this; I thought prim/scal/fun reflects the right dependency which is why I put it there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think core is fine. I imagine there wont be much "TBB core" code outside of this which would justify making another folder. Other TBB code is going to live either inside of Stan code or specific stan math functions right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way this is a review-time detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map_rect which uses the TBB will, of course, stay where it is. So not much other TBB code (if any) will go into stan/math/core. It just needs to be included always.

// 2. Consider to switch from std::thread::hardware_concurrency() to
// tbb::task_scheduler_init::default_num_threads() once TBB is
// mandatory.
// 3. pull out of internal?
Copy link
Collaborator

@SteveBronder SteveBronder Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you need to use get_num_threads outside of this file? If so then yeah probs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future I can imagine that client code will want to find out about STAN_NUM_THREADS and at that point it would be good to have this in the stan::math namespace... but maybe that is not needed and we can keep it where it is.

inline int get_num_threads(int num_jobs) {
int num_threads = 1;
#ifdef STAN_THREADS
const char* env_stan_num_threads = std::getenv("STAN_NUM_THREADS");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: We should start thinking about how to add the number of threads and the OpenCL devices so they are flags at runtime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... not sure. There are pros and cons for both ways (runtime environment vs interface configuration). I am continuing here the system we started for map_rect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(moving all my comments up to here so that it's easier for others to read the code from the github page)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 50: personally, especially with multiple if elses, I like having brackets to show leaving and entering a scope

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs for hardware_concurrency

Number of concurrent threads supported. If the value is not well defined or not computable, returns ​0​

Should we have a check here and if the returned value is zero return 1?

Also does this return the number of logical or physical cores? I think previously we saw that hyperthreading etc was not that useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_num_jobs function is right now used inside map_rect, but its use will go away there once we switch over map_rect to use the TBB.

When we switch over map_rect to the TBB, we can then include this function inside the init_threadpool_tbb.hpp file, sure. Then we should also remove the num_jobs argument which is really not needed. The get_num_threads should only parse the STAN_NUM_THREADS and that's it.

The value -1 means "use all cores". I would change this to then return the TBB default for that. As I recall that is without hyper-threading as I saw that last time.

*
* @param num_jobs number of jobs
* @return number of threads to use
* @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable
Copy link
Member

@rok-cesnovar rok-cesnovar Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a final review-style comment than what you need right now, but this throws an std::invalid_argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you are right... will fix.

* @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable
* is invalid
*/
inline bool init_threadpool_tbb(tbb::stack_size_type stack_size = 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in someone calls this more than once? Should this throw, return silently without changing anything? Is there a TBB API call you can make to check whether the threadpool was initialized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what is_active does? Should we return immediately if is_active is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment 1: First note that the tbb::task_scheduler_init is static. So you ever create a single instance of this thing. Any further call is basically ignored. This is in line with the TBB way of working in this case. Repeated initialisation is not possible. You do it once and that's it - no more reconfiguration after that. If you want to have differently sized working areas then you can use so-called task_arenas.

Comment 2: Why should we return immediately? In case the static init object is active, then this one was used (and the parsing has worked) so that we will just return again true if it was true before.

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple quick comments, I'll look over the tests more tonight or tomorrow

* @throws std::runtime_error if the value of STAN_NUM_THREADS env. variable
* is invalid
*/
inline bool init_threadpool_tbb(tbb::stack_size_type stack_size = 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I'm clear, stack_size set the amount of memory each thread has at initialization right? Is there a reason to say 0 as the default and not something like 64 just to give them a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 let's the TBB choose that initial size. We should leave the default here as is, I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah totally missed that in the docs my bad


static tbb::task_scheduler_init tbb_scheduler(tbb_max_threads, stack_size);

return tbb_scheduler.is_active();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From docs

By default, Intel TBB 2.2 automatically creates a task scheduler the first time that a thread uses task scheduling services and destroys it when the last such thread exits.

Does this function have a side effect somewhere that actually turns on some global scheduler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The side effect is initialisation of the threadpool with the scheduler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add that as a @note in the doxygen

namespace stan {
namespace math {

static std::mutex cout_mutex;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For stuff like this I find it helpful to have a // FIXME(DELETE) somewhere just so I remember to delete this stuff

static std::mutex cout_mutex;

// initialize AD tape for TBB threads
struct ad_tape_observer : public tbb::task_scheduler_observer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General docs would be nice. With this stuff especially I, Rok, pretty much any other reviewer will be reading a lot of these methods and schemes for the first time. So a little doc here is nice to lay down intent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure... I was in a hurry yesterday night... can add some URL refs to TBB doc, for example.

stan/math/rev/core/init_chainablestack_tbb.hpp Outdated Show resolved Hide resolved
}

private:
std::unordered_map<std::thread::id, ChainableStack*> ad_tape_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You can use using to reuse this type in the code for a little nicer readability

  2. Why not a vector pair? I've only really found unordered_map to actually be faster when you have a pretty large N

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unordered_map is just a convenient way to have an index set which is indexed with std::thread_id keys. Speed is totally irrelevant here (this is called just once per created thread for the entire life-time of the threading in the pool). So I want to use std::thread_id as a key.

@rok-cesnovar
Copy link
Member

From a design perspective I agree with how this is done. I dont think you can make a more clean solution here. There are minor review-time issues with braces, docs and such but I think Steve has already flagged most of those.

Initializing the threadpool is straightforward now with another layer and the details of get_num_threads hidden away. The open question there is what to do if someone calls the init threadpool more than once.

The chainable stack is also pretty straightforward, with the only question, I think, being whether unordered map is the way to go or not.

Can we write any sort of tests for the get_num_threads function. That is going to be tricky probably.

@rok-cesnovar
Copy link
Member

How much additional work is changing map_rect going to be? I am starting to think this is so close we need to merge the Cmdstan and Stan Math TBB PRs, the former does still need some work. Otherwise reviewing and preparing all these branches is going to be a huge pain for everyone.

For the sake of complying with our agreement made at the Stan Math meeting, I am stating here that take care of reverting all PRs if we, for some reason, fail to meet the demands we set .

@wds15
Copy link
Contributor Author

wds15 commented Oct 1, 2019

There are three cases which will lead to no effect for a call to init_threadpool_tbb:

  • Initializing with tbb::task_scheduler_init the TBB threadpool before calling init_threadpool_tbb. In that case the return will be false for the active state.
  • Calling init_threadpool_tbb repeatedly will lead to initialising the threadpool on the first time. Then the active state returned will always be true. Any call after the first will not change anything.
  • If you use any TBB code which leads to tasks being dispatched (so any access to scheduling facilities), then the TBB code will on the spot initialise its threadpool using defaults (use all cores). In that case calling init_threadpool_tbb will have no effect and the active state is false.

This is the default behaviour of the TBB. I don't see why we should deviate from that. I can add some URLs to the TBB doc if that helps?

get_num_threads is already well tested in test/unit//math/prim/mat/functor/num_threads_test.cpp... those need to be adapted when we take away the num_jobs argument.

For the observer: I want to use std::thread_id as a key and as such I have to use an unordered_map...I don't see another way doing that otherwise conveniently.

I can certainly add the map_rect code here. That is straightforward. I just thought it would be better to do it in a follow-up PR, no? Either way is fine with me.

I will address these things later tonight. Let me know if my responses don't make sense / you think differently / I misunderstood.

With this PR I would also like to update the wiki page which talks about threading.

@SteveBronder
Copy link
Collaborator

Also it looks like the design here differs from the design doc's ScopedChainableStack?

https://github.com/wds15/design-docs/blob/parallel_autodiff/designs/0003-parallel_autodiff.md#step-2-independent-ad

@wds15
Copy link
Contributor Author

wds15 commented Oct 1, 2019

What do you think differs? I am relatively sure that things are consistent with one another (maybe not the same... the parallel design RFC is months old).

@rok-cesnovar
Copy link
Member

There are three cases which will lead to no effect for a call to init_threadpool_tbb:
This is the default behaviour of the TBB. I don't see why we should deviate from that.

I just wanted to double check what would happen. A line in the doc stating something like "calling this after the threadpool has already been initialized will have no effect" would suffice in my opinion.

get_num_threads is already well tested in test/unit//math/prim/mat/functor/num_threads_test.cpp... those need to be adapted when we take away the num_jobs argument.

👍

I can certainly add the map_rect code here. That is straightforward. I just thought it would be better to do it in a follow-up PR, no? Either way is fine with me.

I would do it here. If you say the map_rect changes are small, I think adding them here makes sense. The only non-trivial thing here is the observer code.
I would then run the warfarin testbench we did for TLSv* (I can run it on Ubuntu and Windows) to double check everything is as expected. Would that work for you?

@wds15
Copy link
Contributor Author

wds15 commented Oct 1, 2019

Adding the map_rect things here is easy. Ok. Then I suggest to do:

  • address things mentioned (including moving init things to stan/math/core)
  • add the map_rect with the TBB
  • once that is merged we do in another PR moving map_rect back into prim where it belongs. If you want me to do that in this PR, then you will see a few files being moved around... what do you prefer? This PR for this moving or in another? There are no more references to rev-things in the map_rect using the TBB when we change.

Performance testing? Ok... if you think its needed.

Cool! This thing is landing!

@rok-cesnovar
Copy link
Member

rok-cesnovar commented Oct 1, 2019

  • once that is merged we do in another PR moving map_rect back into prim

Yes, do that separately.

Performance testing? Ok... if you think its needed.

Yes, I think it would be great to have them here, this is the first thing with TBB. We have the scripts ready and it should not take that much time. I would take it more as an additional advertisement for TBB in Stan. I will prepare the scripts, I think that is reasonable as I am the one requesting the tests and to take some burden of of you.

Cool! This thing is landing!

🚀

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.96)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.93)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.95)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.07)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.9)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98147692471
Commit hash: 2479cce900cd81e521a74808da13b051bd996495

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.95)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.98)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.99428447690
Commit hash: 81cbff2

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.92)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.94)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.08)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.0)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.94)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98153594776
Commit hash: 395372d47f28cee7f4937a8c8d5585907a7a015b

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 1.0)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.87)
Result: 0.99192952134
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.04)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.84)
Result: 0.99333679134
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.03)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.96)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.86)
Result: 0.98738357614
Commit hash: 3d337877304b04edbe7b8e07ab1d425b49b870ae

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.98)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.03)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.02)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 0.96)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.84)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.98)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.97)
Result: 0.98710218725
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.97)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.93)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.95)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.3)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.07)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.02)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.14)
Result: 1.02407227782
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.77)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.86)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.6)
(performance.compilation, 1.17)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.11)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.08)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.2)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.06)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.12)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.98)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.2)
Result: 1.07446600559
Commit hash: 0c6f6c89ae4e3855eacbd2f02905881c2a49e9d6

@SteveBronder
Copy link
Collaborator

Why does this keep going off?

@rok-cesnovar
Copy link
Member

We are running a cmdstan PR with a submodule reference to this PR. And that also sends the cmdstan perf results here.

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.03)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.98)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.02)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.04)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.92)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.99)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.99)
Result: 0.99942056911
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

@rok-cesnovar
Copy link
Member

I am restarting this again. It got stuck in the Stan upstream part with

Cannot contact EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67): hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67) failed. The channel is closing down or has closed down

Could not connect to EC2 (Windows) - Build, Docker, WSL (i-0f525173fa0eaaf67) to send interrupt signal to process

and there was no other way than simply restarting...

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.98)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 1.07)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.02)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.99)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.09)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.03)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.94)
Result: 1.01064086204
Commit hash: e0eeced4a9440de30c3fef7db1c2faf100d7142d

@wds15
Copy link
Contributor Author

wds15 commented Oct 9, 2019

Again the same error? Is there a Jenkins / Windows problem or a problem on this branch?

@rok-cesnovar
Copy link
Member

Jenkins/Windows. I restarted the Always run part 2 now.

@wds15
Copy link
Contributor Author

wds15 commented Oct 9, 2019

Will this ever go through? I hope this time we go through the Windows stuff...

@wds15
Copy link
Contributor Author

wds15 commented Oct 9, 2019

ok let me merge develop into this one since the allow spaces went in.

@serban-nicusor-toptal
Copy link
Contributor

serban-nicusor-toptal commented Oct 9, 2019

Windows seems to be fixed now, shouldn't happen again!

@wds15 It failed at the testing stage ( because of Jenkins ), restarted it here: https://jenkins.mc-stan.org/blue/organizations/jenkins/Math%20Pipeline/detail/PR-1376/34/pipeline/

@rok-cesnovar
Copy link
Member

Yeah, but it still points to PR-744 that was merged and doesnt exist anymore. So it will def fail again. It seems that Windows Jenkins is stable now so hopefully this is all over in the morning CET time.

@wds15
Copy link
Contributor Author

wds15 commented Oct 9, 2019

thanks for restarting! I need to merge now develop into this... but looking at stan downstream tests it looks to me as if we need there possibly as well the tbb.dll copy thing?

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.99)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.98)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.03)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.99)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.02)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.02)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.05)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.89)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.01)
Result: 0.99778605623
Commit hash: cd626022fdee22a0603909203220f6ac79b68f42

@stan-buildbot
Copy link
Contributor

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.94)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.01)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.03)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.92)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.88)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.0)
Result: 0.98252425644
Commit hash: fd680d3

Copy link
Member

@rok-cesnovar rok-cesnovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets get this merged! Thank you @wds15!!

@wds15 wds15 merged commit 731e9a5 into develop Oct 10, 2019
@serban-nicusor-toptal serban-nicusor-toptal added this to the 3.0.0 milestone Oct 18, 2019
@mcol mcol deleted the feature/intel-tbb-init branch January 5, 2020 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants