-
-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Parallel Prototype #1616
[WIP] Parallel Prototype #1616
Conversation
…into proto-parallel-v3
…into proto-parallel-v3
…into proto-parallel-v3
…into proto-parallel-v3
…rograms to handle general case
@wds15 @SteveBronder I got initial docs for this up at stan-dev/docs#161, and a very small case study here: https://github.com/bbbales2/cmdstan_map_rect_tutorial/blob/reduce_sum/reduce_sum_tutorial.Rmd (copied from a Richard McElreath example @mitzimorris recommended: https://github.com/rmcelreath/cmdstan_map_rect_tutorial/blob/master/README.md). Example turned out to have a 7x speedup on 8 cores which is convenient lol, but maybe a bit too ambitious :/. Feel free to edit these docs and push as you see fit. It's all first-pass stuff. |
Nice! As for the Windows threading question in there: Yes, threading works just fine on Windows. |
@SteveBronder adj_jac_apply is in. More pulls more pulls :D! |
@bbbales2 cool doc for users which you started. I forked that, done some additions here and there and filed a PR against your repository. Have a look if you like that. |
@SteveBronder you put together a nice todo list to get this in...but I don't find that any more. Still, I recall my name showed up for some doc...what exactly you think I should write up? |
When I went into doc stuff it was mostly just the stuff I knew / could dictate from the code so I wasn't sure if a lot of it was nuanced enough. If you can read through it and add additional info you think is good that would be cool. |
@wds15 The list I got from Mitzi was (I'm heavily editing it to reflect things I did):
I think the docs wait on the design-docs. (edit: feel free to go edit them, just warning that a large bit of the docs are copied from the design-docs, so those changes will cascade downwards) Inline docs, feel free to work on. In terms of code, we need a branch with the accumulate_adjoints/save_varis/deep_copy_vars/count_vars functions, and also we need tests and checks added to reduce_sum to validate the grainsize argument. If you could double check that we have all the necessary tests in place for reduce_sum that'd be cool too. I kinda just threw those together and didn't think much more about them. |
I'm personally just talking about inline docs.
These are in #1800 |
When I merged in develop it had me delete these tests from the Jenkins file:
I'm just posting this here so I remember to investigate why those disappeared later. Felt weird deleting them manually here. Usually these git merges happen automatically and I wouldn't worry about it. |
Probably due to PR #1789 This PR (parallel prototype) introduces some Jenkinsfile formatting for some reason. Hence the conflict. |
@rok-cesnovar thanks, yeah that looks like it. I just didn't want to accidentally mess anything up! |
…rs of different things for reduce_sum (for design-doc pull request #17)
…gs/RELEASE_500/final)
@wds15 @SteveBronder alright I think I got the right checks and tests in for reduce_sum. It should be complete. |
(of course it's probably not complete I'm just saying it's fair to go through it with a fine toothed comb and if anything is missing it's a problem) |
Ooops forgot the deterministic version of the function. Can get that later today. |
|
I think I have an idea how to test the deterministic thing. The promise is that the splits are always of the same size and at the same locations. We can create a mock reducer object which raises an exception if our expectations are not met. Does that make sense? |
That is a good point. That is what we want to guarantee. I'll try this. |
I added I called it I wasn't in on deterministic cause it makes the execution sound deterministic. But the breaking into pieces should be deterministic. But maybe still |
Closing this since the last PR is open |
This is just a PR for discussion of the tbb parallelism prototype
From Ben's last comment on discourse chat below
https://discourse.mc-stan.org/t/parallel-autodiff-v3/12474/40
Alright, stuff is up (proto-parallel-v3...cleanup/proto-parallel-v3).
I screwed up autodiff on the sliced arguments. I was scared about what would happen if all the vars in the sliced arguments had the same vari, so I just didn’t implement that.
I broke the things into two functions – parallel_sum_var and parallel_sum. I think we need enable_if logic or something to distinguish between these things properly.
I got rid of the init argument.
I got rid of local_operands_and_partials.
Arguments are stored as a tuple of const references:
math/stan/math/rev/functor/reduce_sum.hpp
Line 24 in cbcf42f
Adjoints are accumulated in an eigen vector:
math/stan/math/rev/functor/reduce_sum.hpp
Line 27 in cbcf42f
The member function ‘deep_copy’ of the recursive_reducer that handles the deep copies:
math/stan/math/rev/functor/reduce_sum.hpp
Line 114 in cbcf42f
The member function ‘accumulate_adjoints’ collects the the adjoints from the nested autodiff and arranges them in the Eigen::VectorXd:
math/stan/math/rev/functor/reduce_sum.hpp
Line 131 in cbcf42f
The reduce function accumulates the sum and the adjoint:
math/stan/math/rev/functor/reduce_sum.hpp
Line 140 in cbcf42f
It seems like operator() of a single recursive_reducer might get called multiple times though? I was surprised by this.
The member function count_memory of parallel_sum_rev_impl counts the number of vars in all the input arguments so it can allocate memory to store the vari pointers:
math/stan/math/rev/functor/reduce_sum.hpp
Line 217 in cbcf42f
The member function save_varis of parallel_sum_rev_impl copies the vari pointers from the arguments into an array for the precomputed vari function:
math/stan/math/rev/functor/reduce_sum.hpp
Line 236 in cbcf42f
This should allow for Stan arguments real, int, real[], int[], vector, row_vector, matrix to be passed as the shared arguments. I think we can generalize to get arrays of arrays of whatever without too much trouble, but given I’m not actually testing much of any of that, I didn’t want to push the features yet.
It would require changes to deep_copy/count/accumulate_adjoints/save_varis.
We need to go through this and think out the references/forwarding stuff. I’ve forgotten how && works and if we need it. I wasn’t strictly being const correct when I coded this as first and it led to memory problems cause the deep copy wasn’t actually copying, so gotta be careful with that stuff I guess.
I had an implementation of apply in an old pull that got closed so I added it as a function here: https://github.com/stan-dev/math/blob/cbcf42fad798015b317ba6dab7a6ddc5d9983aa2/stan/math/prim/scal/functor/apply.hpp . It comes with tests too. I think I looked around at a bunch of stuff when I did this one and tried to get the forwarding right or whatnot. I should probably have a citation somewhere in the source code :/.