-
-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transposition to kernel generator #1769
Add transposition to kernel generator #1769
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few Qs related to the templates here
/** | ||
* Represents a transpose in kernel generator expressions. | ||
* | ||
* Warning: transposing this expression is not supported! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is kind of confusing, like we can't do transpose(transpose(x))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, this is not true anymore. Removed.
* @tparam Derived derived type | ||
* @tparam T_a type of first argument | ||
* @tparam T_b type of second argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These don't match up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this may be a better place for clearer names like calling T
Expr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
explicit transpose_(T&& a) : base(std::forward<T>(a)) {} | ||
|
||
/** | ||
* Creates a deep copy of this expression. | ||
* @return copy of \c *this | ||
*/ | ||
inline transpose_<std::remove_reference_t<T>> deep_copy() { | ||
return transpose_<std::remove_reference_t<T>>{ | ||
std::get<0>(arguments_).deep_copy()}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda weird to me. Let's say we call
double b = 10;
transpose_<double> foo(b);
(or insert any non ref type). Then the constructor here is going to be
explicit transpose_(double&& a) : base(std::forward<double>(a)) {}
Where std::forward<double>(a)
here is actually going to give back an rvalue reference since forward with an lvalue as the template and rvalue as the type still produces an rvalue (see godbolt below).
I think you need to have a separate template here
template <typename OpT, require_same_t<T, OpT>* = nullptr>
explicit transpose_(OpT&& a) : base(std::forward<OpT>(a)) {}
The godbolt here shows this effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having rvalue if rvalue is passed in here is fine. Your example on godbolt does not compile because you have wrong template argument in S<int> a(b);
- this should be S<int&> a(b);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in your example in the comment you have an error - the constructor for
double b = 10;
transpose_<double> foo(b);
would be:
explicit transpose_(double& a) : base(std::forward<double&>(a)) {}
* @return part of kernel with code for this and nested expressions | ||
*/ | ||
inline kernel_parts generate(const std::string& i, const std::string& j, | ||
const std::string var_name_arg) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pass this string by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason. Fixed.
Odd it looks like this is failing the cholesky test? Ping me when it's fixed and I'll do the review |
(also fyi feel free to ping me whenever to remind me to review) |
…stable/2017-11-14)
…gs/RELEASE_500/final)
…stable/2017-11-14)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@SteveBronder This is (finaly) ready for a review. |
Cool! I'll take a look tmrw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one comment on the .eval()
there which seems odd. Rest of it looks good!
@@ -81,7 +81,7 @@ inline matrix_cl<T> tri_inverse(const matrix_cl<T>& A) { | |||
zero_mat.template zeros<stan::math::matrix_cl_view::Entire>(); | |||
inv_padded.template zeros<stan::math::matrix_cl_view::Entire>(); | |||
if (tri_view == matrix_cl_view::Upper) { | |||
inv_mat = transpose(inv_mat); | |||
inv_mat = transpose(inv_mat).eval(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a need for .eval()
here? Shouldn't the kernel kick off once it's being assigned to an already constructed matrix_cl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that without eval we have aliasing issues, similar as Eigen has. Since source and destination is the same matrix threads that work on lower/upper triangular part can (and do) overwrite each other's input values with their outputs. With eval we create new matrix for destination. That is also how individual transpose kernel works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay if you can file an issue to write some docs about this (and then add the docs later) then I'm cool with approving this. We should probably have something like Eigen does if we also have the same aliasing issues.
You can either add that as a module (like in the below comment) or follow the instructions here on adding a new page
using Eigen::MatrixXd; | ||
using stan::math::matrix_cl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its usually good practice to put these inside of the tests than have them floating in global
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Also have yinz written up a paper about the kernel generate yet? This is starting to get v v v complicated and something like a module on the stan math site would be nice to have for this |
Yep, the paper is almost complete, but it does not go in much more details than the design doc I wrote. I will think about adding something to the site. Remind me, where is the source for that? |
The source for the site? I think if we had something like the below it would be fine
|
I think just including the paper is fine as long as you have some details on the different types of optimizations going on in here |
glad we did that design doc it made my understanding of this whole thing a lot more clear! |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
I added docs about aliasing to doxygen. Extended documentation about kernel generator will come in its own PR (I plan two more PRs before that). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Summary
Adds transposition to kernel generator. Existing transposition kernel is removed.
Tests
Added new tests for transposition. Existing tests in
opencl/prim/transpose_test.cpp
also test the new code.Side Effects
None.
Checklist
Math issue Implement OpenCL kernel generator #1342
Copyright holder: Tadej Ciglarič
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested