#5635: CUDA: Add Overloads for parallel_scan with return value for ThreadVectorRange #6235
Force-pushed from aabab23 to f7de16e.
It seems like most of the duplication here is not necessary.
Force-pushed from 3379b25 to fc9d4fc.
Force-pushed from cc84fdd to fd767b8.
I suggest we make this PR only about adding the TeamThreadRangeBoundariesStruct overloads.
Why is this PR touching core/src/impl/Kokkos_HostThreadTeam.hpp?
Force-pushed from fd767b8 to aa58b51.
Force-pushed from aa58b51 to d2a3b91.
Marking as draft since the dependencies have not been merged yet.
Force-pushed from d2a3b91 to efebb10.
Force-pushed from efebb10 to 4a266d8.
This one actually fails the tests:
Converting back to draft for now; I'll look into it.
I pushed a fix.
Force-pushed from aa05188 to 5f279b0.
retest this please
Related to #5635, #6453.
Depends on #6292 (merged).
Edit: this now also contains a fix for the CUDA parallel_scan over ThreadVectorRange (4a819b6).
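To make the intended semantics concrete, here is a minimal serial sketch in plain C++ (not Kokkos code). It assumes the usual Kokkos scan functor signature `(i, update, final)`. The functor runs once per index, and whatever `update` holds after the last index is written to an extra "return value" argument. The names `serial_scan_with_total` and `exclusive_prefix_sum` are hypothetical. The sketch does not model the vectorized CUDA execution or the real multi-pass scan.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial emulation of a scan that also returns its total.
// Each index is visited once, in order, with final == true; the value
// accumulated after the last index is written to `total`, mirroring the
// return-value argument added by the new overloads.
template <class Functor, class ValueType>
void serial_scan_with_total(std::size_t n, const Functor& f, ValueType& total) {
  ValueType update{};  // running partial result, starts at the identity (0 for sums)
  for (std::size_t i = 0; i < n; ++i) {
    f(i, update, true);
  }
  total = update;
}

// Exclusive prefix sum written with the (i, update, final) functor signature.
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in, int& total) {
  std::vector<int> out(in.size());
  serial_scan_with_total(
      in.size(),
      [&](std::size_t i, int& update, bool final) {
        if (final) out[i] = update;  // store the sum *before* in[i]: exclusive scan
        update += in[i];
      },
      total);
  return out;
}
```

For the input `{1, 2, 3, 4}` this produces `{0, 1, 3, 6}` with `total == 10`. In Kokkos, the matching nested call would presumably have the form `Kokkos::parallel_scan(Kokkos::ThreadVectorRange(team, n), functor, total)`. Check the Kokkos nested-parallelism documentation for the exact overload set.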