Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#5635: CUDA: Add Overloads for parallel_scan with return value for ThreadVectorRange #6235

Merged

Conversation

thearusable
Copy link
Contributor

@thearusable thearusable commented Jun 22, 2023

Related to #5635 #6453
Depends on #6292 (merged)


Edit: this now also contains a fix for Cuda parallel_scan ThreadVectorRange range (4a819b6).

@thearusable thearusable requested a review from fnrizzi June 22, 2023 10:34
@thearusable thearusable self-assigned this Jun 22, 2023
@thearusable thearusable requested a review from cz4rs June 22, 2023 13:08
@thearusable thearusable force-pushed the 5635-cuda-parallel-scan-with-value branch from aabab23 to f7de16e Compare July 6, 2023 16:15
@thearusable thearusable marked this pull request as ready for review July 6, 2023 16:16
@thearusable thearusable changed the title [WIP] #5635: CUDA: Add Overloads for parallel_scan with return value #5635: CUDA: Add Overloads for parallel_scan with return value Jul 6, 2023
Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like most duplications are not necessary.

core/src/Cuda/Kokkos_Cuda_Team.hpp Outdated Show resolved Hide resolved
core/src/impl/Kokkos_HostThreadTeam.hpp Outdated Show resolved Hide resolved
core/unit_test/incremental/Test16_ParallelScan.hpp Outdated Show resolved Hide resolved
@thearusable thearusable force-pushed the 5635-cuda-parallel-scan-with-value branch 2 times, most recently from 3379b25 to fc9d4fc Compare July 10, 2023 10:13
@thearusable thearusable changed the title #5635: CUDA: Add Overloads for parallel_scan with return value #5635: [2/X] CUDA: Add Overloads for parallel_scan with return value Jul 10, 2023
@thearusable thearusable force-pushed the 5635-cuda-parallel-scan-with-value branch 2 times, most recently from cc84fdd to fd767b8 Compare July 13, 2023 10:23
@thearusable thearusable requested a review from masterleinad July 13, 2023 10:25
Copy link
Contributor

@fnrizzi fnrizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest we make this PR only about adding the TeamThreadRangeBoundariesStruct overloads

@fnrizzi
Copy link
Contributor

fnrizzi commented Jul 17, 2023

why is this PR touching this file: core/src/impl/Kokkos_HostThreadTeam.hpp ?

@thearusable thearusable force-pushed the 5635-cuda-parallel-scan-with-value branch from fd767b8 to aa58b51 Compare July 24, 2023 13:59
@thearusable thearusable changed the title #5635: [2/X] CUDA: Add Overloads for parallel_scan with return value #5635: CUDA: Add Overloads for parallel_scan with return value for ThreadVectorRange Jul 24, 2023
@thearusable thearusable force-pushed the 5635-cuda-parallel-scan-with-value branch from aa58b51 to d2a3b91 Compare July 24, 2023 14:41
@masterleinad
Copy link
Contributor

Marking as draft since dependencies have not been merged.

@masterleinad masterleinad marked this pull request as draft August 9, 2023 21:20
@masterleinad masterleinad mentioned this pull request Sep 14, 2023
@cz4rs cz4rs force-pushed the 5635-cuda-parallel-scan-with-value branch from d2a3b91 to efebb10 Compare September 19, 2023 08:30
@cz4rs cz4rs force-pushed the 5635-cuda-parallel-scan-with-value branch from efebb10 to 4a266d8 Compare September 19, 2023 08:44
masterleinad
masterleinad previously approved these changes Sep 19, 2023
@masterleinad masterleinad marked this pull request as ready for review September 19, 2023 11:39
@cz4rs
Copy link
Contributor

cz4rs commented Sep 19, 2023

This one actually fails the tests:

/var/jenkins/workspace/Kokkos/core/unit_test/TestTeamVector.hpp:1037: Failure
Value of: (TestTeamVector::Test<Kokkos::Cuda>(12))
  Actual: false
Expected: true
[  FAILED  ] cuda.team_vector (1122 ms)

Converting back to draft for now, I'll look into it.

@cz4rs cz4rs marked this pull request as draft September 19, 2023 11:49
@masterleinad masterleinad dismissed their stale review September 19, 2023 13:30

Doesn't work.

@masterleinad
Copy link
Contributor

This one actually fails the tests:

/var/jenkins/workspace/Kokkos/core/unit_test/TestTeamVector.hpp:1037: Failure
Value of: (TestTeamVector::Test<Kokkos::Cuda>(12))
  Actual: false
Expected: true
[  FAILED  ] cuda.team_vector (1122 ms)

Converting back to draft for now, I'll look into it.

I pushed a fix.

@masterleinad masterleinad marked this pull request as ready for review September 19, 2023 21:03
@masterleinad masterleinad force-pushed the 5635-cuda-parallel-scan-with-value branch from aa05188 to 5f279b0 Compare September 19, 2023 23:15
@fnrizzi
Copy link
Contributor

fnrizzi commented Sep 21, 2023

retest this please

@masterleinad
Copy link
Contributor

gcc-8.4.0 is timing out but this pull request shouldn't affect that build anyway.

@crtrott crtrott merged commit 68a97a1 into kokkos:develop Sep 27, 2023
@ajpowelsnl ajpowelsnl assigned nmm0 and unassigned nmm0 May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants