@gpu.itersPerThread-cyclic bug fix #26218

Merged


vasslitvinov

This PR fixes a bug where the same loop iteration was executed by multiple GPU threads in some cases involving the attribute @gpu.itersPerThread with the cyclic argument set to true. The test itersPerThread.chpl is now beefed up to detect this buggy behavior, should it occur again.

Semantics

This PR upholds the original intention of cyclic itersPerThread, which is to map the loop iterations onto the smallest number of GPU threads such that each thread executes at most itersPerThread loop iterations. In a discussion within the group, we leaned against mapping the loop iterations onto ALL the threads that the GPU will fire up for the corresponding kernel, if this is different.

For example, consider a loop with 12 iterations and itersPerThread=4. They are mapped over 12/4=3 threads as follows:

| thread | iterations  |
|--------|-------------|
| 0      | 0, 3, 6, 9  |
| 1      | 1, 4, 7, 10 |
| 2      | 2, 5, 8, 11 |
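
As a rough illustration of this mapping, here is a small host-side model in Python. It is not the Chapel compiler's code; the function name `cyclic_mapping` is made up for this sketch, and the rounding-up for iteration counts that are not a multiple of itersPerThread is an assumption based on the "at most itersPerThread iterations per thread" wording above.

```python
def cyclic_mapping(num_iters, iters_per_thread):
    """Model of the cyclic mapping: thread -> list of loop iterations.

    Hypothetical helper for illustration only, not part of Chapel.
    """
    # Smallest thread count such that no thread gets more than
    # iters_per_thread iterations (integer ceiling division).
    num_threads = (num_iters + iters_per_thread - 1) // iters_per_thread
    # Thread t takes iterations t, t + num_threads, t + 2*num_threads, ...
    return {t: list(range(t, num_iters, num_threads))
            for t in range(num_threads)}


def executed_exactly_once(mapping, num_iters):
    """True if every iteration in 0..num_iters-1 appears exactly once."""
    seen = sorted(i for iters in mapping.values() for i in iters)
    return seen == list(range(num_iters))
```

Calling `cyclic_mapping(12, 4)` reproduces the three rows above, and `executed_exactly_once` is the kind of property whose violation this PR fixes: no iteration may show up under two threads.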

The mapping could be different if the loop is also annotated with @gpu.blockSize(2). In this case, the GPU will execute ceil(3/2)=2 blocks and therefore 2*2=4 threads, so we could map the iterations in a cyclic manner to all 4 threads as follows:

| thread | iterations |
|--------|------------|
| 0      | 0, 4, 8    |
| 1      | 1, 5, 9    |
| 2      | 2, 6, 10   |
| 3      | 3, 7, 11   |

We chose against this option because it can change the number of threads that the user expects, which is undesirable in GPU programming. So in this example only 3 threads will execute loop iterations regardless of how many threads the GPU will fire.
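
To make the trade-off concrete, the two options can be modeled side by side. This is a hedged Python sketch, not the actual implementation: the function names are hypothetical, and the launched-thread count simply follows the ceil(3/2)=2 blocks arithmetic from the example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def chosen_mapping(num_iters, iters_per_thread, block_size):
    """Option kept by this PR (as modeled here): the stride is the
    minimal thread count; any extra launched threads do nothing."""
    needed = ceil_div(num_iters, iters_per_thread)
    launched = ceil_div(needed, block_size) * block_size
    return {
        # The guard t < needed keeps surplus threads idle. Without it,
        # thread 3 in the 12/4/2 example would repeat 3, 6, 9.
        t: list(range(t, num_iters, needed)) if t < needed else []
        for t in range(launched)
    }


def all_threads_mapping(num_iters, iters_per_thread, block_size):
    """Rejected option: spread iterations over every launched thread."""
    needed = ceil_div(num_iters, iters_per_thread)
    launched = ceil_div(needed, block_size) * block_size
    return {t: list(range(t, num_iters, launched))
            for t in range(launched)}
```

With 12 iterations, itersPerThread=4, and a block size of 2, `chosen_mapping` leaves thread 3 with an empty list, while `all_threads_mapping` gives the four-thread table shown earlier.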

Testing: paratest, gpu=amd, nvidia.

Signed-off-by: Vassily Litvinov <vasslitvinov@users.noreply.github.com>
@vasslitvinov vasslitvinov merged commit c8821f4 into chapel-lang:main Nov 6, 2024
7 checks passed
@vasslitvinov vasslitvinov deleted the fix-itersPerThread-cyclic branch November 6, 2024 23:47