@gpu.itersPerThread-cyclic bug fix #26218

vasslitvinov · 2024-11-06T02:43:51Z

This PR fixes a bug where the same loop iteration was executed by multiple GPU threads in some cases involving the attribute @gpu.itersPerThread with the cyclic argument set to true. The test itersPerThread.chpl is now extended to detect this buggy behavior, should it occur again.

Semantics

This PR upholds the original intention of cyclic itersPerThread: the loop iterations are mapped onto the smallest number of GPU threads such that each thread executes at most itersPerThread loop iterations. In a discussion within the group, we leaned against mapping the loop iterations onto ALL the threads that the GPU will fire up for the corresponding kernel, when that number is different.

For example, consider a loop with 12 iterations and itersPerThread=4. They are mapped over 12/4=3 threads as follows:

thread	iterations
0	0, 3, 6, 9
1	1, 4, 7, 10
2	2, 5, 8, 11
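
The table above can be reproduced with a small model. This is a minimal Python sketch of the mapping rule as described here, not the Chapel compiler's implementation; the function name `cyclic_mapping` and its parameters are hypothetical.

```python
def cyclic_mapping(num_iters, iters_per_thread):
    """Model of the cyclic mapping: use the fewest threads such that
    each one runs at most iters_per_thread iterations (hypothetical helper)."""
    # Ceiling division without floating point.
    num_threads = (num_iters + iters_per_thread - 1) // iters_per_thread
    # Thread t takes iterations t, t + num_threads, t + 2*num_threads, ...
    return {t: list(range(t, num_iters, num_threads))
            for t in range(num_threads)}

for thread, iters in cyclic_mapping(12, 4).items():
    print(thread, iters)
```

In this model every iteration is assigned to exactly one thread, which is the property the bug violated.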

The mapping could be different if the loop is also annotated with @gpu.blockSize(2). In this case, the GPU will execute ceil(3/2)=2 blocks and therefore 2*2=4 threads, so we could map the iterations in a cyclic manner to all 4 threads as follows:

thread	iterations
0	0, 4, 8
1	1, 5, 9
2	2, 6, 10
3	3, 7, 11
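
For comparison, the alternative mapping in this table can be modeled the same way. The sketch below is illustrative only: the function name is hypothetical, and it assumes the launched thread count is the needed thread count rounded up to a whole number of blocks, as in the arithmetic above.

```python
def cyclic_over_launched_threads(num_iters, iters_per_thread, block_size):
    """Model of the alternative: spread iterations cyclically over every
    thread the GPU launches (hypothetical helper)."""
    needed = (num_iters + iters_per_thread - 1) // iters_per_thread
    num_blocks = (needed + block_size - 1) // block_size
    launched = num_blocks * block_size
    # Thread t takes iterations t, t + launched, t + 2*launched, ...
    return {t: list(range(t, num_iters, launched)) for t in range(launched)}

for thread, iters in cyclic_over_launched_threads(12, 4, 2).items():
    print(thread, iters)
```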

We chose against this option because it can change the number of threads that the user expects, which is undesirable in GPU programming. So in this example only 3 threads will execute loop iterations regardless of how many threads the GPU will fire.
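
The chosen behavior for the blockSize example can be sketched as follows. This is a Python model under stated assumptions, not the generated kernel code: the function name is hypothetical, and representing the extra launched thread with an empty iteration list is only a way to show that it does no loop work.

```python
def chosen_mapping(num_iters, iters_per_thread, block_size):
    """Model of the chosen semantics: only the minimal number of threads
    execute iterations; any extra launched threads stay idle (hypothetical helper)."""
    needed = (num_iters + iters_per_thread - 1) // iters_per_thread
    num_blocks = (needed + block_size - 1) // block_size
    launched = num_blocks * block_size
    return {t: list(range(t, num_iters, needed)) if t < needed else []
            for t in range(launched)}

mapping = chosen_mapping(12, 4, 2)
for thread, iters in mapping.items():
    print(thread, iters)

# Each iteration must be executed by exactly one thread.
executed = sorted(i for iters in mapping.values() for i in iters)
assert executed == list(range(12))
```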

Testing: paratest, gpu=amd, nvidia.

Signed-off-by: Vassily Litvinov <vasslitvinov@users.noreply.github.com>

itersPerThread to map to GPU threads correctly

59ca419

Signed-off-by: Vassily Litvinov <vasslitvinov@users.noreply.github.com>

e-kayrakli approved these changes Nov 6, 2024


vasslitvinov merged commit c8821f4 into chapel-lang:main Nov 6, 2024
7 checks passed

vasslitvinov deleted the fix-itersPerThread-cyclic branch November 6, 2024 23:47
