Allow return statements for GPU-only kernels #538

Merged. pxl-th merged 3 commits on Oct 25, 2024.

pxl-th (Member) commented Oct 18, 2024

It's a bit annoying not to be able to use `return` statements inside `@kernel` functions.
Currently you have to move the body into a separate helper function, something like this, but it looks weird:

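```julia
using KernelAbstractions

@kernel function ker!(args...)
    i = @index(Global)
    _ker!(i, args...)  # forward to an ordinary function, where `return` is allowed
end

function _ker!(i, args...)
    # ... kernel body; early `return` statements work here
end
```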

github-actions bot (Contributor) commented Oct 18, 2024

Benchmark Results

| Benchmark | main | b3f6676... | main/b3f66760325ea4... (ratio) |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.541 ± 0.0044 μs | 0.53 ± 0.0041 μs | 1.02 |
| saxpy/default/Float16/1048576 | 0.173 ± 0.0024 ms | 0.175 ± 0.0037 ms | 0.993 |
| saxpy/default/Float16/16384 | 3.16 ± 0.053 μs | 3.13 ± 0.05 μs | 1.01 |
| saxpy/default/Float16/2048 | 0.708 ± 0.0062 μs | 0.7 ± 0.0064 μs | 1.01 |
| saxpy/default/Float16/256 | 0.411 ± 0.0028 μs | 0.404 ± 0.0035 μs | 1.02 |
| saxpy/default/Float16/262144 | 0.044 ± 0.00071 ms | 0.0457 ± 0.0026 ms | 0.964 |
| saxpy/default/Float16/32768 | 5.87 ± 0.1 μs | 5.8 ± 0.09 μs | 1.01 |
| saxpy/default/Float16/4096 | 1.09 ± 0.015 μs | 1.09 ± 0.017 μs | 1 |
| saxpy/default/Float16/512 | 0.461 ± 0.0032 μs | 0.451 ± 0.0037 μs | 1.02 |
| saxpy/default/Float16/64 | 0.382 ± 0.0027 μs | 0.381 ± 0.0044 μs | 1 |
| saxpy/default/Float16/65536 | 11.5 ± 0.2 μs | 11.5 ± 0.22 μs | 1 |
| saxpy/default/Float32/1024 | 0.44 ± 0.0063 μs | 0.436 ± 0.006 μs | 1.01 |
| saxpy/default/Float32/1048576 | 0.235 ± 0.013 ms | 0.231 ± 0.012 ms | 1.02 |
| saxpy/default/Float32/16384 | 2.66 ± 1 μs | 3.17 ± 0.98 μs | 0.839 |
| saxpy/default/Float32/2048 | 0.551 ± 0.012 μs | 0.541 ± 0.012 μs | 1.02 |
| saxpy/default/Float32/256 | 0.391 ± 0.0049 μs | 0.39 ± 0.0065 μs | 1 |
| saxpy/default/Float32/262144 | 0.0581 ± 0.003 ms | 0.0548 ± 0.0035 ms | 1.06 |
| saxpy/default/Float32/32768 | 5.33 ± 1.6 μs | 6.1 ± 1.6 μs | 0.874 |
| saxpy/default/Float32/4096 | 0.918 ± 0.018 μs | 0.91 ± 0.054 μs | 1.01 |
| saxpy/default/Float32/512 | 0.405 ± 0.0041 μs | 0.403 ± 0.0044 μs | 1 |
| saxpy/default/Float32/64 | 0.381 ± 0.004 μs | 0.372 ± 0.0032 μs | 1.02 |
| saxpy/default/Float32/65536 | 12.3 ± 1.4 μs | 12.5 ± 1.7 μs | 0.985 |
| saxpy/default/Float64/1024 | 0.558 ± 0.013 μs | 0.532 ± 0.011 μs | 1.05 |
| saxpy/default/Float64/1048576 | 0.485 ± 0.024 ms | 0.508 ± 0.051 ms | 0.955 |
| saxpy/default/Float64/16384 | 6.41 ± 1.2 μs | 5.15 ± 1.5 μs | 1.25 |
| saxpy/default/Float64/2048 | 0.922 ± 0.0082 μs | 0.915 ± 0.023 μs | 1.01 |
| saxpy/default/Float64/256 | 0.408 ± 0.0047 μs | 0.403 ± 0.0046 μs | 1.01 |
| saxpy/default/Float64/262144 | 0.114 ± 0.0068 ms | 0.118 ± 0.008 ms | 0.964 |
| saxpy/default/Float64/32768 | 13.1 ± 2.2 μs | 12.7 ± 1.1 μs | 1.03 |
| saxpy/default/Float64/4096 | 1.71 ± 0.27 μs | 1.6 ± 0.24 μs | 1.07 |
| saxpy/default/Float64/512 | 0.443 ± 0.0054 μs | 0.434 ± 0.006 μs | 1.02 |
| saxpy/default/Float64/64 | 0.387 ± 0.0035 μs | 0.385 ± 0.003 μs | 1.01 |
| saxpy/default/Float64/65536 | 28.2 ± 3 μs | 29.2 ± 1.5 μs | 0.965 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 1.91 ± 0.025 μs | 1.91 ± 0.022 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.174 ± 0.013 ms | 0.161 ± 0.0071 ms | 1.08 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.26 ± 0.14 μs | 4.17 ± 0.13 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.09 ± 0.032 μs | 2.09 ± 0.038 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.54 ± 0.028 μs | 2.55 ± 0.024 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0422 ± 0.0019 ms | 0.0438 ± 0.003 ms | 0.965 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.73 ± 0.26 μs | 6.58 ± 0.25 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.39 ± 0.032 μs | 2.4 ± 0.029 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/512 | 2.98 ± 0.052 μs | 3 ± 0.048 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.24 ± 0.041 μs | 2.26 ± 0.039 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.7 ± 0.53 μs | 12.6 ± 0.53 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 1.93 ± 0.021 μs | 1.95 ± 0.031 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.236 ± 0.013 ms | 0.254 ± 0.022 ms | 0.93 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.18 ± 0.72 μs | 4.22 ± 0.72 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.08 ± 0.03 μs | 2.08 ± 0.031 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.4 ± 0.037 μs | 2.44 ± 0.041 μs | 0.984 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.062 ± 0.0032 ms | 0.0613 ± 0.0017 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.62 ± 1.1 μs | 7.33 ± 0.6 μs | 1.04 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.34 ± 0.043 μs | 2.37 ± 0.047 μs | 0.989 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.4 ± 0.04 μs | 2.45 ± 0.039 μs | 0.98 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.63 ± 8.1 μs | 2.65 ± 7.7 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 16.5 ± 1.9 μs | 15.4 ± 1.1 μs | 1.07 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.03 ± 0.036 μs | 2.03 ± 0.027 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.542 ± 0.054 ms | 0.52 ± 0.038 ms | 1.04 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.79 ± 1.2 μs | 7.19 ± 0.68 μs | 1.08 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.3 ± 0.047 μs | 2.3 ± 0.052 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.4 ± 0.057 μs | 2.43 ± 0.062 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.122 ± 0.0092 ms | 0.121 ± 0.0077 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 15.7 ± 2 μs | 15.6 ± 1.4 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.04 ± 0.23 μs | 2.92 ± 0.22 μs | 1.04 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.38 ± 0.046 μs | 2.41 ± 0.052 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.38 ± 4.6 μs | 2.4 ± 18 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 31.1 ± 2.7 μs | 0.032 ± 0.0019 ms | 0.972 |
| time_to_load | 0.726 ± 0.017 s | 0.75 ± 0.0098 s | 0.968 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

vchuravy (Member) commented:

This needs tests. Making it GPU-only is fine given #533.

pxl-th requested a review from vchuravy on October 23, 2024, 21:08.

pxl-th (Member, Author) commented Oct 24, 2024

@vchuravy, if that's OK, I'd also like to tag a new release after this PR is merged.

pxl-th merged commit 5769a8e into main on Oct 25, 2024 (30 of 36 checks passed).
pxl-th deleted the pxl-th/return branch on October 25, 2024, 19:03.

2 participants