Skip to content

[SYCL][CUDA] Add sub-group barrier #2606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 8, 2020
Merged

Conversation

Pennycook
Copy link
Contributor

Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp().
Because sub-group functions must always be called in converged control flow,
the membermask is always set to represent all active work-items in the warp.

Enabling this functionality requires that we switch to PTX 6.4, which is
consistent with the existing requirement to use CUDA 10.1.

Signed-off-by: John Pennycook john.pennycook@intel.com

Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp().
Because sub-group functions must always be called in converged control flow,
the membermask is always set to represent all active work-items in the warp.

Enabling this functionality requires that we switch to PTX 6.4, which is
consistent with the existing requirement to use CUDA 10.1.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
@Pennycook Pennycook added enhancement New feature or request spec extension All issues/PRs related to extensions specifications cuda CUDA back-end labels Oct 7, 2020
@Pennycook Pennycook requested review from bader and a team as code owners October 7, 2020 17:50
@Pennycook Pennycook requested a review from againull October 7, 2020 17:50
@Pennycook
Copy link
Contributor Author

Thanks to @Naghasan and @bader for their help in getting this working.

Also, a note to reviewers: I had some trouble getting CMake to handle the additional PTX flags correctly. I'm not a CMake expert, and would welcome any suggestions regarding how to improve what I've committed here. The issue as I understand it is that the list of compilation options constructed in libclc/CMakeLists.txt is passed to two functions in AddLibclc.cmake, but each function consumes those options differently.

One passes the options to add_target_options, which unhelpfully strips the second -Xclang option if it isn't prefixed with SHELL:. The other passes the options directly to add_custom_command unmodified, leaving SHELL: in the command line. The best solution I could find was to write the options assuming that SHELL: was required, then strip them when they weren't necessary.

@bader bader merged commit 551d706 into intel:sycl Oct 8, 2020
FILES generic/libspirv/sycldevice-binding.cpp)
endif()

add_libclc_builtin_set(libspirv-${arch_suffix}
TRIPLE ${t}
TARGET_ENV libspirv
COMPILE_OPT ${mcpu}
COMPILE_OPT ${flags}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMPILE_OPT is a multi value option, so you should be able to add the extra flags directly.

A more long term solution would be perhaps to define flag per arch_sufix (they can then be accessed later), but should be for later I guess.

set( mcpu )
# FIXME: Ideally we would not be tied to a specific PTX ISA version
if( ${ARCH} STREQUAL nvptx OR ${ARCH} STREQUAL nvptx64 )
set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using "SHELL: and string( REGEX REPLACE "SHELL:" later is needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_target_options only works if the SHELL: is there, but add_custom_command only works if the SHELL: is not there.

This is definitely a bit of a hack, but it seemed less error-prone than defining the same set of flags twice. If there's a more standard way to do this, please let me know and I'll fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'm no CMake expert so I'm not quite sure how to make it better.

jsji pushed a commit that referenced this pull request Jun 27, 2024
When there is forward declaration of a spirv entry, its decorates are
not translated until its definition is seen. Forward id is re-used for
its entry. Id in entry decorates should use forward id as well.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@305f48884606abf
kbenzie added a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
Rename urCommandBufferEnqueueExp to urEnqueueCommandBufferExp
kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
…actor"

This reverts commit cc60d08, from
oneapi-src/unified-runtime#2606 due to
CI fails in the DPC++ bump PR that need further investigation
intel#16747
kbenzie added a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
Revert "Merge pull request intel#2606 from Bensuo/cmd-buf_enqueue_refactor"
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Rename urCommandBufferEnqueueExp to urEnqueueCommandBufferExp
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
This reverts commit cc60d08, from
oneapi-src/unified-runtime#2606 due to
CI fails in the DPC++ bump PR that need further investigation
#16747
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Revert "Merge pull request #2606 from Bensuo/cmd-buf_enqueue_refactor"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end enhancement New feature or request spec extension All issues/PRs related to extensions specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants