-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL-PTX] Fix __spirv_GroupAsyncCopy #1451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Victor Lomuller <victor@codeplay.com>
5e4c715
to
f1dfcaf
Compare
It would be great to have a regression test for this case. |
So, this change is non-functional - it's just a performance improvement. Right? |
No, the test hangs for CUDA, probably since #1384. Not sure why this was not caught when it was tested. |
@@ -6,19 +6,17 @@ | |||
// | |||
//===----------------------------------------------------------------------===// | |||
|
|||
#define STRIDED_COPY(DST_AS, SRC_AS, DST_STRIDE, SRC_STRIDE) \ | |||
size_t size = __spirv_LocalInvocationId_x() * \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that hang happens because size can computed to 0 for some of the work items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the update from ocl to spir-v inadvertently changed the step computation from "getting the workgroup size" to "mixing up all local ids". This bug was hidden so far as it relied on a header implementation (removed with #1384).
@vladimirlaz, are these tests failing with #1458? |
regression/group.cpp - hangs |
Both tests pass with this patch |
…_private_api * origin/sycl: (614 commits) [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466) [SYCL][USM] Remove vestigial dead code (intel#1474) [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451) [Driver][SYCL] Emit an error if c compilation is forced (intel#1438) [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454) [SYCL] Change priority of devices in default_selector (intel#1264) [CI] Update CODEOWNERS matching rules order (intel#1468) [SYCL] Share PFWG lambda object through shared memory (intel#1455) [CI] Fix CODEOWNERS file syntax (intel#1464) [SYCL][CUDA] Fix active context when creating base event (intel#1447) [SYCL] Diagnose implicit declaration of kernel function type (intel#1450) [BuildBot] Modify configure script (intel#1421) [SYCL] Resolve min/max conflict (intel#1339) [CI][BuildBot] Fix configure parameter to turn on/off assertions (intel#1449) [SYCL] XFAIL LIT test due to duplicate diagnostic [SYCL] Remove explicit sycl_device attribute requirement Apply more suggestions Apply suggestions Translate new set of Intel FPGA Loop Controls Translate Intel FPGA force_pow2_depth memory attribute ...
…duler_docs * origin/sycl: (26 commits) [Driver][SYCL] Move include/sycl header before other system header locations (intel#1492) [BuildBot] Improve usability of buildbot scripts (intel#1472) [NFC] Add GitHub actions badges to README file (intel#1496) [SYCL] Improve error handling for kernel invocation (intel#1209) [SYCL][Driver] Fix SYCL standards' handling for '-fsycl -fsycl-device-only' invocations (intel#1371) [SYCL] Move type checks to later in Semantic Analysis lifecycle (intel#1465) [CI] Download fixed versions of Python tools (intel#1485) [SYCL] Fix sub_group::broadcast (intel#1482) [SYCL][Test] Disable spec_const_redefine.cpp on all devices but HOST (intel#1488) [SYCL] Only export public API (intel#1456) [SYCL][CUDA] Fix selected_binary argument in piextDeviceSelectBinary (intel#1475) [SYCL] Enable LIT testing with CUDA BE (intel#1458) [SYCL] Fix float to half-type conversion (intel#1395) [NFC] Cleanup unneded macro from builtins implementation (intel#1445) Enable cfg-printer LLVM lit tests only if LLVM linked statically (intel#1479) [SYCL][NFC] Reflect the "allowlist" renaming in the code (intel#1480) [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466) [SYCL][USM] Remove vestigial dead code (intel#1474) [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451) [Driver][SYCL] Emit an error if c compilation is forced (intel#1438) ...
…c_abi_checks * origin/sycl: (625 commits) [SYCL][Test] Disable spec_const_redefine.cpp on all devices but HOST (intel#1488) [SYCL] Only export public API (intel#1456) [SYCL][CUDA] Fix selected_binary argument in piextDeviceSelectBinary (intel#1475) [SYCL] Enable LIT testing with CUDA BE (intel#1458) [SYCL] Fix float to half-type conversion (intel#1395) [NFC] Cleanup unneded macro from builtins implementation (intel#1445) Enable cfg-printer LLVM lit tests only if LLVM linked statically (intel#1479) [SYCL][NFC] Reflect the "allowlist" renaming in the code (intel#1480) [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466) [SYCL][USM] Remove vestigial dead code (intel#1474) [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451) [Driver][SYCL] Emit an error if c compilation is forced (intel#1438) [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454) [SYCL] Change priority of devices in default_selector (intel#1264) [CI] Update CODEOWNERS matching rules order (intel#1468) [SYCL] Share PFWG lambda object through shared memory (intel#1455) [CI] Fix CODEOWNERS file syntax (intel#1464) [SYCL][CUDA] Fix active context when creating base event (intel#1447) [SYCL] Diagnose implicit declaration of kernel function type (intel#1450) [BuildBot] Modify configure script (intel#1421) ...
Fix the
size
computation and make theid
computation so that memory accesses can be coalesced.Signed-off-by: Victor Lomuller victor@codeplay.com