[SYCL-PTX] Fix __spirv_GroupAsyncCopy #1451

Merged 1 commit on Apr 3, 2020

@Naghasan Naghasan commented Apr 1, 2020

Fix the size computation and rework the id computation so that memory accesses can be coalesced.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
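
As a rough illustration of why the id computation matters for coalescing (a sketch under stated assumptions, not the patch itself): if each work-item starts at its linear local id and advances by the work-group size, then on any given iteration neighbouring work-items touch neighbouring elements. The helper names below (`interleaved_index`, `blocked_index`, and the two `neighbour_gap_*` functions) are hypothetical and exist only to contrast the two layouts.

```c
#include <stddef.h>

/* Hypothetical helper: element touched by work-item `lid` on iteration `iter`
 * when work-items interleave (start at lid, advance by group_size). */
static size_t interleaved_index(size_t lid, size_t iter, size_t group_size) {
  return lid + iter * group_size;
}

/* Hypothetical helper: element touched by work-item `lid` on iteration `iter`
 * when each work-item owns a contiguous chunk of `chunk` elements. */
static size_t blocked_index(size_t lid, size_t iter, size_t chunk) {
  return lid * chunk + iter;
}

/* Distance between the elements touched by work-items lid and lid + 1
 * on the same iteration, for the interleaved layout. */
static size_t neighbour_gap_interleaved(size_t lid, size_t iter,
                                        size_t group_size) {
  return interleaved_index(lid + 1, iter, group_size) -
         interleaved_index(lid, iter, group_size);
}

/* Same distance for the blocked layout. */
static size_t neighbour_gap_blocked(size_t lid, size_t iter, size_t chunk) {
  return blocked_index(lid + 1, iter, chunk) - blocked_index(lid, iter, chunk);
}
```

With the interleaved layout the gap between adjacent work-items is always one element, which is the access pattern that hardware can typically coalesce; with the blocked layout the gap equals the chunk length.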

bader commented Apr 2, 2020

It would be great to have a regression test for this case.

bader commented Apr 2, 2020

So, this change is non-functional - it's just a performance improvement. Right?

Naghasan commented Apr 2, 2020

No, the test hangs for CUDA, probably since #1384. Not sure why this was not caught when it was tested.

@@ -6,19 +6,17 @@
//
//===----------------------------------------------------------------------===//

#define STRIDED_COPY(DST_AS, SRC_AS, DST_STRIDE, SRC_STRIDE) \
size_t size = __spirv_LocalInvocationId_x() * \
Contributor

Do I understand correctly that the hang happens because size can be computed as 0 for some of the work-items?

Contributor Author

Yes, the update from OpenCL to SPIR-V inadvertently changed the step computation from "getting the work-group size" to "mixing up all local ids". The bug stayed hidden until now because it relied on a header implementation (removed with #1384).
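
To make the failure mode concrete, here is a minimal host-side sketch in plain C (not the libclc macro itself). It assumes the broken step was the product of the x, y and z local ids; the quoted hunk only shows the x factor, so the other two are an assumption. `dim3_t`, `make_dim3`, `buggy_step`, `group_size_step`, `linear_local_id`, `copy_one_item` and `copy_whole_group` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Hypothetical 3-D index type standing in for local ids and work-group sizes. */
typedef struct { size_t x, y, z; } dim3_t;

static dim3_t make_dim3(size_t x, size_t y, size_t z) {
  dim3_t d = {x, y, z};
  return d;
}

/* Step as described in the thread for the broken version: local ids mixed
 * together. ASSUMPTION: the factors are the x, y and z local ids; the quoted
 * hunk only shows the x factor. */
static size_t buggy_step(dim3_t local_id) {
  return local_id.x * local_id.y * local_id.z;
}

/* Intended step: the number of work-items in the work-group. */
static size_t group_size_step(dim3_t local_size) {
  return local_size.x * local_size.y * local_size.z;
}

/* Linear local id with x varying fastest. */
static size_t linear_local_id(dim3_t id, dim3_t size) {
  return (id.z * size.y + id.y) * size.x + id.x;
}

/* One work-item's share of the copy. Returns the number of elements copied,
 * or -1 when step is 0, since a real loop would then never advance past
 * `first`. */
static long copy_one_item(int *dst, const int *src, size_t n, size_t first,
                          size_t step) {
  long copied = 0;
  size_t i;
  if (step == 0)
    return -1;
  for (i = first; i < n; i += step) {
    dst[i] = src[i];
    ++copied;
  }
  return copied;
}

/* Run every work-item of the group in turn, using the work-group size as the
 * step. Returns the total number of elements copied. */
static long copy_whole_group(int *dst, const int *src, size_t n, dim3_t size) {
  long total = 0;
  dim3_t id;
  for (id.z = 0; id.z < size.z; ++id.z)
    for (id.y = 0; id.y < size.y; ++id.y)
      for (id.x = 0; id.x < size.x; ++id.x)
        total += copy_one_item(dst, src, n, linear_local_id(id, size),
                               group_size_step(size));
  return total;
}
```

Under that assumption, any work-item with a zero in one of its local id components gets a step of 0, so its copy loop would spin on the same element forever, which matches the hang described above. With the work-group size as the step, every work-item makes progress and each element is copied exactly once.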

vladimirlaz commented Apr 3, 2020

It is covered there:

@vladimirlaz, are these tests failing with #1458?

regression/group.cpp - hangs
basic_tests/device_event.cpp - hangs

@vladimirlaz

Both tests pass with this patch

@bader bader merged commit f836604 into intel:sycl Apr 3, 2020
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 6, 2020
…_private_api

* origin/sycl: (614 commits)
  [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466)
  [SYCL][USM] Remove vestigial dead code (intel#1474)
  [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451)
  [Driver][SYCL] Emit an error if c compilation is forced (intel#1438)
  [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454)
  [SYCL] Change priority of devices in default_selector (intel#1264)
  [CI] Update CODEOWNERS matching rules order (intel#1468)
  [SYCL] Share PFWG lambda object through shared memory (intel#1455)
  [CI] Fix CODEOWNERS file syntax (intel#1464)
  [SYCL][CUDA] Fix active context when creating base event (intel#1447)
  [SYCL] Diagnose implicit declaration of kernel function type (intel#1450)
  [BuildBot] Modify configure script (intel#1421)
  [SYCL] Resolve min/max conflict (intel#1339)
  [CI][BuildBot] Fix configure parameter to turn on/off assertions (intel#1449)
  [SYCL] XFAIL LIT test due to duplicate diagnostic
  [SYCL] Remove explicit sycl_device attribute requirement
  Apply more suggestions
  Apply suggestions
  Translate new set of Intel FPGA Loop Controls
  Translate Intel FPGA force_pow2_depth memory attribute
  ...
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 9, 2020
…duler_docs

* origin/sycl: (26 commits)
  [Driver][SYCL] Move include/sycl header before other system header locations (intel#1492)
  [BuildBot] Improve usability of buildbot scripts (intel#1472)
  [NFC] Add GitHub actions badges to README file (intel#1496)
  [SYCL] Improve error handling for kernel invocation (intel#1209)
  [SYCL][Driver] Fix SYCL standards' handling for '-fsycl -fsycl-device-only' invocations (intel#1371)
  [SYCL] Move type checks to later in Semantic Analysis lifecycle (intel#1465)
  [CI] Download fixed versions of Python tools (intel#1485)
  [SYCL] Fix sub_group::broadcast (intel#1482)
  [SYCL][Test] Disable spec_const_redefine.cpp on all devices but HOST (intel#1488)
  [SYCL] Only export public API (intel#1456)
  [SYCL][CUDA] Fix selected_binary argument in piextDeviceSelectBinary (intel#1475)
  [SYCL] Enable LIT testing with CUDA BE (intel#1458)
  [SYCL] Fix float to half-type conversion (intel#1395)
  [NFC] Cleanup unneded macro from builtins implementation (intel#1445)
  Enable cfg-printer LLVM lit tests only if LLVM linked statically (intel#1479)
  [SYCL][NFC] Reflect the "allowlist" renaming in the code (intel#1480)
  [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466)
  [SYCL][USM] Remove vestigial dead code (intel#1474)
  [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451)
  [Driver][SYCL] Emit an error if c compilation is forced (intel#1438)
  ...
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 15, 2020
…c_abi_checks

* origin/sycl: (625 commits)
  [SYCL][Test] Disable spec_const_redefine.cpp on all devices but HOST (intel#1488)
  [SYCL] Only export public API (intel#1456)
  [SYCL][CUDA] Fix selected_binary argument in piextDeviceSelectBinary (intel#1475)
  [SYCL] Enable LIT testing with CUDA BE (intel#1458)
  [SYCL] Fix float to half-type conversion (intel#1395)
  [NFC] Cleanup unneded macro from builtins implementation (intel#1445)
  Enable cfg-printer LLVM lit tests only if LLVM linked statically (intel#1479)
  [SYCL][NFC] Reflect the "allowlist" renaming in the code (intel#1480)
  [SYCL][Doc] Update prerequisites in GetStartedGuide (intel#1466)
  [SYCL][USM] Remove vestigial dead code (intel#1474)
  [SYCL-PTX] Fix __spirv_GroupAsyncCopy stride computation (intel#1451)
  [Driver][SYCL] Emit an error if c compilation is forced (intel#1438)
  [SYCL] Fix sycl-post-link when no split and symbols are requested. (intel#1454)
  [SYCL] Change priority of devices in default_selector (intel#1264)
  [CI] Update CODEOWNERS matching rules order (intel#1468)
  [SYCL] Share PFWG lambda object through shared memory (intel#1455)
  [CI] Fix CODEOWNERS file syntax (intel#1464)
  [SYCL][CUDA] Fix active context when creating base event (intel#1447)
  [SYCL] Diagnose implicit declaration of kernel function type (intel#1450)
  [BuildBot] Modify configure script (intel#1421)
  ...
@bader bader added the libclc libclc project related issues label Jul 3, 2020