@nrspruit nrspruit commented Sep 5, 2024

-pre-commit PR for oneapi-src/unified-runtime#2062

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@pbalcer pbalcer marked this pull request as ready for review September 6, 2024 07:52
@pbalcer pbalcer requested a review from a team as a code owner September 6, 2024 07:52

pbalcer commented Sep 6, 2024

@intel/llvm-gatekeepers please merge

@martygrant martygrant merged commit e0e7b50 into intel:sycl Sep 6, 2024

sarnex commented Sep 6, 2024

@nrspruit @pbalcer I'm seeing many postcommit Linux gen12 failures. Could they all be related to this change? If so, IMO we should revert it.
https://github.com/intel/llvm/actions/runs/10737081184/job/29778134641

Failed Tests (11):
  SYCL :: Basic/sub_group_size_prop.cpp
  SYCL :: EnqueueFunctions/kernel_shortcut_with_kb.cpp
  SYCL :: EnqueueFunctions/kernel_submit_with_event_and_kb.cpp
  SYCL :: EnqueueFunctions/kernel_submit_with_kb.cpp
  SYCL :: KernelCompiler/kernel_compiler_opencl.cpp
  SYCL :: KernelCompiler/multi_device.cpp
  SYCL :: KernelCompiler/opencl_capabilities.cpp
  SYCL :: KernelCompiler/opencl_queries.cpp
  SYCL :: syclcompat/launch/launch_policy.cpp
  SYCL :: syclcompat/util/util_match_all_over_group.cpp
  SYCL :: syclcompat/util/util_match_any_over_group.cpp

pbalcer commented Sep 10, 2024

@sarnex That seems really unlikely; this patch was fairly simple and obvious. I'll take a look.

pbalcer commented Sep 10, 2024

@sarnex I've chatted with @nrspruit about this, and it doesn't look like something this patch could have caused. Can you try rebooting the gen12 runner?
All the failures look like they are coming from the fpga emulator:

2024-09-06T11:32:09.1849674Z env ONEAPI_DEVICE_SELECTOR=opencl:fpga  /__w/llvm/llvm/build-e2e/Basic/Output/sub_group_size_prop.cpp.tmp.out
2024-09-06T11:32:09.1850627Z # executed command: env ONEAPI_DEVICE_SELECTOR=opencl:fpga /__w/llvm/llvm/build-e2e/Basic/Output/sub_group_size_prop.cpp.tmp.out
2024-09-06T11:32:09.1851198Z # .---command stdout------------
2024-09-06T11:32:09.1851522Z # | Testing sub_group_size property for sub-group size=1
2024-09-06T11:32:09.1851850Z # `-----------------------------
2024-09-06T11:32:09.1852096Z # .---command stderr------------
2024-09-06T11:32:09.1852375Z # | ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
2024-09-06T11:32:09.1852789Z # | ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
2024-09-06T11:32:09.1853365Z # | terminate called after throwing an instance of 'sycl::_V1::exception'
2024-09-06T11:32:09.1853819Z # |   what():  The program was built for 1 devices
2024-09-06T11:32:09.1854211Z # | Build program log for 'Intel(R) FPGA Emulation Device':
2024-09-06T11:32:09.1854526Z # | Compilation started
2024-09-06T11:32:09.1854735Z # | Compilation done
2024-09-06T11:32:09.1854924Z # | Linking started
2024-09-06T11:32:09.1855108Z # | Linking done
2024-09-06T11:32:09.1855289Z # | Device build started
2024-09-06T11:32:09.1855509Z # | Options used by backend compiler: 
2024-09-06T11:32:09.1855774Z # | Failed to build device program
2024-09-06T11:32:09.1856406Z # | error: kernel "_ZTS14SubGroupKernelIL7Variant0ELm1EE": Required subgroup size can't be 1 for subgroup calls
2024-09-06T11:32:09.1856968Z # | CompilerException Checking vectorization factor failed
2024-09-06T11:32:09.1857263Z # | 
2024-09-06T11:32:09.1857443Z # `-----------------------------
2024-09-06T11:32:09.1857712Z # error: command failed with exit status: -6
2024-09-06T11:32:09.1919603Z env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/KernelCompiler/Output/kernel_compiler_opencl.cpp.tmp.out
2024-09-06T11:32:09.1920571Z # executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/KernelCompiler/Output/kernel_compiler_opencl.cpp.tmp.out
2024-09-06T11:32:09.1921201Z # .---command stderr------------
2024-09-06T11:32:09.1921482Z # | ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
2024-09-06T11:32:09.1921909Z # | ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
2024-09-06T11:32:09.1922357Z # `-----------------------------
2024-09-06T11:32:09.1922578Z # RUN: at line 12
2024-09-06T11:32:09.1923111Z env ONEAPI_DEVICE_SELECTOR=opencl:fpga  /__w/llvm/llvm/build-e2e/KernelCompiler/Output/kernel_compiler_opencl.cpp.tmp.out
2024-09-06T11:32:09.1924056Z # executed command: env ONEAPI_DEVICE_SELECTOR=opencl:fpga /__w/llvm/llvm/build-e2e/KernelCompiler/Output/kernel_compiler_opencl.cpp.tmp.out
2024-09-06T11:32:09.1924666Z # .---command stderr------------
2024-09-06T11:32:09.1924942Z # | ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
2024-09-06T11:32:09.1925357Z # | ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
2024-09-06T11:32:09.1925912Z # | terminate called after throwing an instance of 'sycl::_V1::exception'
2024-09-06T11:32:09.1926453Z # |   what():  Native API failed. Native API returns: 54 (UR_RESULT_ERROR_UNSUPPORTED_ENUMERATION)
2024-09-06T11:32:09.1926901Z # `-----------------------------
2024-09-06T11:32:09.1927175Z # error: command failed with exit status: -6

And so on. Each failing test first runs against the L0 adapter (which is what this patch changed), where it seems to work fine, and then runs again with opencl:fpga, which is where it fails.

To us it looks like it's the environment that is the problem, not the L0 adapter.
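
A rough way to check this against the full job log is to count failed commands per device selector. This is only a sketch, assuming the log keeps the lit format shown above ("# executed command: env ONEAPI_DEVICE_SELECTOR=..." followed by "# error: command failed with exit status"). The helper name, the regexes, and the shortened sample are illustrative and not part of any existing tooling.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the log excerpt in this thread.
CMD_RE = re.compile(r"# executed command: env ONEAPI_DEVICE_SELECTOR=(\S+)")
FAIL_RE = re.compile(r"# error: command failed with exit status")

# Shortened, hypothetical sample in the same format as the excerpt above.
SAMPLE = """\
2024-09-06T11:32:09Z # executed command: env ONEAPI_DEVICE_SELECTOR=opencl:fpga /build-e2e/a.out
2024-09-06T11:32:09Z # error: command failed with exit status: -6
2024-09-06T11:32:09Z # executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /build-e2e/b.out
2024-09-06T11:32:09Z # executed command: env ONEAPI_DEVICE_SELECTOR=opencl:fpga /build-e2e/b.out
2024-09-06T11:32:09Z # error: command failed with exit status: -6
"""


def failures_by_selector(log_text):
    """Count failed commands per ONEAPI_DEVICE_SELECTOR value."""
    counts = Counter()
    current = None  # selector of the most recently executed command
    for line in log_text.splitlines():
        match = CMD_RE.search(line)
        if match:
            current = match.group(1)
        elif FAIL_RE.search(line) and current is not None:
            # Attribute the failure to the last command that was run.
            counts[current] += 1
            current = None
    return counts


if __name__ == "__main__":
    for selector, count in failures_by_selector(SAMPLE).items():
        print(selector, count)
```

If every failure lands under opencl:fpga and none under level_zero:gpu, that would be consistent with the L0 adapter not being the cause.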

sarnex commented Sep 10, 2024

Thanks, we are seeing this on multiple runners, so I don't think rebooting it will help.

@intel/llvm-gatekeepers Does someone have time to investigate the above test failures in postcommit?

pbalcer commented Sep 10, 2024

We suspect the issue is a bug in the FPGA emulator that surfaced after oneapi-src/unified-runtime#2032. @callumfare is investigating.

sarnex commented Sep 10, 2024

Thanks. If reverting that commit consistently fixes it, I'd recommend we do that for now unless a proper fix is quick.

4 participants