[E2E][CUDA] NonUniformGroups/ballot_group_algorithms.cpp failed on CUDA #12995

### Describe the bug

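`NonUniformGroups/ballot_group_algorithms.cpp` failed on the self-hosted CUDA runner during SYCL Nightly testing: https://github.com/intel/llvm/actions/runs/8242960746/job/22543077484

The test compiles successfully, but the binary aborts with exit code -6 because the host-side assertion `AllAcc[WI] == true` fails: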

```
FAIL: SYCL :: NonUniformGroups/ballot_group_algorithms.cpp (1450 of 1931)
******************** TEST 'SYCL :: NonUniformGroups/ballot_group_algorithms.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 1
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp -o /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp -o /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 2
env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/NonUniformGroups/Output/ballot_group_algorithms.cpp.tmp.out
# .---command stderr------------
# | ballot_group_algorithms.cpp.tmp.out: /__w/llvm/llvm/llvm/sycl/test-e2e/NonUniformGroups/ballot_group_algorithms.cpp:1: int main(): Assertion `AllAcc[WI] == true' failed.
# `-----------------------------
# error: command failed with exit status: -6
```
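To help with triage, here is a host-only C++ sketch of what `AllAcc` is expected to contain. It assumes, without having checked the test source, that the test splits each 32-wide sub-group (the size `sycl-ls` reports below) into two ballot groups using the hypothetical predicate `WI % 2 == 0`. It also assumes the test passes `all_of_group` a value that is true for every member. `inFirstGroup`, `modelAllOf`, and `checkAllAcc` are illustrative names. They are not part of SYCL or of the test.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sub-group size reported by sycl-ls for the RTX 3090 (info::device::sub_group_sizes: 32).
constexpr std::size_t SGSize = 32;

// Hypothetical split predicate; the real test may partition work-items differently.
bool inFirstGroup(std::size_t WI) { return WI % 2 == 0; }

// Host-side model of all_of_group over a ballot group: each work-item's result is
// the logical AND of Value over every work-item that shares its predicate value.
template <typename ValueFn>
std::array<bool, SGSize> modelAllOf(ValueFn Value) {
  std::array<bool, SGSize> Result{};
  for (std::size_t WI = 0; WI < SGSize; ++WI) {
    bool All = true;
    for (std::size_t Other = 0; Other < SGSize; ++Other)
      if (inFirstGroup(Other) == inFirstGroup(WI))
        All = All && Value(Other);
    Result[WI] = All;
  }
  return Result;
}

// Mirrors the host-side check that failed in the log.
void checkAllAcc(const std::array<bool, SGSize>& AllAcc) {
  for (std::size_t WI = 0; WI < SGSize; ++WI)
    assert(AllAcc[WI] == true);
}
```

With every member contributing `true`, `checkAllAcc(modelAllOf([](std::size_t) { return true; }))` passes. Comparing the device-produced `AllAcc` element by element against such a model could show whether the false results cluster in one ballot group or are scattered across the sub-group.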

### To reproduce

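intel/llvm commit id: `ad6085c6b449f`

Build the toolchain at this commit and run the two RUN lines shown in the log above on a CUDA GPU: compile with `-fsycl -fsycl-targets=nvptx64-nvidia-cuda`, then run the binary with `ONEAPI_DEVICE_SELECTOR=cuda:gpu`.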


### Environment

`sycl-ls --verbose` output:

```
> sycl-ls --verbose

ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]

Platforms: 1
Platform [#1]:
    Version  : CUDA 12.4
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 8.6
        Name       : NVIDIA GeForce RTX 3090
        Vendor     : NVIDIA Corporation
        Driver     : CUDA 12.4
        Aspects    : gpu fp fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_groupcl_khr_fp64 cl_khr_subgroups pi_ext_intel_devicelib_assert ur_exp_command_buffer  cl_khr_fp16  ext_oneapi_graph
        info::device::sub_group_sizes: 32
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]
accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.4]
```

### Additional context

_No response_
