
Bitonic Sort fails on CUDA with (error code: an illegal memory access was encountered) #314

Open
developedby opened this issue May 20, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@developedby
Member

Originally from HigherOrderCO/Bend#364 by user @ethanbarry

Description

When I run the compiled CUDA bitonic sorter example (linked in the README) I get this error:

Failed to launch kernels (error code an illegal memory access was encountered)!

To Reproduce

Steps to reproduce the behavior:

1. bend gen-cu sorter.bend > sorter.cu
2. nvcc sorter.cu -o sorter
3. prime-run ./sorter (launches it on the dGPU; an Arch Linux script)
4. Error received.

Expected behavior

The program runs on the GPU.
Desktop (please complete the following information):

OS: Linux (Arch 6.9.1-arch1-1)
CPU: Intel i7-11800H
GPU: RTX 3050 Ti Mobile
GPU Driver: Nvidia open kernel modules v550.78
CUDA release 12.4, V12.4.131

Additional context

The program runs using the C codegen backend, but with the CUDA backend, it seems to fail regardless of what I do. If anyone is curious about the prime-run command, it's really just a script that forces the dGPU to handle a task - nothing fancy.
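For anyone curious, such a wrapper can be sketched as below. This mirrors what Arch's nvidia-prime package ships: a handful of NVIDIA render-offload environment variables set before executing the command (the exact variable set on a given system may differ):

```shell
# Sketch of a prime-run-style wrapper: export the standard NVIDIA PRIME
# render-offload variables, then run the given command on the dGPU.
prime_run() {
  __NV_PRIME_RENDER_OFFLOAD=1 \
  __VK_LAYER_NV_optimus=NVIDIA_only \
  __GLX_VENDOR_LIBRARY_NAME=nvidia \
  "$@"
}

# Demo: the child process sees the offload variables.
prime_run sh -c 'echo "offload=$__NV_PRIME_RENDER_OFFLOAD vendor=$__GLX_VENDOR_LIBRARY_NAME"'
# prints: offload=1 vendor=nvidia
```

This is why setting the same variables in the environment beforehand (as a later commenter does) is equivalent to invoking prime-run.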

@NotCyberLemon

I came here from the main Bend repo, from the issue "Bitonic Sort example failed with GPU kernel error".

I too am having a kernel memory issue:

$ ./sorter # The same as prime-run due to environment variables already being set.
| Failed to launch kernels (error code an illegal memory access was encountered)!

I am also running a mobile GPU, and that is where I am getting this issue.

Some GPU properties and info from exec:

--- General Information for device 0 ---
Name: NVIDIA GeForce RTX 3060 Laptop GPU
Compute capability: 8.6
Clock rate: 1425000
Device copy overlap: Enabled
Kernel execution timeout: Enabled

--- Memory Information for device 0 ---
Total global memory: 5996544000
Total constant memory: 65536
Max memory pitch: 2147483647
Texture alignment: 512

--- MP Information for device 0 ---
Multiprocessor count: 30
Shared memory per MP: 49152
Registers per MP: 65536
Threads in warp: 32
Max threads per block: 1024
Max thread dimensions: (1024, 1024, 64)
Max grid dimensions: (2147483647, 65535, 65535)

--- Memory Allocation Test ---
Memory allocation successful!

Specs:

OS: Arch Linux x86_64
Kernel: 6.9.1-zen1-1-zen
GPU: NVIDIA GeForce RTX 3060 Mobile / Max-Q
GPU Driver: nvidia-open-dkms 550.78-4
CUDA Version: 12.4.1-4

On top of that, when running it through bend run-cu ./sorter, it seems to run indefinitely; after a while of testing, I have been unable to find the cause or what the execution is getting caught on.

@2lian

2lian commented May 22, 2024

I had the same issue. I cloned HVM and changed the LNet setting according to #283, but the current repo (v2.0.14) does not work with Bend, and I do not know where v2.0.13 (the one Bend uses) is.

I never used cargo so excuse me if I am doing some black magic here, but this is how I fixed it for bend:

mkdir ~/hvmtmp
cd ~/hvmtmp
cargo init
cargo add hvm@=2.0.13
cargo vendor vendor
cd vendor/hvm

You are now inside the source of hvm V2.0.13.

Open src/hvm.cu and, around line 334, reduce L_NODE_LEN and L_VARS_LEN; do not reduce them too much, though. These values work on my GTX 1080 Ti:

// Local Net
const u32 L_NODE_LEN = 0x2000/4;
const u32 L_VARS_LEN = 0x2000/4;
struct LNet {
  Pair node_buf[L_NODE_LEN];
  Port vars_buf[L_VARS_LEN];
};

Now go back to the hvm v2.0.13 source you downloaded and install it:

cd ~/hvmtmp/vendor/hvm
cargo +nightly install --path .

This should work; you can now delete ~/hvmtmp.

@VictorTaelin
Member

I wonder why they needed /4 there - 0x1000 should be safe for every architecture, shouldn't it? AFAIK all devices support 48KB shared memory. Perhaps this is using a little bit more, due to the other shared structures?

@2lian

2lian commented May 23, 2024

I wonder why they needed /4 there - 0x1000 should be safe for every architecture

To report more about this, on my GTX 1080Ti (using WSL2, cuda toolkit 12.3), I have tried:

  • 0x2000
  • 0x1000 = 0x2000/2
  • 0x2000/3
  • 0x2000/4
  • 0x0500
  • 0x0100

Only 0x2000/4 and 0x0500 worked.

@gladmo

gladmo commented May 23, 2024

None of those values worked for me, on my GTX 1050 Ti.

OS: CentOS Linux release 7.9.2009 (Core)
CPU: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
GPU: GTX 1050 Ti
GPU Driver: Nvidia open kernel modules v550.78
CUDA release 12.4, V12.4.131
$ nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   58C    P8             N/A /   72W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

test example:

$ time bend run-c sorter.bend
Result: 16646144
bend run-c sorter.bend  47.63s user 0.32s system 435% cpu 11.001 total

$ time bend run-cu sorter.bend
Errors:
1.Failed to parse result from HVM.
Output from HVM was:
"Failed to launch kernels. Error code: an illegal memory access was encountered.\n""exit status: 1"""

bend run-cu sorter.bend  0.03s user 0.06s system 89% cpu 0.097 total

@TimotejFasiang

Did anyone manage to find L_NODE_LEN and L_VARS_LEN values that work for other GPUs?


7 participants