
[Bug] Example in flash_decoding/example_mha_inference fails when using "cython" backend on NVIDIA Hopper #403

@smallscientist1

Description

In examples/flash_decoding/example_mha_inference.py, switching from the "dlpack" execution backend to the default "cython" backend makes the program fail at runtime. Concretely, changing

    kernel = tilelang.compile(program, out_idx=[5], target="cuda", execution_backend="dlpack")

to

    kernel = tilelang.compile(program, out_idx=[5])

triggers the failure.
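For clarity, the two invocations differ only in the execution_backend keyword; when it is omitted, tilelang falls back to its default backend (which, per the issue title, is "cython" — treat that default and this helper as illustrative assumptions, not tilelang's documented behavior):

```python
# Illustrative sketch of the two compile() keyword sets from the report.
# The "cython" default is an assumption inferred from the issue title.
working_kwargs = dict(out_idx=[5], target="cuda", execution_backend="dlpack")  # runs fine
failing_kwargs = dict(out_idx=[5])  # backend omitted -> default backend is used

# Resolve the backend the way a caller would observe it.
default_backend = failing_kwargs.get("execution_backend", "cython")
print(default_backend)
```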

The error is:

Traceback (most recent call last):
  File "/cfy/tilelang/examples/flash_decoding/example_mha_inference.py", line 311, in <module>
    profiler.assert_allclose(ref_program, rtol=0.01, atol=0.01)
  File "/cfy/tilelang/tilelang/profiler/__init__.py", line 99, in assert_allclose
    torch.cuda.synchronize()
  File "/opt/conda/lib/python3.11/site-packages/torch/cuda/__init__.py", line 985, in synchronize
    return torch._C._cuda_synchronize()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

My device is H200, CUDA 12.6
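As the error message itself suggests, re-running with synchronous kernel launches should make the illegal memory access surface at the actual failing call site rather than at a later synchronize (script path taken from the traceback above):

```shell
# CUDA_LAUNCH_BLOCKING=1 forces each kernel launch to complete before returning,
# so the traceback points at the kernel that actually faults.
CUDA_LAUNCH_BLOCKING=1 python examples/flash_decoding/example_mha_inference.py
```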

Metadata

Labels: bug (Something isn't working)
