This repository has been archived by the owner on Jun 21, 2024. It is now read-only.

RuntimeError: No available kernel. Aborting execution. #9

Open
zarandioon opened this issue Jun 18, 2023 · 1 comment

@zarandioon

When I run the inference logic using the following script, I get a "RuntimeError: No available kernel. Aborting execution." error:

A100 GPU detected, using flash attention if input tensor is on cuda
  0%|          | 0/251 [00:00<?, ?it/s]
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:659.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:450.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:661.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:100.)
  out = F.scaled_dot_product_attention(
  0%|          | 0/251 [00:00<?, ?it/s]
Traceback (most recent call last):

... <truncated>

  File "/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py", line 100, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel.  Aborting execution.

I tried installing the PyTorch nightly build, but that did not help:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

CUDA toolkit (nvcc) version:

/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

PyTorch version:

pip3 show torch
Name: torch
Version: 2.1.0.dev20230618+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/azureuser/PaLM/.venv/lib/python3.8/site-packages
Requires: filelock, pytorch-triton, sympy, networkx, jinja2, fsspec, typing-extensions
Required-by: torchvision, torchaudio, PaLM-rlhf-pytorch, lion-pytorch, accelerate

Any idea what could cause this?
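
For reference, the last UserWarning in the log says the query, key and value tensors reaching F.scaled_dot_product_attention are float32, while the flash and memory-efficient kernels only accept Half/BFloat16. The following standalone sketch (not taken from this repo; shapes are arbitrary example values) reproduces that constraint using PyTorch's sdp_kernel context manager:

import torch
import torch.nn.functional as F

# example shapes: (batch, heads, seq_len, head_dim), float32 by default
q = torch.randn(1, 8, 128, 64, device="cuda")
k = torch.randn(1, 8, 128, 64, device="cuda")
v = torch.randn(1, 8, 128, 64, device="cuda")

# restrict SDPA to the flash kernel only
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v)  # float32 inputs -> "No available kernel"
    except RuntimeError as e:
        print(e)
    # half-precision inputs satisfy the kernel's dtype requirement
    F.scaled_dot_product_attention(q.half(), k.half(), v.half())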

@conceptofmind
Owner

I think this is an issue related to the use of the Flash Attention kernel in PyTorch. Can you try setting Flash Attention to false?
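
For anyone hitting this later, here is a minimal sketch of the suggestion above. The flash_attn keyword and the model hyperparameters are assumptions; check the constructor of the palm_rlhf_pytorch version you have installed. The commented-out alternative keeps flash attention enabled by meeting its half-precision requirement instead.

import torch
from palm_rlhf_pytorch import PaLM

# Option 1: construct the model without flash attention
# (the flash_attn keyword is an assumption; verify against your installed version)
model = PaLM(num_tokens=20000, dim=512, depth=12, flash_attn=False).cuda().eval()

prompt = torch.randint(0, 20000, (1, 128), device="cuda")
with torch.no_grad():
    logits = model(prompt)

# Option 2 (alternative): leave flash attention on, but run inference under
# autocast so q/k/v reach scaled_dot_product_attention in float16:
# with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
#     logits = model(prompt)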
