This repository has been archived by the owner on Jun 21, 2024. It is now read-only.
When I run the inference logic using the following script, I get a `RuntimeError: No available kernel. Aborting execution.` error:
```
A100 GPU detected, using flash attention if input tensor is on cuda
0%| | 0/251 [00:00<?, ?it/s]/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:659.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:450.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:661.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:100.)
  out = F.scaled_dot_product_attention(
0%| | 0/251 [00:00<?, ?it/s]
Traceback (most recent call last):
... <truncated>
  File "/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py", line 100, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
```
I tried installing the PyTorch nightly version, and that did not help:
NVIDIA driver version:
PyTorch version:
Any idea what could cause this?
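The last `UserWarning` in the log is the most telling one: the fused CUDA kernels behind `F.scaled_dot_product_attention` expect query, key, and value in Half or BFloat16, but the tensors arrive as float32, so every fused backend is rejected and PyTorch aborts with "No available kernel". A minimal sketch of the mismatch and one way around it (the tensor shapes here are illustrative, not taken from the PaLM repo):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
# torch.randn produces float32 by default, which is exactly the dtype
# the flash/mem-efficient kernels reject in the warnings above.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

if torch.cuda.is_available():
    # Casting to half precision satisfies the {Half, BFloat16} dtype
    # requirement of the fused attention kernels on an A100.
    q, k, v = (t.cuda().half() for t in (q, k, v))

# On CPU (or with Half tensors on CUDA) a kernel is available and this runs.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape, out.dtype)
```

If this is the cause, one workaround would be to run the model in half precision before inference, e.g. call `.half()` on the model or wrap the forward pass in `torch.autocast("cuda", dtype=torch.float16)`, so that q/k/v reach `scaled_dot_product_attention` as Half tensors.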