
flash_attn_2 undefined symbol #4182

Closed
zslittlehelper opened this issue Oct 5, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@zslittlehelper

Describe the bug

Fresh WSL2 install with Ubuntu 22.04; I followed the instructions to install it in a conda environment.
CUDA Toolkit is installed as well.

Upon attempting to start server.py from the textgen environment, I get the error appended in the Logs section below.

I can work around the issue by commenting out line 7 of the llama_attn_hijack.py file, and I can then run exllama models without an issue.
AutoGPTQ can't be used, though, as it also seems to rely on flash_attn_2.
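
For reference, a hypothetical one-liner for that workaround (my illustration, not from the report), assuming line 7 of modules/llama_attn_hijack.py is the transformers import shown in the traceback below:

# comment out line 7 (the transformers.models.llama.modeling_llama import) in place
sed -i '7s/^/# /' modules/llama_attn_hijack.py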

The issue seems related to this one: Dao-AILab/flash-attention#451
But it is unclear to me how to resolve it in a WSL environment.

Advice?

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Fresh install in a Windows environment with WSL2 Ubuntu 22.04, then follow the install instructions.

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/home/user/text-generation-webui/server.py", line 30, in <module>
    from modules import (
  File "/home/user/text-generation-webui/modules/chat.py", line 18, in <module>
    from modules.text_generation import (
  File "/home/user/text-generation-webui/modules/text_generation.py", line 24, in <module>
    from modules.models import clear_torch_cache, local_rank
  File "/home/user/text-generation-webui/modules/models.py", line 22, in <module>
    from modules import RoPE, llama_attn_hijack, sampler_hijack
  File "/home/user/text-generation-webui/modules/llama_attn_hijack.py", line 7, in <module>
    import transformers.models.llama.modeling_llama
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 45, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 8, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c106SymIntltEl

System Info

WSL2, Ubuntu 22.04.
zslittlehelper added the bug label on Oct 5, 2023
@zslittlehelper
Author

zslittlehelper commented Oct 5, 2023

Managed to resolve this myself, but it might be worth something to others.
Step 2 of the oobabooga "install pytorch" guide has always installed the CPU version for me, so I visited the main page (https://pytorch.org/get-started/locally/) and installed what it suggested, which turned out to be stable 2.1.0. That version is in turn incompatible with AutoGPTQ (or rather, with flash-attention 2).

After cleanup and a reinstall with 2.0.1 (pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118), everything I've tested so far works.
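
For anyone reproducing this, a minimal sketch of the cleanup-and-reinstall sequence (my summary, assuming the textgen conda env; the versions and index URL are the ones quoted above):

conda activate textgen
pip uninstall -y torch torchvision torchaudio
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118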

@joevenzon

I had the same problem, and this fixed it. Thank you!

@SuperUserNameMan

SuperUserNameMan commented Oct 6, 2023

Why did you close this bug report?

I have the same error after a fresh install using ./start_linux.sh for NVIDIA and today's git-cloned code.

@zslittlehelper
Author

Why did you close this bug report?

I have the same error after a fresh install using ./start_linux.sh for NVIDIA and today's git-cloned code.

Read the other messages in this thread.

@SuperUserNameMan

SuperUserNameMan commented Oct 6, 2023

That's a workaround, not a bug fix. The bug is still there in the official source code.

@zslittlehelper
Author

That's a workaround, not a bug fix.

The assumption is that by the time oobabooga is ready to transition to PyTorch 2.1, AutoGPTQ will also support flash-attention 2.

Feel free to open a new issue if you are so inclined.

@tigerbears

I ran into this issue on Ubuntu 22.04 (native Ubuntu, not WSL). I rebuilt my conda environment from scratch with the versions of torch, torchvision, and torchaudio suggested above, though from a quick check I believe I already had those versions installed.

I had overlooked the suggestion to comment out any lines in the source. The same problem persisted, but I seem to have resolved things (for me) now, with some different steps.

Following the guidance in a post in the flash-attention issue mentioned above, I uninstalled the flash-attn wheel TGW had installed, then built and installed flash-attn fresh:

pip uninstall flash-attn
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn
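
A quick sanity check worth adding here (my addition, not part of the original comment): the module below is the one that failed to load, so importing it directly confirms the rebuilt extension links against the installed torch:

# this is the exact import that raised the undefined-symbol error
python -c "import flash_attn_2_cuda; print('flash_attn_2_cuda OK')"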

I could then launch TGW successfully. I could also load GPTQ models with ExLlama_HF, but could not load exllamav2 models with ExLlamav2_HF in the web UI. I'd get basically the same error, now just displayed off to the right on the web page.

To resolve that, I uninstalled the exllamav2 wheel that TGW had installed, then reinstalled exllamav2 from source directly.

pip uninstall exllamav2
cd wherever-you-have-exllamav2
# I then switched conda environments from textgen to exllamav2, but don't know if you must
python setup.py install --user

After a cursory look, exllamav2 models now seem to be working for me with ExLlama_HF. I can't speak to other combinations or loaders (like AutoGPTQ). I also haven't yet checked whether I'm getting the benefits of flash-attn, like improved inference speed at higher ctx, but at least things run so far.
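
A similar sanity check for the rebuilt exllamav2 package (my addition, not from the comment above):

python -c "import exllamav2; print('exllamav2 OK')"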

@Thireus
Contributor

Thireus commented Oct 12, 2023

Thanks @tigerbears for the tip, that worked for me.

pip uninstall exllamav2 # Uninstall current exllamav2 install

cd ~/exllamav2/
conda activate exllama # Go back to exllama conda env
git pull
pip install -r requirements.txt --upgrade

# Fix for: The detected CUDA version (11.7) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions.
pip3 install torch==2.0.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

python setup.py install --user

conda activate textgen # Go back to textgen conda env
cd ~/text-generation-webui/
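
Not part of Thireus' steps, but a quick way to confirm the CUDA mismatch is gone is to print the versions involved (assuming the same cu117 install as above):

# torch should now report a 2.0.1+cu117 build, i.e. CUDA 11.7
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# toolkit version used to compile the extension
nvcc --version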

@BuffMcBigHuge

I solved this issue by downloading the exllamav2 wheel release appropriate for my system and installing it directly, i.e.:

conda activate textgen
cd text-generation-webui
wget https://github.com/turboderp/exllamav2/releases/download/v0.0.7/exllamav2-0.0.7+cu117-cp310-cp310-linux_x86_64.whl
pip install exllamav2-0.0.7+cu117-cp310-cp310-linux_x86_64.whl
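
If you adapt this, the wheel tags have to match your environment; a hedged way to check before picking a release asset (my addition, not from the comment):

# cp310 in the wheel name must match this interpreter version
python --version
# cu117 must match the CUDA version your torch build reports
python -c "import torch; print(torch.version.cuda)"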

@nps798

nps798 commented Jan 19, 2024

@tigerbears
thanks!

The following commands solved the issue:
pip uninstall flash-attn
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn

@apoorvumang

FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn

is no longer working for me (it doesn't seem to force a build; it seems to reuse something cached?).

Any help?

@apoorvumang

I cloned the flash-attn repo and built it from source with python setup.py install. It worked, but I hope there's an easier way? This took way too long to compile.
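
A hedged alternative to cloning (my suggestion, not from the thread): pip may be reusing a previously built wheel from its cache, so forcing a fresh build without a clone might look like the sketch below. MAX_JOBS only limits parallel compilation, and the exact behavior depends on your flash-attn version:

# assumption: pip's wheel cache is what skips the rebuild, so bypass it explicitly
pip uninstall -y flash-attn
MAX_JOBS=4 FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn --no-build-isolation --no-cache-dir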

@chengjikang

Managed to resolve this myself, but it might be worth something to others. Step 2 of the oobabooga "install pytorch" guide has always installed the CPU version for me, so I visited the main page (https://pytorch.org/get-started/locally/) and installed what it suggested, which turned out to be stable 2.1.0. That version is in turn incompatible with AutoGPTQ (or rather, with flash-attention 2).

After cleanup and a reinstall with 2.0.1 (pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118), everything I've tested so far works.

Thank you very much. I ran into the same flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c106SymIntltEl error, and your method works.
