flash_attn_2 undefined symbol #4182
Comments
Managed to resolve this myself, but it might be worth something to others. After cleanup and reinstall with 2.0.1 (pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118), everything I've tested so far works.
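For reference, a rough sketch of that cleanup-and-reinstall sequence; the install command is the one quoted above, while the uninstall step is an assumption about what "cleanup" involved:

```bash
# Assumed cleanup step: remove the existing torch stack first
pip uninstall -y torch torchvision torchaudio

# Reinstall the CUDA 11.8 builds quoted above
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 \
    --index-url https://download.pytorch.org/whl/cu118
```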
I had the same problem, and this fixed it. Thank you!
Why did you close this bug report? I have the same error after a fresh install.
Read the other messages in this thread.
That's a workaround, not a bug fix. The bug is still there in the official source code.
The assumption is that when oobabooga is ready to transition to PyTorch 2.1, there will also be support in AutoGPTQ for flash-attn 2. Feel free to open a new issue if you are so inclined.
I ran into this issue on Ubuntu 22.04 (native Ubuntu, not WSL). I rebuilt my conda environment from scratch with the versions of torch, torchvision, and torchaudio suggested above, though from a quick check I believe I already had those versions installed. I overlooked the idea of commenting out any lines in the source. The same problem persisted, but I seem to have resolved things (for me) now with some different steps. Following the guidance in this post in the mentioned flash-attn issue, I uninstalled the flash-attn wheel TGW had installed, then built and installed flash-attn fresh.
I could then launch TGW successfully. I could also load GPTQ models with ExLlama_HF, but could not load exllamav2 models with ExLlamav2_HF in the web UI. I'd get basically the same error, now just displayed off to the right on the web page. To resolve that, I uninstalled the exllamav2 wheel that TGW had installed, then reinstalled exllamav2 from source directly.
After a cursory look, exllamav2 models now seem to be working for me with ExLlama_HF. I can't speak to other combinations or loaders (like AutoGPTQ). I also haven't yet checked whether I'm getting the benefits of flash-attn, like improved inference speed at higher ctx, but at least things run so far.
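A minimal sketch of the sequence described above, assuming the packages were installed with pip inside the textgen conda environment; flags and URLs should be checked against the current flash-attn and exllamav2 repos:

```bash
# Remove the prebuilt wheels that text-generation-webui installed
pip uninstall -y flash-attn exllamav2

# Rebuild flash-attn locally so it links against the installed torch
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn --no-build-isolation

# Reinstall exllamav2 from source
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install .
```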
Thanks @tigerbears for the tip, that worked for me.
I solved this issue by downloading the exllamav2 whl release appropriate for my system and installing it directly, i.e.:
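A hypothetical example of that approach; the wheel filename below is a placeholder and must match your exllamav2 release, Python version, CUDA version, and platform:

```bash
# Hypothetical filename -- pick the wheel matching your setup from the
# exllamav2 releases page, then install it directly
pip install exllamav2-X.Y.Z+cu118-cp310-cp310-linux_x86_64.whl
```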
@tigerbears the following code solved the issue.
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn is no longer working for me (it doesn't seem to force a build; it appears to use a cached wheel). Any help?
I cloned the flash-attn repo and built it from source.
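A sketch of that approach, which also sidesteps the cached-wheel problem mentioned above; the flags follow the flash-attn README conventions but should be verified:

```bash
# Build flash-attn from a fresh clone instead of a cached wheel
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
pip install . --no-build-isolation --no-cache-dir
```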
Thank you very much. I was getting the error flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c106SymIntltEl, but your method works.
Describe the bug
Fresh WSL2 install with Ubuntu 22.04, followed the instructions to install it in a conda environment.
CUDA Toolkit is installed as well.
Upon attempting to start server.py from the textgen environment, I get the error appended in the Logs section.
I can work around the issue by commenting out line 7 of the llama_attn_hijack.py file, and I can then run exllama models without an issue.
AutoGPTQ can't be used though, as it also seems to rely on flash_attn_2.
The issue seems related to this one: Dao-AILab/flash-attention#451
But it is unclear to me how to resolve it in a WSL environment.
Advice?
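A minimal sketch for reproducing the error directly, assuming the textgen conda environment is active; this is a diagnostic suggestion, not a step from the report:

```bash
# Check which torch build is installed
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# Importing flash_attn reproduces the undefined-symbol error when the
# installed wheel was built against a different torch ABI
python -c "import flash_attn"
```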
Is there an existing issue for this?
Reproduction
Fresh install in a Windows environment with WSL2 Ubuntu 22.04, followed by the install instructions.
Screenshot
No response
Logs
System Info