Llama 4-bit install instructions no longer work (CUDA_HOME environment variable is not set) #416
I just installed using this method; setup.py didn't work for me.

That may work for Windows, but my issue is on Linux.

I'm getting this as well under WSL Ubuntu, after trying to set up 4-bit.
I can confirm the issue. The problem is that nvcc is missing from the conda environment, so CUDA_HOME cannot be found.
See the comment here for a possible workaround: qwopqwop200/GPTQ-for-LLaMa#59 (comment)
I have managed to install nvcc with the command below. It takes some 10 minutes to run and shows no progress bar or updates along the way. This allows me to run `python setup_cuda.py install` for the GPTQ-for-LLaMa installation, but the build then fails with another error.
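Presumably the command in question was the conda-forge cudatoolkit-dev package, i.e. the workaround from the linked GPTQ-for-LLaMa issue (an assumption on my part):

```
# installs a full CUDA toolkit, including nvcc, into the active conda env;
# slow: runs for roughly 10 minutes with no progress output
conda install -c conda-forge cudatoolkit-dev
```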
So the solution is simple: once you have run that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.
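Restarting WSL does not require a full reboot; shutting the WSL VM down from a Windows prompt is enough (standard WSL tooling, not specific to this thread):

```
# run from PowerShell or cmd.exe, then reopen your WSL terminal
wsl --shutdown
```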
Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:
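Presumably the steps were along these lines, pieced together from the rest of this thread (package name and paths are assumptions):

```
# inside the conda env used by the webui:
conda install -c conda-forge cudatoolkit-dev   # provides nvcc / CUDA_HOME
cd repositories/GPTQ-for-LLaMa
python setup_cuda.py install                   # rebuild the 4-bit CUDA kernel
# then restart WSL (`wsl --shutdown` from Windows) and relaunch the webui
```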
Last night I did a 7+ hour binge getting both 4-bit Llama and DeepSpeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs. It also didn't help that this was my first ever time on Linux (although not my first time in CLIs; I used to write win32 CLI programs).
Hopefully all this will become more streamlined in the future.
I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.

**Windows (no WSL) LLaMA install/setup (normal/8bit/4bit)**

**Normal & 8bit LLaMA Setup**
**4bit LLaMA Setup**

Run these commands:
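The command list was presumably along these lines; per the note below, the first command installs the cudatoolkit and the last one builds the kernel (exact packages and repo URL are assumptions):

```
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
```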
Note: The last command caused me a lot of problems until I found the first command, which installs the cudatoolkit. If it still fails, installing Build Tools for Visual Studio 2019 (it has to be 2019) from here, checking "Desktop development with C++" when installing, and adding the directory containing cl.exe to your PATH should fix it.
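One way to get cl.exe onto PATH without hand-editing environment variables is to initialize the MSVC environment first (path assumes a default Build Tools 2019 install):

```
:: run in the same cmd.exe window before `python setup_cuda.py install`
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
```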
**Downloading LLaMA Models**

**Running the LLaMA Models**

**Normal LLaMA Model**
**8bit LLaMA Model**

**4bit LLaMA Model**
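The launch commands for each mode would have been roughly the following; the flag names are from early-2023 builds of the webui and are an assumption here:

```
# normal
python server.py --model llama-7b
# 8-bit
python server.py --model llama-7b --load-in-8bit
# 4-bit (GPTQ)
python server.py --model llama-7b --gptq-bits 4
```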
I would recommend changing the pytorch install instructions to the conda command sketched after this comment. This will install pytorch and cuda-toolkit, which comes with nvcc, whilst overriding all of the 12.0 cuda packages that pytorch tries to install.
It's also worth noting that conda-forge is a community-operated organization, and that you can get the cuda-toolkit directly from NVIDIA instead. I haven't tried it yet, but it should also be possible to install just nvcc.
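A plausible version of those commands, assuming the CUDA 11.7 toolchain that PyTorch targeted at the time (channel labels and versions are assumptions):

```
# pytorch plus a matching cuda-toolkit (brings nvcc), pinning NVIDIA's
# 11.7 channel label so conda doesn't pull in the 12.0 CUDA packages:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit \
    -c pytorch -c "nvidia/label/cuda-11.7.0" -c nvidia

# cuda-toolkit directly from NVIDIA's own channel:
conda install cuda-toolkit -c "nvidia/label/cuda-11.7.0"

# nvcc alone (untried, per the comment above):
conda install cuda-nvcc -c "nvidia/label/cuda-11.7.0"
```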
When doing `python setup_cuda.py install` I get:

(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install

("det går inte att hitta filen" is just Swedish for "the file cannot be found".) I have set the environment path to the path where cl.exe is located and have followed all the steps to the letter. I'm going to try manually installing CUDA instead using jllllll's advice; if that fails, I'm probably done with trying to install the 4-bit functionality until an easier way is made. I've tried for several days now and it's just not worth the frustration.
Got it to work using this method.
@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.
You can build the wheel yourself for future use with:
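Presumably this was the standard setuptools wheel build against the GPTQ setup script (an assumption):

```
cd repositories/GPTQ-for-LLaMa
python setup_cuda.py bdist_wheel
# the wheel lands in dist/ and can be reused on other machines:
pip install dist/quant_cuda-*.whl
```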
Thanks, but I am hoping to use other people's .whls, as it takes me a while to gather and follow the build process.
Also, if anyone using WSL starts having issues with bitsandbytes not finding the CUDA libraries:
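The commonly cited WSL workaround (assuming that is what was meant here) is to put WSL's GPU driver directory on the loader path:

```
# WSL exposes the Windows driver's libcuda.so under /usr/lib/wsl/lib,
# which is not on the default library path inside the distro
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
```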
@jllllll do you have a .whl file? I'm stuck on certain issues which I'm unsure about. I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I followed the advice above. I did this too. Sorry about the weird paste in advance, I don't know what it's doing:

(textgen) ubuntu@DESKTOP-LMFT8S4:

Normal inference with just server.py won't run for me either:

    (textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui$ python server.py
    Traceback (most recent call last):
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
        import chardet
    ModuleNotFoundError: No module named 'chardet'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ubuntu/text-generation-webui/server.py", line 10, in <module>
        import gradio as gr
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/__init__.py", line 3, in <module>
        import gradio.components as components
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/components.py", line 34, in <module>
        from gradio import media_data, processing_utils, utils
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/processing_utils.py", line 19, in <module>
        import requests
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
        from .exceptions import RequestsDependencyWarning
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
        from .compat import JSONDecodeError as CompatJSONDecodeError
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
        import charset_normalizer as chardet
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
        from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
      File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
        from charset_normalizer.md import mess_ratio
      File "charset_normalizer/md.py", line 5, in <module>
    ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/constant.py)
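Incidentally, that traceback is not CUDA-related; it is the well-known broken charset_normalizer install, usually fixed by reinstalling the package:

```
pip install --upgrade --force-reinstall charset-normalizer
```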
Here is a freshly compiled wheel:
Make sure not to use the driver installation option. That isn't for WSL.

@jllllll I really appreciate that, thanks.
Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get the load_quant() error discussed below.
I am also getting the same error.

Thanks, this helped me a lot. I had been stuck with this problem for a day now.
Got the same thing. I added a '-1' argument to the load_quant() call for the group size (i.e. something like `load_quant(model, checkpoint, wbits, -1)`). I don't know what it does exactly, but then you get a further error.

Looks like we're running the wrong version of GPTQ for the data we have.
To solve the load_quant error, which is indeed a problem with a new version of GPTQ, you need to roll back. See: #445 (comment)

Also, in my case I had to change the name of the tokenizer in tokenizer_config.json to "tokenizer_class": "LlamaTokenizer". That is, I think, an update in the transformers repo.
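Rolling back amounts to pinning GPTQ-for-LLaMa to an older commit and rebuilding; a sketch (the placeholder hash stands in for the commit given in the linked #445 comment):

```
cd repositories/GPTQ-for-LLaMa
git checkout <commit-from-445>   # placeholder; use the hash from the linked comment
python setup_cuda.py install     # rebuild the kernel against the rolled-back code
```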
Thank you, the problem was a new version of GPTQ, as you said. I rolled back as in #445 (comment). After that I got one more error.

The whole process of installation I did was:

After that I changed "LLaMATokenizer" to "LlamaTokenizer" in the tokenizer_config.json file.
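The tokenizer rename is a one-line edit to the model folder's tokenizer_config.json, for example (the model path is an assumption based on the default webui layout):

```
# replace the old class name in place; adjust the model folder to yours
sed -i 's/"LLaMATokenizer"/"LlamaTokenizer"/' models/llama-7b/tokenizer_config.json
```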
Thanks @NenadZG. I've updated my instructions with your GPTQ rollback fix.
FYI, I've also managed to get it to work with the new version of GPTQ, but I had to re-quantize the weights.
Good to know that's possible. I'll update my instructions once all versions of the model have been requantized.
The webui is currently not updated to work with the latest version of GPTQ-for-LLaMa.

Great!
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
Link to issue in GPTQ-for-LLaMa repo: qwopqwop200/GPTQ-for-LLaMa#59 (comment)
When running `python setup_cuda.py install` in GPTQ-for-LLaMa, I'm now getting this error (see Logs below).

Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs
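Judging by the issue title, the log ended in the standard PyTorch extension-builder check, along these lines (a reconstruction, not the original paste):

```
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
```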
System Info