GPU is needed for quantization in M2 MacOS #23970
Comments
No, the bitsandbytes library only works on CUDA GPUs.
To get around this error I set load_in_8bit=False:
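A minimal sketch of what that workaround can look like (not from the original comment; the model id is a placeholder taken from the report further down, and load_in_8bit as a from_pretrained argument is assumed here):

import torch
from transformers import AutoModelForCausalLM

# Skipping 8-bit loading avoids the bitsandbytes/CUDA check, at the cost of
# loading full-size weights instead of quantized ones.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",   # placeholder model id
    load_in_8bit=False,          # the default; set explicitly to skip bitsandbytes
    torch_dtype=torch.float16,   # optional: halves memory without quantization
)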
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It doesn't work with https://huggingface.co/TheBloke/Llama-2-70B-GPTQ unfortunately, so it would be great to have real support for quantized models with MPS.
Are there any options for quantized training with MPS/Apple Silicon? GPTQ doesn't seem to work either, unless I'm missing something.
For M2 users hitting the issue of the GPU not being detected: these steps are working for me with PEFT and bitsandbytes.
Are you sure it is working with bitsandbytes? There is a check for CUDA in the code; I don't see how it can work on MPS.
@pechaut78 No, it does not work with bitsandbytes :( I get that warning all the time, but I have been able to quantize on my MacBook Air (M2) with torch nightly and accelerate! Other than quantizing, I can train models with torch on the GPU by setting the device to 'mps'. It seems like everything works except for bitsandbytes.
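Not from the original comment, but a minimal sketch of the kind of MPS training setup described here (the model and batch are toy placeholders):

import torch
from torch import nn, optim

# Use the Apple Silicon GPU if available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = nn.Linear(16, 2).to(device)            # placeholder model
optimizer = optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 16, device=device)          # placeholder batch
y = torch.randint(0, 2, (8,), device=device)   # placeholder labels

# One training step, entirely on the MPS device.
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()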
That is fantastic, but how do you quantize with accelerate? It uses bitsandbytes. Could you share some code, please? Then you save the model and reload it, as simple as that? I can train too, if I leave the optimizer as the default one and remove bitsandbytes.
Hello @ALL, has anyone managed to solve this issue? I'm using the AutoTrain Google Colab notebook locally on an M1 Max and getting the GPU error:
❌ ERROR | 2024-01-24 20:28:17 | autotrain.trainers.common:wrapper:91 - No GPU found. A GPU is needed for quantization.
If anyone has found a solution, please share. Regards,
M3: I can run a similar example by making these changes (I don't see my GPU usage engaged, though):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    # load_in_4bit=True,
    load_in_4bit=False,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

print(torch.backends.mps.is_available())

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, trust_remote_code=True, device_map='mps'
)
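One quick way to check whether the model actually landed on the GPU (not part of the original comment, just a suggestion):

# Prints the device of the model's parameters, e.g. mps:0 or cpu.
print(next(model.parameters()).device)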
The example above didn't seem to work for me. I get
@AsteroidHunter You should try:
@AsteroidHunter and @jakobgraetz This is not due to torch not being installed but rather due to bitsandbytes not supporting Apple Silicon.
@panoskyriakis His stacktrace is
@jakobgraetz Yep, exactly. It's not going to work even if you change it; bitsandbytes checks for CUDA in several places.
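Roughly the kind of guard involved (illustrative only, not the actual library source), which is why MPS-only machines hit the error seen above:

import torch

# Quantized loading paths are gated on CUDA availability; an Apple Silicon
# machine has MPS but no CUDA device, so a check like this fails.
if not torch.cuda.is_available():
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")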
I gave up on bitsandbytes after that. I hope it becomes Mac GPU compatible at some point, because quantization would really speed things up! I hope the devs are working on this...
I guess we have no choice but to wait; it looks like there is already an issue for it: bitsandbytes-foundation/bitsandbytes#252
Have you tried other methods that worked?
System Info
M2 macOS machine with a 12-core CPU, 38-core GPU, 96GB of unified memory, and 2TB of storage
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I am getting the error "No GPU found. A GPU is needed for quantization." for the following code snippet, which I am trying to run on an M2 macOS machine with a 12-core CPU and 38-core GPU. How will QLoRA/quantization work on M2 macOS systems that use "mps"?
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "EleutherAI/gpt-neox-20b"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config) #, device_map={"":0})
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device=torch.device('cpu'))
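For comparison (not part of the original report): a minimal sketch of loading the same model on MPS without bitsandbytes quantization, using half precision instead, assuming the machine has enough unified memory for the full fp16 weights:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-neox-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# No quantization_config: load fp16 weights and move them to the Apple GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.to("mps")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))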
Expected behavior
I expect the code to run on the "mps" device of an M2 macOS machine.