
GPU is needed for quantization in M2 MacOS #23970

Closed

phdykd opened this issue Jun 3, 2023 · 19 comments

@phdykd commented Jun 3, 2023

System Info

macOS on an Apple M2 with 12 CPU cores, a 38-core GPU, 96 GB of unified memory, and 2 TB of storage

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am getting the error "No GPU found. A GPU is needed for quantization." for the following code snippet when running it on an M2 Mac with 12 CPU cores and a 38-core GPU. How will QLoRA/quantization work on M2 macOS systems that have the "mps" backend?

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "EleutherAI/gpt-neox-20b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# First attempt: load with the 4-bit config (fails with "No GPU found")
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)  # , device_map={"": 0})

# Second attempt: force CPU placement, same error
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="cpu")

Expected behavior

I expect the code to run on the "mps" device of the M2 Mac.

@sgugger (Collaborator) commented Jun 5, 2023

No, the bitsandbytes library only works on CUDA GPUs.
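
For reference, a quick check that shows why the error fires on Apple Silicon (torch reports no CUDA device there, even when the mps backend is available):

import torch

# bitsandbytes looks for a CUDA device before quantizing;
# on Apple Silicon there is none.
print(torch.cuda.is_available())          # False on M1/M2/M3
print(torch.backends.mps.is_available())  # True with a recent PyTorch build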

@jacobweiss2305

To get around this error I set load_in_8bit=False:
AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=False)
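
A fuller sketch of that workaround, assuming a seq2seq checkpoint (google/flan-t5-base here is only a placeholder): skip bitsandbytes entirely, load the unquantized weights in half precision, and move them to mps.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-base"  # placeholder checkpoint; use your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
# No quantization config at all, so bitsandbytes is never invoked
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("mps")  # runs on the Apple GPU, but at full (unquantized) memory cost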

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@edmondja commented Oct 6, 2023

To get around this error I set load_in_8bit=False: AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=False)

It doesn't work with https://huggingface.co/TheBloke/Llama-2-70B-GPTQ unfortunately, so it would be great to have real support for quantized models on mps.

@Datamance

No, the bitsandbytes library only works on CUDA GPUs.

Are there any options for quantized training with MPS/Apple Silicon? GPTQ doesn't seem to work either, unless I'm missing something.

@parisapouya92

For M2 users who run into the issue of the GPU not being detected:
1. Install the pytorch-nightly version (it supports GPU acceleration for Apple Silicon GPUs).
2. Install transformers == 4.31.
3. Install accelerate and bitsandbytes (I installed from GitHub).
4. Check that torch recognizes your device (print(torch.backends.mps.is_available()) should return True).
5. Set the device type to 'mps' in AutoModelForCausalLM.from_pretrained(): AutoModelForCausalLM.from_pretrained(device_map='mps').

These steps are working for me with PEFT and bitsandbytes; see the sketch below.
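
A minimal sketch of steps 4 and 5 together (gpt2 is just a small placeholder checkpoint):

# Assumes pytorch-nightly, transformers==4.31, and accelerate are installed
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

assert torch.backends.mps.is_available()  # step 4: verify the mps backend

model_id = "gpt2"  # placeholder; substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="mps")  # step 5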

@pechaut78

For M2 users who run into the issue of the GPU not being detected: 1. Install the pytorch-nightly version (it supports GPU acceleration for Apple Silicon GPUs). 2. Install transformers == 4.31. 3. Install accelerate and bitsandbytes (I installed from GitHub). 4. Check that torch recognizes your device (print(torch.backends.mps.is_available()) should return True). 5. Set the device type to 'mps' in AutoModelForCausalLM.from_pretrained(): AutoModelForCausalLM.from_pretrained(device_map='mps').

These steps are working for me with PEFT and bitsandbytes.

Are you sure it is working with bitsandbytes? There is a check for CUDA in the code; I don't see how it can work on MPS.
I tried, and it does not.

@parisapouya92

@pechaut78 No, it does not work with bitsandbytes :( I get that warning all the time, but I have been able to quantize on my MacBook Air (M2) with torch nightly and accelerate! Other than quantizing, I can train models with torch on the GPU by setting the device to 'mps'. It seems like everything works except for bitsandbytes...

@pechaut78

@pechaut78 No, it does not work with bitsandbytes :( I get that warning all the time, but I have been able to quantize on my MacBook Air (M2) with torch nightly and accelerate! Other than quantizing, I can train models with torch on the GPU by setting the device to 'mps'. It seems like everything works except for bitsandbytes...

That IS fantastic, but how do you quantize with accelerate? It uses bitsandbytes. Could you share some code, please? Then you save the model and reload it, as simple as that?

I can train too, if I leave the optimizer as the default one and remove bitsandbytes.
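
For reference, a minimal training-step sketch along those lines, with the default torch AdamW in place of the bitsandbytes 8-bit optimizer (gpt2 and the one-sentence batch are placeholders):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = torch.device("mps")
model_id = "gpt2"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# Default torch optimizer instead of bitsandbytes' 8-bit AdamW
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("Hello world", return_tensors="pt").to(device)
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()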

@a-ml commented Jan 24, 2024

Hello all,

Has anyone managed to solve this issue? I'm using the AutoTrain Google Colab locally on an M1 Max and getting the GPU error.

❌ ERROR | 2024-01-24 20:28:17 | autotrain.trainers.common:wrapper:91 - No GPU found. A GPU is needed for quantization.

import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

tensor([1.], device='mps:0')

If anyone has managed to find a solution, please share.

Regards,

@msusol commented Feb 8, 2024

M3: I can run a similar example by making the following changes (though I don't see my GPU usage engaged):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    # load_in_4bit=True,
    load_in_4bit=False,  # 4-bit loading disabled, so no CUDA GPU is required
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

print(torch.backends.mps.is_available())

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, trust_remote_code=True, device_map='mps'
)
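
A quick way to check whether the weights actually landed on the Apple GPU, continuing the snippet above (a sketch; the prompt is just an example):

tokenizer = AutoTokenizer.from_pretrained(model_id)
print(next(model.parameters()).device)  # expect mps:0 if placement worked

inputs = tokenizer("The capital of France is", return_tensors="pt").to("mps")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))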

@AsteroidHunter

The example above didn't seem to work for me. I get NameError: name 'torch' is not defined.

@jakobgraetz

@AsteroidHunter You should try pip3 install torch torchvision torchaudio and then add import torch at the top of your code file.

@panoskyriakis

@AsteroidHunter and @jakobgraetz This is not due to torch not being installed, but rather due to bitsandbytes not supporting Apple Silicon.

@jakobgraetz

@AsteroidHunter and @jakobgraetz This is not due to torch not being installed, but rather due to bitsandbytes not supporting Apple Silicon.

@panoskyriakis His stack trace is NameError: name 'torch' is not defined. I checked the library source, and it actually only imports torch after an if clause that checks whether bitsandbytes is available. I guess you could just change that in the library and hope you won't run into any issues later (a bad idea, though; the better strategy would be to wait for an update).

@panoskyriakis

@jakobgraetz Yep, exactly. It's not going to work even if you change it; bitsandbytes checks for CUDA in several places.

@AsteroidHunter

I gave up on bitsandbytes after that. I hope it becomes Mac GPU compatible at some point, because quantization would really speed things up! I hope the devs are working on this...

@viantguest

I guess we have no choice but to wait; it looks like there is already an issue for this: bitsandbytes-foundation/bitsandbytes#252

@zycalice

I gave up on bitsandbytes after that. I hope it becomes Mac GPU compatible at some point, because quantization would really speed things up! I hope the devs are working on this...

Have you tried any other methods that worked?
