
GPU is needed for quantization in M2 MacOS #23970

Closed

phdykd opened this issue Jun 3, 2023 · 19 comments

@phdykd commented Jun 3, 2023

System Info

macOS on an Apple M2 with 12 CPU cores, a 38-core GPU, 96 GB of unified memory, and 2 TB of storage

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am getting the error "No GPU found. A GPU is needed for quantization." for the following code snippet when running it on an M2 Mac with 12 CPU cores and a 38-core GPU. How will QLoRA/quantization work on M2 macOS systems that have the "mps" backend?

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "EleutherAI/gpt-neox-20b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# First attempt: load with the 4-bit config (fails with "No GPU found")
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)  # , device_map={"": 0})

# Second attempt: force CPU placement, same error
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="cpu")

Expected behavior

I expect the code to run on the "mps" device of the M2 Mac.

@sgugger (Collaborator) commented Jun 5, 2023

No, the bitsandbytes library only works on CUDA GPUs.
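
For reference, a quick check that shows why the error fires on Apple Silicon (torch reports no CUDA device there, even when the mps backend is available):

import torch

# bitsandbytes looks for a CUDA device before quantizing;
# on Apple Silicon there is none.
print(torch.cuda.is_available())          # False on M1/M2/M3
print(torch.backends.mps.is_available())  # True with a recent PyTorch build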

@jacobweiss2305

To get around this error I set load_in_8bit=False:
AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=False)
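
A fuller sketch of that workaround, assuming a seq2seq checkpoint (google/flan-t5-base here is only a placeholder): skip bitsandbytes entirely, load the unquantized weights in half precision, and move them to mps.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-base"  # placeholder checkpoint; use your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
# No quantization config at all, so bitsandbytes is never invoked
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("mps")  # runs on the Apple GPU, but at full (unquantized) memory cost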

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@edmondja commented Oct 6, 2023

To get around this error I set load_in_8bit=False: AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=False)

It doesn't work with https://huggingface.co/TheBloke/Llama-2-70B-GPTQ unfortunately, so it would be great to have real support for quantized models on mps.

@Datamance

No, the bitsandbytes library only works on CUDA GPUs.

Are there any options for quantized training with MPS/Apple Silicon? GPTQ doesn't seem to work either, unless I'm missing something.

@parisapouya92

For M2 users who run into the issue of the GPU not being detected:
1. Install the pytorch-nightly version (it supports GPU acceleration for Apple Silicon GPUs).
2. Install transformers == 4.31.
3. Install accelerate and bitsandbytes (I installed from GitHub).
4. Check that torch recognizes your device (print(torch.backends.mps.is_available()) should return True).
5. Set the device type to 'mps' in AutoModelForCausalLM.from_pretrained(): AutoModelForCausalLM.from_pretrained(device_map='mps').

These steps are working for me with PEFT and bitsandbytes; see the sketch below.
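
A minimal sketch of steps 4 and 5 together (gpt2 is just a small placeholder checkpoint):

# Assumes pytorch-nightly, transformers==4.31, and accelerate are installed
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

assert torch.backends.mps.is_available()  # step 4: verify the mps backend

model_id = "gpt2"  # placeholder; substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="mps")  # step 5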

@pechaut78

For M2 users who run into the issue of the GPU not being detected: 1. Install the pytorch-nightly version (it supports GPU acceleration for Apple Silicon GPUs). 2. Install transformers == 4.31. 3. Install accelerate and bitsandbytes (I installed from GitHub). 4. Check that torch recognizes your device (print(torch.backends.mps.is_available()) should return True). 5. Set the device type to 'mps' in AutoModelForCausalLM.from_pretrained(): AutoModelForCausalLM.from_pretrained(device_map='mps').

These steps are working for me with PEFT and bitsandbytes.

Are you sure it is working with bitsandbytes? There is a check for CUDA in the code; I don't see how it can work on MPS.
I tried, and it does not.

@parisapouya92

@pechaut78 No, it does not work with bitsandbytes :( I get that warning all the time, but I have been able to quantize on my MacBook Air (M2) with torch nightly and accelerate! Other than quantizing, I can train models with torch on the GPU by setting the device to 'mps'. It seems like everything works except for bitsandbytes...

@pechaut78

@pechaut78 No, it does not work with bitsandbytes :( I get that warning all the time, but I have been able to quantize on my MacBook Air (M2) with torch nightly and accelerate! Other than quantizing, I can train models with torch on the GPU by setting the device to 'mps'. It seems like everything works except for bitsandbytes...

That IS fantastic, but how do you quantize with accelerate? It uses bitsandbytes. Could you share some code, please? Then you save the model and reload it, as simple as that?

I can train too, if I leave the optimizer as the default one and remove bitsandbytes.
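
For reference, a minimal training-step sketch along those lines, with the default torch AdamW in place of the bitsandbytes 8-bit optimizer (gpt2 and the one-sentence batch are placeholders):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = torch.device("mps")
model_id = "gpt2"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# Default torch optimizer instead of bitsandbytes' 8-bit AdamW
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("Hello world", return_tensors="pt").to(device)
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()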

@a-ml commented Jan 24, 2024

Hello all,

Has anyone managed to solve this issue? I'm using the AutoTrain Google Colab locally on an M1 Max and getting the GPU error.

❌ ERROR | 2024-01-24 20:28:17 | autotrain.trainers.common:wrapper:91 - No GPU found. A GPU is needed for quantization.

import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

tensor([1.], device='mps:0')

If anyone has managed to find a solution, please share.

Regards,

@msusol commented Feb 8, 2024

M3: I can run a similar example by making the following changes (though I don't see my GPU usage engaged):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    # load_in_4bit=True,
    load_in_4bit=False,  # 4-bit loading disabled, so no CUDA GPU is required
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

print(torch.backends.mps.is_available())

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, trust_remote_code=True, device_map='mps'
)
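
A quick way to check whether the weights actually landed on the Apple GPU, continuing the snippet above (a sketch; the prompt is just an example):

tokenizer = AutoTokenizer.from_pretrained(model_id)
print(next(model.parameters()).device)  # expect mps:0 if placement worked

inputs = tokenizer("The capital of France is", return_tensors="pt").to("mps")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))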

@AsteroidHunter

The example above didn't seem to work for me. I get NameError: name 'torch' is not defined.

@jakobgraetz

@AsteroidHunter You should try pip3 install torch torchvision torchaudio and then add import torch at the top of your code file.

@panoskyriakis

@AsteroidHunter and @jakobgraetz This is not due to torch not being installed, but rather due to bitsandbytes not supporting Apple Silicon.

@jakobgraetz

@AsteroidHunter and @jakobgraetz This is not due to torch not being installed, but rather due to bitsandbytes not supporting Apple Silicon.

@panoskyriakis His stack trace is NameError: name 'torch' is not defined. I checked the library source, and it actually only imports torch after an if clause that checks whether bitsandbytes is available. I guess you could just change that in the library and hope you won't run into any issues later (a bad idea, though; the better strategy would be to wait for an update).

@panoskyriakis

@jakobgraetz Yep, exactly. It's not going to work even if you change it; bitsandbytes checks for CUDA in several places.

@AsteroidHunter

I gave up on bitsandbytes after that. I hope it becomes Mac GPU compatible at some point, because quantization would really speed things up! I hope the devs are working on this...

@viantguest

I guess we have no choice but to wait; it looks like there is already an issue for this: bitsandbytes-foundation/bitsandbytes#252

@zycalice

I gave up on bitsandbytes after that. I hope it becomes Mac GPU compatible at some point, because quantization would really speed things up! I hope the devs are working on this...

Have you tried any other methods that worked?
