Running on H100? #9
Comments
@joris-sense I ran on an A100 instance, not an H100 :( Can't you do only LoRA or full FT, since you have access to an H100?
I am sitting on it right now, and the training loop seems to work when I replace bitsandbytes with quanto =) So I use a quanto quantization config instead of the bitsandbytes one. Does this have disadvantages compared to bitsandbytes, or is there something else I should use?
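For reference, a minimal sketch of what swapping bitsandbytes for quanto can look like when loading the model; the `QuantoConfig` API in transformers is real, but the model id, dtype and 4-bit setting below are assumptions rather than values taken from the notebook:

```python
# Sketch: replacing a bitsandbytes quantization config with a quanto one.
# Assumes a recent transformers (>= 4.39) with the quanto backend installed;
# the model id and precision choices are illustrative placeholders.
import torch
from transformers import Idefics2ForConditionalGeneration, QuantoConfig

quantization_config = QuantoConfig(weights="int4")  # 4-bit weight quantization via quanto

model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```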
@joris-sense I think they're the same thing, if anything Quanto is more up to date
My understanding was that the main advantage of LoRA/QLoRA is the reduced memory requirement rather than improved speed? In any case, trying it out, the H100 has similar speed for all three methods. Thinking about it, why does the Jupyter notebook take 50 GB of VRAM even when training a QLoRA model with 8B parameters? Shouldn't it be a lot less for 4-bit weights, on the order of 4 GB?
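A quick back-of-envelope calculation of weight memory at different precisions (weights only, ignoring activations, gradients, optimizer state and the KV cache), just to make the ~4 GB intuition explicit:

```python
# Rough weight-memory estimate for an 8B-parameter model at various precisions.
params = 8e9
for name, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name:>5}: ~{params * bytes_per_param / 1024**3:.1f} GB")
# fp32: ~29.8 GB, bf16: ~14.9 GB, int8: ~7.5 GB, 4-bit: ~3.7 GB,
# so ~50 GB of VRAM suggests the base weights were not actually stored quantized.
```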
@joris-sense I forgot to mention there, but my training setup only froze the image encoder and did not do LoRA training. I have now uploaded new versions of the notebook and script that are much more QLoRA-focused, and they take around 17 GB of VRAM with 0.002% of params being trained. Can you try?
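For context, a QLoRA-focused setup typically looks roughly like the sketch below: the quantized base model is frozen and only small LoRA adapters are trained. The `peft` calls are real, but the rank, targets and task type here are placeholders, not the values from the updated notebook:

```python
# Sketch of a QLoRA-style setup with peft on top of an already 4-bit-quantized model.
# r, lora_alpha, target_modules and task_type are illustrative placeholders.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # enables grad checkpointing, upcasts norm layers

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of parameters should be trainable
```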
That still seems like a lot to me; it looks like the model's weights are stored unquantized, since this is more than 2 * 8 GB of VRAM (as I understand it, with QLoRA the base model weights should be stored quantized as well). I didn't get your new script to work on my machine with QLoRA, so I'm sticking to full fine-tuning for now. I seem to get different errors on each run, but one of them was that FlashAttention complains during inference (which I copied from your last version -- I am also missing an inference part in the new one) that the model's weights are stored in float32 (which I verified by inspecting the weight dtypes), and it throws the error "FlashAttention only support fp16 and bf16 data type". If I convert the weights, the model does run inference, but the output seems unrelated to what it was trained on. I also didn't get below 50 GB of VRAM and still get occasional out-of-memory errors, and with your new global variables the script didn't seem to find a GPU on a single-H100 setup (I know this is probably too vague to act upon; maybe I will figure out more and write a more reproducible report over the weekend). Also note that your notebook's default settings are still
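On the FlashAttention error specifically: "FlashAttention only support fp16 and bf16 data type" usually means the non-quantized weights ended up in float32, so loading (or casting) the model in bfloat16 before inference is one way around it. The values below are an assumption about the setup, not copied from the notebook:

```python
# Sketch: load the model in bf16 so FlashAttention's fp16/bf16 requirement is met.
# Model id and attn_implementation are assumptions about the setup.
import torch
from transformers import Idefics2ForConditionalGeneration

model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.bfloat16,               # FlashAttention requires fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")
model.eval()
```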
Hey, when trying to run Idefics_FT.ipynb on an H100 machine, I seem to be getting the problem described here. Is there a way around this, maybe using something other than bitsandbytes?