Running on H100? #9

Open
joris-sense opened this issue Sep 5, 2024 · 6 comments
Comments

@joris-sense

Hey, when trying to run Idefics_FT.ipynb on an H100 machine, I seem to be getting the problem described here. Is there a way around this, perhaps by using something other than bitsandbytes?

@merveenoyan
Owner

@joris-sense I ran it on an A100 instance, not an H100 :( Can't you do LoRA only or full FT, since you have access to an H100?

@joris-sense
Author

joris-sense commented Sep 6, 2024

I am sitting on it right now, and the training loop seems to work when I replace bitsandbytes with quanto =)

So I use

from transformers import QuantoConfig

if USE_QLORA:
    quanto_config = QuantoConfig(weights="int4")
    model = Idefics3ForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        quantization_config=quanto_config if USE_QLORA else None,  # bnb_config
        _attn_implementation="flash_attention_2",
    )

Does this have disadvantages compared to bitsandbytes, or is there something else I should use?

@merveenoyan
Owner

@joris-sense I think they're the same thing; if anything, Quanto is more up to date.
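
For reference, the bitsandbytes path the notebook originally took would look roughly like this (a sketch only, assuming the notebook's model_id and USE_QLORA variables, and picking typical QLoRA-style NF4 settings that may differ from the exact flags in Merve's version):

import torch
from transformers import BitsAndBytesConfig, Idefics3ForConditionalGeneration

# 4-bit NF4 quantization with bf16 compute, roughly analogous to QuantoConfig(weights="int4")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = Idefics3ForConditionalGeneration.from_pretrained(
    model_id,  # assumed to be defined earlier in the notebook
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config if USE_QLORA else None,
    _attn_implementation="flash_attention_2",
)

One difference worth noting: bitsandbytes uses the NF4 data type from the QLoRA paper, while Quanto's int4 is a plain integer scheme; both cut weight memory by roughly the same factor.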

@joris-sense
Author

joris-sense commented Sep 6, 2024

> @joris-sense I ran it on an A100 instance, not an H100 :( Can't you do LoRA only or full FT, since you have access to an H100?

My understanding was that the main advantage of LoRA/QLoRA is the reduced memory requirement rather than improved speed? In any case, trying it out, the H100 runs at similar speed for all three methods.

Thinking about it, why does the Jupyter notebook take 50 GB of VRAM even when training a QLoRA model with 8B parameters? Shouldn't the weights take a lot less at 4 bits, on the order of 4 GB?
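
As a rough back-of-envelope check (a sketch with assumed round numbers, not measurements from the notebook), the 4-bit weights alone should indeed only be a few GB; the rest of the footprint has to come from whatever stays in higher precision, plus activations and optimizer state:

# Rough memory estimate for QLoRA on an ~8B-parameter model (assumed round numbers).
params = 8e9

weights_4bit_gb = params * 0.5 / 1e9   # 4 bits = 0.5 bytes per weight -> ~4 GB
weights_bf16_gb = params * 2.0 / 1e9   # if the base weights stay in bf16 -> ~16 GB

# Assume ~0.002% of params are trainable (the fraction mentioned in the next comment).
lora_params = 0.00002 * params
lora_state_gb = lora_params * (2 + 4 + 4 + 4) / 1e9  # bf16 weight + fp32 grad + Adam m/v

print(f"4-bit base weights:  ~{weights_4bit_gb:.1f} GB")
print(f"bf16 base weights:   ~{weights_bf16_gb:.1f} GB")
print(f"LoRA training state: ~{lora_state_gb:.3f} GB")

# Activations scale with batch size, sequence length and image resolution and can
# easily dominate; gradient checkpointing trades them for extra compute.

Anything far beyond those numbers usually points to the base weights not actually being stored quantized, or to activations from large image inputs.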

@merveenoyan
Owner

@joris-sense I forgot to mention it there, but my training setup was only freezing the image encoder, not doing LoRA training. I have now uploaded new versions of the notebook and script that are much more QLoRA-focused, and it takes around 17 GB of VRAM with 0.002% of the params being trained. Can you try?
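
A quick way to verify that trainable fraction on any setup (a minimal sketch; with a PEFT-wrapped model you can also just call model.print_trainable_parameters()):

# Count trainable vs. total parameters after freezing / applying LoRA.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.4f}%)")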

@joris-sense
Author

That still seems like a lot to me; it looks like the model's weights are stored unquantized, since this is more than 2 * 8 GB of VRAM (as I understand it, with QLoRA the model weights should be stored quantized as well).

I didn't get your new script to work on my machine with QLoRA, so I'm sticking to full fine-tuning for now. I seem to get different errors on each run, but one of them was that FlashAttention complains during inference (which I copied from your last version -- I am also missing an inference part in the new one) that the model's weights are stored in float32, as seen by

dtype = next(model.parameters()).dtype
print(dtype)

and it throws the error "FlashAttention only support fp16 and bf16 data type". If I convert the weights, the model does infer, but the output seems unrelated to what it was trained on. I also didn't get below 50 GB of VRAM and sometimes still get out-of-memory errors, and with your new global variables the script didn't seem to find a GPU on a one-H100 setup (I know this is probably too vague to act upon; maybe I will figure out more and make a more reproducible report over the weekend). Also note that your notebook's default settings are still USE_LORA=False and USE_QLORA=False, and that it still references model before it is defined in cell 8.
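
For what it's worth, the cast that got inference past the FlashAttention dtype check in this situation was simply the following (a sketch of the workaround described above, not necessarily the intended fix for the notebook):

import torch

# FlashAttention-2 only supports fp16/bf16, so cast a model that was loaded in fp32.
if next(model.parameters()).dtype == torch.float32:
    model = model.to(torch.bfloat16)

print(next(model.parameters()).dtype)  # should now be torch.bfloat16

Note that this only applies to an unquantized (full fine-tuning) model; a 4-bit quantized model cannot simply be cast with .to().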
