Bambi#1600
If it helps, I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga's textgen webUI:
Others and I have been able to train LoRAs in int8 precision against the original unquantized HF llama-7b and llama-13b models (a rough setup sketch is included at the end of this comment).
The LoRA from this training run produced the expected results at inference when applied to the unquantized llama models.
VRAM usage during training appeared to be split evenly between the cards.
GPU utilization, however, alternated between the cards (one card pulled ~150 W while the other pulled ~300 W, then they would swap), indicating a serialized but threaded workload rather than true parallelization.
We encountered a bug upon saving the first checkpoint that caused both cards to OOM. Following numerous forum threads, we reverted our bitsandbytes install from 0.38.1 to 0.37.2, which resolved the issue.
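For reference, here is a minimal sketch of the kind of int8 LoRA setup the webUI trainer wraps, assuming `transformers`, `peft`, and `bitsandbytes` are installed. The model path, LoRA rank/alpha, and target modules below are placeholder values of my own, not the webUI's actual defaults:

```python
# Minimal int8 LoRA setup sketch (not the webUI trainer's exact code).
# We pinned bitsandbytes==0.37.2; 0.38.1 OOM'd on checkpoint save for us.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base = "path/to/llama-7b-hf"  # placeholder: your local unquantized HF llama checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # spreads layers across both cards, matching the even VRAM split we saw
)

# On newer peft versions this helper is prepare_model_for_kbit_training instead.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # typical llama attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

As I understand it, `device_map="auto"` here gives naive model parallelism (each card holds a slice of the layers and only one slice computes at a time), which would explain the alternating power draw we observed rather than true parallel utilization. The bitsandbytes downgrade was just `pip install bitsandbytes==0.37.2`.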