Bambi#1600
If it helps, I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga's textgen webUI:
Others and I have been able to train LoRAs in int8 precision against the original unquantized HF llama-7b and llama-13b models (a rough setup sketch is included at the end of this comment).
The LoRA from this training run produced the expected results at inference when applied to the unquantized llama models.
VRAM usage during training appeared to be split evenly between the cards.
GPU utilization, however, alternated between the cards (one card pulled ~150 W while the other pulled ~300 W, then they would swap), indicating a serialized but threaded workload rather than true parallelization.
We encountered a bug upon saving the first checkpoint that caused both cards to OOM. Following numerous forum threads, we reverted our bitsandbytes install from 0.38.1 to 0.37.2, which resolved the issue.
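For reference, here is a minimal sketch of the kind of int8 LoRA setup the webUI trainer wraps, assuming `transformers`, `peft`, and `bitsandbytes` are installed. The model path, LoRA rank/alpha, and target modules below are placeholder values of my own, not the webUI's actual defaults:

```python
# Minimal int8 LoRA setup sketch (not the webUI trainer's exact code).
# We pinned bitsandbytes==0.37.2; 0.38.1 OOM'd on checkpoint save for us.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base = "path/to/llama-7b-hf"  # placeholder: your local unquantized HF llama checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # spreads layers across both cards, matching the even VRAM split we saw
)

# On newer peft versions this helper is prepare_model_for_kbit_training instead.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # typical llama attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

As I understand it, `device_map="auto"` here gives naive model parallelism (each card holds a slice of the layers and only one slice computes at a time), which would explain the alternating power draw we observed rather than true parallel utilization. The bitsandbytes downgrade was just `pip install bitsandbytes==0.37.2`.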