
issues to fix reported from discord #11

Closed
winglian opened this issue May 5, 2023 · 1 comment

Comments

@winglian
Collaborator

winglian commented May 5, 2023

From Bambi#1600 on Discord:
I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga’s textgen webUI if it helps:

  1. Myself and others are able to train LoRAs with int8 precision on the original unquantized HF llama-7b and llama-13b models (a sketch of this setup follows the list)
  2. The LoRA from this training produced the expected results at inference when applied to the unquantized llama models
  3. VRAM usage during training appeared to be evenly split between the cards
  4. GPU utilization, however, was observed to alternate between the cards (one card pulled 150 watts while the other pulled 300 watts, then they'd swap), indicating a serialized but threaded workload rather than true parallelization
  5. We encountered a bug upon saving the first checkpoint that caused both cards to OOM. Following numerous forum threads, we reverted our bitsandbytes install from 0.38.1 to 0.37.2, which resolved the issue.
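
For context, here is a minimal sketch of the kind of int8 LoRA setup described in item 1, assuming the transformers/peft/bitsandbytes APIs current around the time of this report; the model name and LoRA hyperparameters are illustrative, not Bambi#1600's exact webUI trainer configuration:

```python
# Minimal int8 LoRA setup sketch (illustrative; not the exact webUI trainer config).
# Assumes transformers with bitsandbytes int8 support, accelerate, and peft installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "huggyllama/llama-7b"  # hypothetical stand-in for the unquantized HF llama-7b

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 quantization via bitsandbytes
    device_map="auto",   # shards layers across both cards (naive model parallelism)
    torch_dtype=torch.float16,
)

# Casts norm layers / output head to fp32 for stability and prepares the
# quantized model so LoRA adapter weights can receive gradients.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # common llama attention targets
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note that device_map="auto" places different layers on different GPUs and runs them sequentially, which is consistent with the alternating power draw in item 4: it is model parallelism for memory, not data parallelism for throughput. And per item 5, pinning the package version (pip install bitsandbytes==0.37.2) worked around the OOM on checkpoint save.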
@winglian
Collaborator Author

I think these have been resolved by now.
