New OOM bug introduced in bitsandbytes 0.38.x? #324
Hi, thanks for reporting this! I have investigated this and proposed a fix in #330. Overall, it looks like the issue is caused by weight layout conversions.
Similar problem when fine-tuning. Environment: NVIDIA A100 80G. Fine-tuning llama-30b with LoRA failed using FastChat. After loading the model, it uses about ~34G, and then rises to ~53G.
Does anyone know if this issue is fixed in 0.39?

@cfhammill This is not fixed in 0.39.0. Note this OOM bug only happens with 8-bit; 4-bit is fine. I am trying to figure out what is causing this.

Has anyone found a workaround for this? It's a really frustrating bug, making it basically impossible to save the state_dict of large 8-bit models. Downgrading does not seem to be a good solution. cc @TimDettmers
Hi everybody, and sorry for the delay with this issue! I took a closer look at the underlying problem, and the issue seems to be here: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L335-L338. To the best of my understanding, the problem is that the current logic duplicates the memory usage of each weight.

I made an attempt to fix this problem in #503: in my setup, the OOM issue seems to disappear, but the solution involves a memory overhead when you load the checkpoint (which should be no more than ~100-200 MB of GPU RAM for most setups).

@KukumavMozolo @better629 @cfhammill @Qubitium @psinger if you have the time, I'd be very happy if you could try the fix from the PR above and see if it resolves the issue in your case. Also, if the GPU memory overhead at checkpoint loading time is not acceptable, we can try to think of another solution.
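To make the suspected failure mode concrete, here is a toy sketch (this is not bitsandbytes code; the names, sizes, and accounting are made up for illustration). If saving a state dict materializes a layout-converted copy of every weight while keeping all originals alive, peak memory roughly doubles; releasing each original as soon as its copy exists caps the overhead at one extra weight:

```python
def peak_bytes(sizes, frugal):
    """Simulated peak memory while converting each weight to a
    save-friendly layout. Every conversion materializes a copy;
    `frugal=True` frees each original as soon as its copy exists."""
    originals = dict(sizes)  # name -> bytes still resident on device
    copies = 0               # bytes of converted copies held so far
    peak = 0
    for name, nbytes in sizes.items():
        copies += nbytes                              # allocate converted copy
        peak = max(peak, copies + sum(originals.values()))
        if frugal:
            del originals[name]                       # release original weight
    return peak

# eight hypothetical 1 MB quantized weights
sizes = {f"layer{i}": 1_000_000 for i in range(8)}

print(peak_bytes(sizes, frugal=False))  # 16_000_000: both full sets alive
print(peak_bytes(sizes, frugal=True))   #  9_000_000: at most one extra copy
```

In this toy model the naive strategy peaks at twice the model size, while the frugal one peaks at model size plus a single weight, which matches the shape of the OOM reports above (memory roughly doubling at save time).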
Thank you, this has been addressed in 2d321a7. |
The downgrade to 0.37 was to address bitsandbytes-foundation/bitsandbytes#324, but this has been addressed in 2d321a7524cd5b which landed in 0.39.1.
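Going by the comments above, the practical resolution is either upgrading to a release that contains the fix, or pinning the last release reported unaffected in this thread (version bounds taken from the thread, not independently verified):

```shell
# upgrade to a release containing the fix (2d321a7 landed in 0.39.1)
pip install --upgrade "bitsandbytes>=0.39.1"

# or, as the earlier workaround in this thread, pin the last unaffected release
pip install "bitsandbytes==0.37.2"
```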
Hi there, apparently 0.38.1 or 0.38.0 introduced a bug that greatly increases memory consumption when trying to save a model.

For details see this post in the alpaca-lora GitHub.

The bug doesn't happen when using bitsandbytes==0.37.2.

Note: while the script below says otherwise, I am pretty sure I have CUDA 11.7 installed.
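When the reported CUDA version disagrees with what you believe is installed, a few commands can show which toolkit is actually visible (assuming `nvcc`, `conda`, and a CUDA build of PyTorch are present in your environment; adjust for your setup):

```shell
nvcc --version          # toolkit version the compiler sees
conda list | grep cuda  # what the conda environment ships (as the log below suggests)
python -c "import torch; print(torch.version.cuda)"  # CUDA version PyTorch was built against
```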
```
python -m bitsandbytes
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: CUDA runtime path found: ...lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 102
CUDA SETUP: Required library version not found: libbitsandbytes_cuda102.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
CUDA SETUP: If you compiled from source, try again with
make CUDA_VERSION=DETECTED_CUDA_VERSION
for example,make CUDA_VERSION=113
.CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via
conda list | grep cuda
.================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=102
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File ".../lib/python3.10/site-packages/bitsandbytes/__init__.py", line 7, in <module>
    from .autograd._functions import (
  File ".../lib/python3.10/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
    from ._functions import undo_layout, get_inverse_transform_indices
  File ".../lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
    import bitsandbytes.functional as F
  File ".../lib/python3.10/site-packages/bitsandbytes/functional.py", line 17, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File ".../lib/python3.10/site-packages/bitsandbytes/cextension.py", line 22, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
        If you cannot find any issues and suspect a bug, please open an issue with details about your environment:
        https://github.com/TimDettmers/bitsandbytes/issues
```