[BUG] Inference of a large model using NVMe offload: AssertionError: More elements 524709888 than buffer size 100,000,000 #3506
Comments
Any updates on this? I have the same issue.
Actually, I just noticed @Markshilong had tried changing it. @andre-bauer, can you please share the result of changing it?
Another thing to try is to disable param prefetching/caching and reduce the number of live parameters, e.g.:

```
"zero_optimization": {
    "stage": 3,
    "offload_param": {
        "device": "nvme",
        "nvme_path": "/home/mark/Research/nvme_offload_path",
        "buffer_count": 2,
        "buffer_size": 6e8,
        "max_in_cpu": 0
    },
    "reduce_bucket_size": model_hidden_size * model_hidden_size,
    "stage3_prefetch_bucket_size": 0,
    "stage3_max_live_parameters": 1e8,
    "stage3_max_reuse_distance": 0,
    "stage3_param_persistence_threshold": 10 * model_hidden_size
}
```
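The assertion in the title fires because a single parameter tensor (524,709,888 elements) is larger than the configured `buffer_size` (the default 1e8). As a rough, unofficial sanity check before launching, you can compare the largest parameter's element count against the planned `buffer_size`; the function name and inputs below are illustrative, not DeepSpeed API:

```python
def check_nvme_buffer_size(param_numels, buffer_size):
    """Check whether every parameter tensor fits in one NVMe staging buffer.

    param_numels: element counts of the model's parameter tensors.
    buffer_size:  the "buffer_size" value planned for "offload_param".
    Returns (fits, largest_numel).
    """
    largest = max(param_numels)
    return largest <= buffer_size, largest

# Element counts from this issue: the failing parameter has
# 524,709,888 elements, so the default 1e8 buffer is too small.
ok, largest = check_nvme_buffer_size([524_709_888, 16_777_216], int(1e8))
print(ok, largest)  # False 524709888
```

With a real model you would feed in `p.numel()` for each parameter and then raise `buffer_size` to at least the largest value reported.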
@tjruwase Thank you for your answer; changing it …
@andre-bauer, can you please share a stack trace of the OOM?
Any update on this issue? I got the same error:

```
[2024-01-22 16:50:44,316] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 26.4 GB, percent = 84.6%
```

I am fine-tuning GPT-3 6.7B with a single RTX 3090 (24 GB of memory). This is my config file:

```
{ "zero_optimization": {
"gradient_clipping": 1.0, "fp16": { "bf16": { "wall_clock_breakdown": false
```

This is my history:

```
[2024-01-22 16:50:41,962] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.7+870ae041, git-hash=870ae041, git-branch=master
```

Big appreciation for any help.
Describe the bug
I'm trying to run inference on a 54-billion-parameter model (facebook/nllb-moe-54b) using NVMe offload on my laptop with an RTX 3060 (6 GB of GPU memory), but I get an error:
AssertionError: More elements 524709888 than buffer size 100,000,000
Full error message is:
The ds_config is:
Then I tried changing "buffer_size" in "offload_param" of the ds_config from 1e8 to 6e8, but then I got 'CUDA out of memory' like this:
It might work on a GPU with more memory, but how can I run NVMe offload on this 6 GB GPU?
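The CUDA OOM after raising `buffer_size` is consistent with a back-of-the-envelope estimate of the staging buffers themselves: with `buffer_count: 2`, `buffer_size: 6e8`, and fp16 parameters (2 bytes per element), the buffers alone need roughly 2.2 GiB, a large share of a 6 GB card. This is only an approximation; DeepSpeed's actual memory accounting differs in detail:

```python
def offload_buffer_bytes(buffer_count, buffer_size, bytes_per_elem=2):
    """Approximate bytes consumed by the NVMe offload staging buffers.

    bytes_per_elem defaults to 2 (fp16); use 4 for fp32 parameters.
    """
    return int(buffer_count * buffer_size * bytes_per_elem)

total = offload_buffer_bytes(2, 6e8)        # values from this issue's config
print(total, total / 2**30)                 # 2400000000 bytes, ~2.24 GiB
```

This is why simply raising `buffer_size` trades the assertion error for an OOM on small GPUs; the knobs have to be balanced against available device memory.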
To Reproduce
Steps to reproduce the behavior:
My script is modified from huggingface/transformers#16616
Here is my script. I run it with `deepspeed --num_gpus 1 nllb_ZeRO_inference.py`.
Screenshots

System info (please complete the following information):