Big batchsize cause OOM in bloom-ds-inference.py, how to adjust max_split_size_mb value #84

tohneecao · 2023-04-27T06:29:15Z

OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 6; 79.19 GiB total capacity; 66.51 GiB already allocated; 61.56 MiB free; 67.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

return forward_call(*input, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 4; 79.19 GiB total capacity; 66.51 GiB already allocated; 61.56 MiB free; 67.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
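The message only suggests max_split_size_mb when reserved memory is much larger than allocated memory, so it may help to check that gap first. The sketch below is only an illustration: it pulls the figures out of the message text quoted in this post with a regular expression, and the helper name `parse_gib` is made up for this example.

```python
import re

# Figures copied from the error message in this post
# (GPU 6 and GPU 4 report the same values).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 62.00 MiB (GPU 6; "
    "79.19 GiB total capacity; 66.51 GiB already allocated; "
    "61.56 MiB free; 67.77 GiB reserved in total by PyTorch)"
)


def parse_gib(message: str, label: str) -> float:
    """Return the GiB figure that directly precedes `label` in the message."""
    match = re.search(r"([\d.]+) GiB " + re.escape(label), message)
    if match is None:
        raise ValueError(f"no '{label}' figure found in message")
    return float(match.group(1))


total = parse_gib(MESSAGE, "total capacity")
allocated = parse_gib(MESSAGE, "already allocated")
reserved = parse_gib(MESSAGE, "reserved in total")

# Memory held by the caching allocator but not backing live tensors.
gap = reserved - allocated
print(f"reserved - allocated = {gap:.2f} GiB ({gap / total:.1%} of capacity)")
# prints: reserved - allocated = 1.26 GiB (1.6% of capacity)
```

Here the gap is about 1.26 GiB on a 79.19 GiB device, so reserved memory is not much larger than allocated memory. Read that way, fragmentation looks like a minor factor and the large batch itself probably needs more memory than is available, though that is an inference from these numbers rather than something confirmed in this thread.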

mayank31398 · 2023-05-10T01:36:23Z

I don't think max_split_size_mb will help with DeepSpeed inference.
As far as I know, it only applies to pure PyTorch native code.
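For the literal question of how the value is adjusted in plain PyTorch: max_split_size_mb is one of the options read from the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch follows, assuming the variable is set before PyTorch initializes CUDA (most safely before `import torch`). The helper name `set_max_split_size_mb` and the value 128 are illustrative only, and, as noted above, this may have no effect under DeepSpeed inference.

```python
import os


def set_max_split_size_mb(value_mb: int) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF for this process and return the value.

    Hypothetical helper for illustration; call it before importing torch
    so the CUDA caching allocator sees the setting when it starts up.
    """
    if value_mb <= 0:
        raise ValueError("max_split_size_mb must be a positive integer")
    conf = f"max_split_size_mb:{value_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = conf
    return conf


# 128 is an arbitrary example value, not a recommendation from this thread.
conf = set_max_split_size_mb(128)
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
# import torch  # import only after the variable is set
```

The same setting can be exported in the shell before launching the script, which avoids depending on import order. Either way, it only changes how the allocator splits cached blocks; it does not reduce the memory that a large batch actually needs.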

tohneecao changed the title ~~Big batchsize cause OOM~~ Big batchsize cause OOM in bloom-ds-inference.py, how to adjust max_split_size_mb value Apr 27, 2023
