
What is the required GPU memory for running this project? #4

Open · LwAiBug opened this issue Apr 24, 2023 · 13 comments

LwAiBug commented Apr 24, 2023

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.17 GiB (GPU 0; 14.84 GiB total capacity; 11.81 GiB already allocated; 1.67 GiB free; 12.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Still getting errors even with 16GB of GPU memory.

TuanTNG commented Apr 25, 2023

@LwAiBug
I hit the same error while running on a 1080Ti GPU.

To work around the CUDA out-of-memory error, I found that I needed to resize the video before using this annotation tool. For example, my original video has a resolution of 1920x1200; I resized it to 960x600 (in Python: read frames --> resize --> write frames back to a video), and then it worked properly.
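For reference, a minimal Python sketch of that read --> resize --> write loop, assuming OpenCV (`opencv-python`) is installed; the file names are placeholders:

```python
import cv2

def resize_video(src_path: str, dst_path: str, scale: float = 0.5) -> None:
    """Re-encode a video at `scale` times its original resolution."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * scale)
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * scale)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # INTER_AREA is a good default interpolation for downscaling
        writer.write(cv2.resize(frame, (w, h), interpolation=cv2.INTER_AREA))
    cap.release()
    writer.release()

# e.g. 1920x1200 -> 960x600
resize_video("input_1920x1200.mp4", "input_960x600.mp4", scale=0.5)
```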

gaomingqi (Owner)

Hello, thanks for the feedback. The required GPU memory depends on the video's resolution. Below are some example resolutions and the estimated GPU memory usage:

| Resolution | Estimated GPU memory usage |
| --- | --- |
| 1920 x 896 | 13GB |
| 1280 x 720 | 7GB |
| 576 x 320 | 5GB |

@TuanTNG resizing is a good idea; we are working on it and will soon enable the model to resize the video before tracking. Thanks.

LwAiBug (Author) commented Apr 25, 2023

I understand; hopefully there will be a better solution in the future that doesn't require changing the video resolution.

g8392 commented Apr 25, 2023

Running out of memory with inpainting: 1080p video, RTX 3090, Ubuntu 22.04.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.48 GiB (GPU 0; 23.69 GiB total capacity; 17.76 GiB already allocated; 625.69 MiB free; 18.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@teidenzero

If you're on Windows, open a command prompt and type

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024

or

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

then run the tool again from that same prompt (the variable only applies to processes started from it).
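For what it's worth, the same allocator setting can be applied from Python on any OS, as long as it happens before torch initializes CUDA; a minimal sketch (512 is just an example value):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts up,
# so set it before importing torch (or at least before the first GPU tensor).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch

x = torch.zeros(1, device="cuda")  # allocations now use the 512 MiB split size
```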

@teidenzero

I am getting the same error on a 4090 with 24GB trying to inpaint a 1000x562 video.

gaomingqi (Owner) commented Apr 26, 2023

@g8392 @teidenzero Sorry, the GPU memory requirements listed above are only for tracking; the model costs much more for inpainting. We have added resizing to down-scale the resolution before inpainting. Please see the screenshot below:

[Screenshot: down-scaling slider in the inpainting UI]

where you can set the down-scaling ratio via the slider. Thanks.

entrusc commented Apr 26, 2023

Can you please provide rough estimates of how much VRAM is needed for inpainting at common resolutions like 720p and 1080p? Thanks!

@zhangjingzj96

Based on my personal experience, I have used E2FGVI inpainting to process many videos. For a 720x1280 video, with over 40GB of GPU memory you can process 100 frames at once. I usually set neighbor_stride to 5; reducing this parameter helps decrease GPU memory consumption, but may also reduce inpainting quality on some videos. I use an A40 graphics card, which completes the inpainting of 100 frames of a 720x1280 video in about 80 seconds. I hope this can serve as a reference.

gaomingqi (Owner)

@entrusc @g8392 @teidenzero Hello, the table below gives the estimated GPU memory requirements for inpainting with the default E2FGVI config (OOM indicates > 48GB):

| Resolution | 50 frames | 100 frames | 1000 frames |
| --- | --- | --- | --- |
| 1920 x 1080 | OOM | OOM | OOM |
| 1280 x 720 | 30GB | 46GB | OOM |
| 720 x 480 | 13GB | 21GB | OOM |
| 640 x 480 | 11GB | 19GB | OOM |
| 320 x 240 | 4GB | 4.5GB | OOM |
| 160 x 120 | 2.5GB | 3GB | 21GB |

The GPU memory requirement of E2FGVI depends on both video resolution and video length, because E2FGVI evenly samples frames from the whole video as temporal context: the longer the video, the more frames are involved in each inpainting pass, leading to out-of-memory (OOM) errors.

@zhangjingzj96 thanks for your information! 🚩 We have now shifted E2FGVI from inpainting the whole video at once to sequentially inpainting a set of fixed-length sub-videos (e.g., 50 frames each), which effectively decouples the GPU memory requirement from the video length (see the sketch after the table below). After this modification, the estimated GPU memory requirements are (with default configurations):

| Resolution | 50 frames | 100 frames | 1000 frames |
| --- | --- | --- | --- |
| 1920 x 1080 | OOM | OOM | OOM |
| 1280 x 720 | 30GB | 46GB | 46GB |
| 720 x 480 | 13GB | 21GB | 21GB |
| 640 x 480 | 11GB | 19GB | 19GB |
| 320 x 240 | 4GB | 4.5GB | 4.5GB |
| 160 x 120 | 2.5GB | 3GB | 3GB |
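Conceptually, the split-based version works like the sketch below. The function and variable names here are hypothetical placeholders, not the actual Track-Anything code; the point is that memory no longer grows with the total number of frames, only with the subset size:

```python
from typing import List

def inpaint_subset(frames: List, masks: List, refs: List) -> List:
    """Stand-in for one E2FGVI pass over a short clip plus reference frames."""
    return frames  # placeholder: the real code would run the model here

def inpaint_video(frames: List, masks: List,
                  num_subset_frames: int = 50, num_external_ref: int = 2) -> List:
    results: List = []
    for start in range(0, len(frames), num_subset_frames):
        subset = frames[start:start + num_subset_frames]
        subset_masks = masks[start:start + num_subset_frames]
        # Reuse a few already-inpainted frames as external references so
        # consecutive sub-videos stay temporally consistent.
        refs = results[-num_external_ref:] if num_external_ref > 0 else []
        results.extend(inpaint_subset(subset, subset_masks, refs))
    return results
```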

As mentioned by @zhangjingzj96, reducing neighbor_stride can further reduce GPU memory usage; it is configurable in Track-Anything here:

# config info for E2FGVI
neighbor_stride: 5
num_ref: -1
step: 10
# config info for E2FGVI with splits (updated on 23/04/29)
num_subset_frames: 50
num_external_ref: 2 # (>=0)

Decreasing num_external_ref or increasing step can also reduce memory usage; the sketch below shows how these parameters determine the number of frames per forward pass.
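For intuition, here is a simplified sketch modeled on the public E2FGVI inference loop (where step plays the role of E2FGVI's ref_length); the exact Track-Anything code may differ. Each pass processes roughly 2 * neighbor_stride + 1 local frames plus the sampled references, so a smaller neighbor_stride, a larger step, or fewer references all shrink the per-pass batch:

```python
def frames_per_pass(f: int, video_length: int, neighbor_stride: int = 5,
                    step: int = 10, num_ref: int = -1) -> int:
    """Count the frames fed to one inpainting pass centered on frame f."""
    # Local window: about 2 * neighbor_stride + 1 frames around f.
    neighbors = set(range(max(0, f - neighbor_stride),
                          min(video_length, f + neighbor_stride + 1)))
    # Global references sampled every `step` frames; num_ref == -1 means
    # "use all of them", which is why memory grows with video length.
    refs = [i for i in range(0, video_length, step) if i not in neighbors]
    if num_ref != -1:
        refs = refs[:num_ref]
    return len(neighbors) + len(refs)

print(frames_per_pass(f=50, video_length=100))    # 20: 11 neighbors + 9 refs
print(frames_per_pass(f=500, video_length=1000))  # 110: 11 neighbors + 99 refs
```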

Besides adjusting the E2FGVI configurations, Track-Anything supports resizing the video before inpainting, as mentioned in my previous response. Thanks.

baaoh commented May 11, 2023

If we used Track-Anything to produce a sequence of image masks, would the limit on video length be practically erased?

ami012003 commented May 15, 2024

Error -
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 7.78 GiB total capacity; 440.23 MiB already allocated; 28.69 MiB free; 468.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

GPU / Driver Specifications - nvidia-smi
Wed May 15 11:40:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3070 Off | 00000000:21:00.0 On | N/A |
| 36% 44C P2 56W / 220W | 6936MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1192 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 2028 G /usr/lib/xorg/Xorg 81MiB |
| 0 N/A N/A 2155 G /usr/bin/gnome-shell 75MiB |
| 0 N/A N/A 2804 G ...seed-version=20240513-180116.288000 72MiB |
| 0 N/A N/A 4904 C python 6644MiB |
+---------------------------------------------------------------------------------------+

Already tried:
1. Setting max_split_size_mb
2. torch.cuda.set_per_process_memory_fraction(max_split_size_mb) (see the note below)
3. Cleaning the torch cache
4. Killing PID 4904
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0

PyTorch was installed using the command: pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

Please help!
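One caution on item 2 above: torch.cuda.set_per_process_memory_fraction expects a fraction in [0, 1], not a max_split_size_mb value, so passing the split size there is a misuse; the split size is only configurable through the environment variable. A minimal sketch of the correct call:

```python
import torch

# Caps this process at a fraction of total device memory (here 90% of GPU 0);
# it does not take a byte or MiB value.
torch.cuda.set_per_process_memory_fraction(0.9, device=0)
```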

@ahmadrajati

For tasks that don't need many mask corrections, you can edit the vos_tracking_video function inside app.py: instead of passing all frames at once to model.generator(), call it every n frames and use the last prediction as the template_mask for the frames in the next step. You should also save the results and clear GPU memory at each step.
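A rough sketch of that idea; the helper below is hypothetical, and the exact model.generator signature in app.py may differ, so treat this as pseudocode for the chunking pattern rather than the actual Track-Anything implementation:

```python
import torch

def track_in_chunks(model, frames, template_mask, chunk_size=100):
    """Track `frames` in fixed-size chunks so that only `chunk_size` frames'
    worth of activations occupy GPU memory at once."""
    all_masks = []
    mask = template_mask
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        # Assumed call pattern, per the suggestion above.
        masks, logits, painted_images = model.generator(images=chunk,
                                                        template_mask=mask)
        all_masks.extend(masks)   # persist results (e.g. write to disk) here
        mask = masks[-1]          # last prediction seeds the next chunk
        torch.cuda.empty_cache()  # release cached blocks between chunks
    return all_masks
```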
