RuntimeError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 6.00 GiB total capacity; 118.62 MiB already allocated; 4.20 GiB free; 362.00 MiB reserved in total by PyTorch) #1698

Closed
gchihiha opened this issue Dec 16, 2020 · 6 comments
Labels
bug (Something isn't working) · Stale (Stale and scheduled for closing soon)

Comments

@gchihiha

python train.py --img 416 --batch 4 --epochs 300 --data ./test_train_datas/data.yaml --cfg models/yolov5s.yaml --weights ./weights/yolov5s.pt

File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\conv.py", line 415, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 6.00 GiB total capacity; 118.62 MiB already allocated; 4.20 GiB free; 362.00 MiB reserved in total by PyTorch)

python -V
Python 3.8.6

pip list
Package Version


absl-py 0.11.0
attr 0.3.1
attrs 20.3.0
cachetools 4.1.1
certifi 2020.12.5
chardet 3.0.4
coremltools 4.0
cycler 0.10.0
Cython 0.29.21
future 0.18.2
google-auth 1.23.0
google-auth-oauthlib 0.4.2
grpcio 1.34.0
idna 2.10
kiwisolver 1.3.1
Markdown 3.3.3
matplotlib 3.3.3
mpmath 1.1.0
numpy 1.19.4
oauthlib 3.1.0
onnx 1.7.0
onnx-simplifier 0.2.19
onnxoptimizer 0.1.1
onnxruntime 1.6.0
opencv-python 4.4.0.46
packaging 20.7
Pillow 8.0.1
pip 20.3.1
protobuf 3.14.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 2.4.7
python-dateutil 2.8.1
PyYAML 5.3.1
requests 2.25.0
requests-oauthlib 1.3.0
rsa 4.6
scipy 1.5.4
setuptools 49.2.1
six 1.15.0
sympy 1.7
tensorboard 2.4.0
tensorboard-plugin-wit 1.7.0
torch 1.6.0+cu101
torchaudio 0.7.2
torchvision 0.7.0+cu101
tqdm 4.54.1
typing-extensions 3.7.4.3
urllib3 1.26.2
Werkzeug 1.0.1
wheel 0.36.1

gchihiha added the bug label on Dec 16, 2020
@gchihiha
Author

This problem seems to be related to my physical memory usage.

@glenn-jocher
Member

@gchihiha try smaller --batch-size or smaller --img-size, or smaller model, or larger GPU.
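
For example, the command at the top of this issue could be retried with the image size and batch size reduced. The specific values below are only illustrative; any smaller values lower GPU memory use:

python train.py --img-size 320 --batch-size 2 --epochs 300 --data ./test_train_datas/data.yaml --cfg models/yolov5s.yaml --weights ./weights/yolov5s.pt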

@Mulbetty

> @gchihiha try smaller --batch-size or smaller --img-size, or smaller model, or larger GPU.

The error appears after I have trained for about 50 epochs; I don't know the reason. And when I retrained, it appeared at the first epoch.

@gchihiha
Author

>> @gchihiha try smaller --batch-size or smaller --img-size, or smaller model, or larger GPU.
>
> The error appears after I have trained for about 50 epochs; I don't know the reason. And when I retrained, it appeared at the first epoch.

Check your PC memory (not GPU memory).

@Mulbetty

>>> @gchihiha try smaller --batch-size or smaller --img-size, or smaller model, or larger GPU.
>>
>> The error appears after I have trained for about 50 epochs; I don't know the reason. And when I retrained, it appeared at the first epoch.
>
> Check your PC memory (not GPU memory).

The PC memory was only around 50 percent used, but CUDA utilization would jump to 100 percent and about 4 GB of the 6 GB of GPU memory was in use. I killed all the other processes, and it worked.
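
As a quick way to see how much of the card is already taken before launching training, a minimal sketch using PyTorch's CUDA memory counters is shown below. Note these counters only cover the current process; memory held by other processes (the cause reported above) only shows up in nvidia-smi.

import torch

# Minimal sketch: report memory on GPU 0, the device used in this issue.
# Only this process's allocations appear in PyTorch's counters; other
# processes' usage is visible via nvidia-smi instead.
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    total_mib = torch.cuda.get_device_properties(device).total_memory / 2**20
    allocated_mib = torch.cuda.memory_allocated(device) / 2**20
    reserved_mib = torch.cuda.memory_reserved(device) / 2**20
    print(f"total {total_mib:.0f} MiB | allocated {allocated_mib:.0f} MiB | reserved {reserved_mib:.0f} MiB")
else:
    print("CUDA is not available")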

gchihiha mentioned this issue on Dec 22, 2020
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the Stale label on Jan 22, 2021