is_cuda_out_of_memory Misses CUDA OOM Errors #6819

@ejohb

Description

🐛 Bug

pytorch_lightning.utilities.memory.is_cuda_out_of_memory does not work reliably, causing things like auto_scale_batch_size tuning to fail.

Please reproduce using the BoringModel

I'm afraid I don't know what BoringModel is.

To Reproduce

The call is_cuda_out_of_memory(RuntimeError('CUDA error: out of memory')) returns False.

The function only checks for the string "CUDA out of memory", so the differently worded message "CUDA error: out of memory" is not recognized.
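A minimal sketch of the matching logic described above (the actual implementation in pytorch_lightning.utilities.memory may differ in details) shows why the report's error message slips through:

```python
def is_cuda_out_of_memory(exception):
    # Sketch of the check as described in this issue: only the exact
    # phrase "CUDA out of memory" is matched against the message.
    return (
        isinstance(exception, RuntimeError)
        and "CUDA out of memory" in str(exception)
    )

# The usual allocator message is caught...
assert is_cuda_out_of_memory(
    RuntimeError("CUDA out of memory. Tried to allocate 2.0 GiB")
)
# ...but the phrasing from this report is missed.
assert not is_cuda_out_of_memory(
    RuntimeError("CUDA error: out of memory")
)
```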

Expected behavior

The above should return True.
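One way to achieve that would be a looser, case-insensitive match along these lines (a hypothetical fix, not the library's code), which catches both phrasings:

```python
def is_cuda_out_of_memory(exception):
    # Hypothetical looser check: accept any RuntimeError whose message
    # mentions both CUDA and an out-of-memory condition.
    msg = str(exception).lower()
    return (
        isinstance(exception, RuntimeError)
        and "cuda" in msg
        and "out of memory" in msg
    )

assert is_cuda_out_of_memory(
    RuntimeError("CUDA error: out of memory")
)
assert is_cuda_out_of_memory(
    RuntimeError("CUDA out of memory. Tried to allocate 2.0 GiB")
)
# Non-CUDA errors are still rejected.
assert not is_cuda_out_of_memory(RuntimeError("size mismatch"))
```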

Environment

  • CUDA:
    - GPU:
      - NVIDIA GeForce RTX 3090
      - NVIDIA GeForce RTX 2060 SUPER
    - available: True
    - version: 11.1
  • Packages:
    - numpy: 1.20.1
    - pyTorch_debug: False
    - pyTorch_version: 1.8.0+cu111
    - pytorch-lightning: 1.2.4
    - tqdm: 4.59.0
  • System:
    - OS: Linux
    - architecture: 64bit, ELF
    - processor:
    - python: 3.7.9
    - version: #1 SMP Tue Jun 23 12:58:10 UTC 2020
root@eb:~# neofetch
OS: Debian GNU/Linux 10 (buster) on Windows 10 x86_64
Kernel: 4.19.128-microsoft-standard
Uptime: 19 hours, 44 mins
Packages: 594 (dpkg)
Shell: bash 5.0.3
Terminal: /dev/pts/4
CPU: AMD Ryzen Threadripper 3970X 32- (64) @ 3.693GHz
Memory: 23149MiB / 257643MiB

Metadata

Labels

bug (Something isn't working), help wanted (Open to be worked on), tuner
