
RuntimeError: transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered #6867

Closed
Mgithus opened this issue Aug 14, 2023 · 3 comments

Comments


Mgithus commented Aug 14, 2023

Data information:
The dataset info given in the Colab notebook for this code on the MONAI website (https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_brats21_segmentation_3d.ipynb) is as follows:
Modality: MRI; Size: 1470 3D volumes (1251 training + 219 validation)
Each of the 1251 training samples contains 4 3D modalities and 1 3D segmentation mask (1251 * 5 = 6255 total images).
image shape: (240, 240, 155)
label shape: (240, 240, 155)

Code information:
Trying to run the following code from the MONAI website without any modifications:
https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_brats21_segmentation_3d.ipynb

Error:

Epoch 0/4 569/1001 loss: nan time 0.83s
Epoch 0/4 570/1001 loss: nan time 4.15s
Traceback (most recent call last):
File "notebook_of_swin_unetr.py", line 429, in
) = trainer(
File "notebook_of_swin_unetr.py", line 346, in trainer
train_loss = train_epoch(
File "notebook_of_swin_unetr.py", line 261, in train_epoch
loss.backward()
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/tensor.py", line 214, in backward
return handle_torch_function(
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/overrides.py", line 1060, in handle_torch_function
result = overloaded_arg.__torch_function__(public_api, types, args, kwargs)
File "/home/dlrs/.local/lib/python3.8/site-packages/monai/data/meta_tensor.py", line 249, in __torch_function__
ret = super().__torch_function__(func, types, args, kwargs)
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/tensor.py", line 995, in __torch_function__
ret = func(*args, **kwargs)
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/autograd/init.py", line 130, in backward
Variable._execution_engine.run_backward(
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
return self._forward_cls.backward(self, *args) # type: ignore
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 99, in backward
torch.autograd.backward(outputs, args)
File "/home/dlrs/.local/lib/python3.8/site-packages/torch/autograd/init.py", line 130, in backward
Variable._execution_engine.run_backward(
RuntimeError: transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
(aug10) dlrs@spml3:~/Desktop/jul_25$ python -c 'import monai; monai.config.print_debug_info()'
"sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.

Printing MONAI config...

MONAI version: 1.0.0
Numpy version: 1.21.6
Pytorch version: 1.7.1+cu110
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 1700933
MONAI file: /home/dlrs/.local/lib/python3.8/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 5.1.0
scikit-image version: 0.21.0
Pillow version: 10.0.0
Tensorboard version: 2.14.0
gdown version: 4.7.1
TorchVision version: 0.8.2+cu110
tqdm version: 4.66.1
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.3
einops version: 0.6.1
transformers version: 4.31.0
mlflow version: 2.5.0
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config...

System: Linux
Linux version: Ubuntu 20.04.6 LTS
Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
Processor: x86_64
Machine: x86_64
Python version: 3.8.17
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/home/dlrs/.anaconda/navigator/Code/logs/20230814T112632/ptyhost.log', fd=39, position=0, mode='a', flags=33793), popenfile(path='/snap/code/137/usr/share/code/resources/app/node_modules.asar', fd=41, position=64064, mode='r', flags=32768), popenfile(path='/snap/code/137/usr/share/code/v8_context_snapshot.bin', fd=103, position=0, mode='r', flags=32768)]
Num physical CPUs: 4
Num logical CPUs: 4
Num usable CPUs: 4
CPU usage (%): [36.9, 40.8, 42.9, 49.4]
CPU freq. (MHz): 1994
Load avg. in last 1, 5, 15 mins (%): [59.2, 67.2, 75.0]
Disk usage (%): 45.8
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 15.6
Available memory (GB): 9.3
Used memory (GB): 5.7

================================
Printing GPU config...

Num GPUs: 1
Has CUDA: True
CUDA version: 11.0
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80']
GPU 0 Name: NVIDIA GeForce GTX 1080 Ti
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 28
GPU 0 Total memory (GB): 10.9
GPU 0 CUDA capability (maj.min): 6.1
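
A minimal debugging sketch for this kind of failure (standard PyTorch switches, not code from the tutorial): cudaErrorIllegalAddress is reported asynchronously, so the traceback above may not point at the kernel that actually failed. Forcing synchronous launches and enabling autograd anomaly detection usually localizes both the crash and the "loss: nan" lines.

# Hedged sketch: generic PyTorch debugging settings, assuming they are applied
# at the top of the training script before any GPU work happens.
import os

# Must be set before the first CUDA call, i.e. before any tensor is moved to the GPU,
# so that the failing kernel shows up at its real call site.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Makes backward() raise at the first op that produces NaN/Inf and prints the
# forward op that created it, which helps explain the NaN loss above.
torch.autograd.set_detect_anomaly(True)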

KumoLiu (Contributor) commented Aug 15, 2023

Hi @Mgithus, I have tried this tutorial with the MONAI v1.2 image and I can't reproduce the error. Could you please try it with the latest stable version?
Thanks!
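
A quick way to confirm which versions are actually picked up before re-running (a hedged sketch; the upgrade command in the comment is one possible path, not prescribed by this thread):

# Sketch: verify the installed MONAI/PyTorch/CUDA combination.
# One possible upgrade path: pip install -U monai torch  (adjust for your CUDA version).
import monai
import torch

monai.config.print_config()            # MONAI, PyTorch and optional dependency versions
print(torch.version.cuda)              # CUDA runtime this PyTorch build targets
print(torch.cuda.get_device_name(0))   # should report the GTX 1080 Ti from the config above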

Mgithus (Author) commented Sep 10, 2023

Thanks @KumoLiu, I have tried but was not able to resolve it.

KumoLiu (Contributor) commented Dec 7, 2023

Hope this can help: https://discuss.pytorch.org/t/summarize-the-reasons-for-the-common-error-illegal-memory-access/130406.
Moving this to a discussion; feel free to create another one.
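
One of the causes listed in that thread is a kernel built for the wrong GPU architecture; a small hedged check against the config printed above (GTX 1080 Ti, CUDA capability 6.1) could look like:

# Sketch: print the device's compute capability next to the architectures
# the installed PyTorch build was compiled for, for a side-by-side comparison.
import torch

major, minor = torch.cuda.get_device_capability(0)   # expected (6, 1) for a GTX 1080 Ti
print(f"device: sm_{major}{minor}", "| compiled archs:", torch.cuda.get_arch_list())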

@Project-MONAI Project-MONAI locked and limited conversation to collaborators Dec 7, 2023
@KumoLiu KumoLiu converted this issue into discussion #7298 Dec 7, 2023
