There are so many questions like "RuntimeError: Some background workers are no longer alive", but no one can resolve it #2659

Open
Diting-li opened this issue Dec 28, 2024 · 2 comments

@Diting-li

No description provided.

@Diting-li
Author

My computing resources are an i9 14900K, 96 GB RAM, and a 4090 GPU with 24 GB. When I run inference on a CT image, it fails with a GPU OOM error. After switching to the CPU, the process crashes from running out of memory. My file:
Shape: (576, 768, 768)
Direction: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
Image origin: (-384.0, -384.0, -72.5)
Voxel spacing: (0.30000001192092896, 0.30000001192092896, 0.30000001192092896)
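
For context, here is a rough back-of-the-envelope sketch (not actual nnU-Net code; the dtype and class count are assumptions) of why a volume of this shape is heavy for whole-volume inference:

```python
import numpy as np

# Rough memory estimate for the reported CT volume.
# Assumptions: float32 buffers and a hypothetical 3-class model; the real
# footprint also depends on nnU-Net's target spacing (resampling can grow
# the array) and on how many intermediate copies exist at once.
shape = (576, 768, 768)
n_classes = 3            # hypothetical
bytes_per_voxel = 4      # float32

voxels = int(np.prod(shape))
image_gb = voxels * bytes_per_voxel / 1024**3
softmax_gb = n_classes * image_gb   # full-volume class probabilities

print(f"voxels: {voxels:,}")                                  # 339,738,624
print(f"one float32 copy of the image: {image_gb:.2f} GB")    # ~1.27 GB
print(f"softmax maps for {n_classes} classes: {softmax_gb:.2f} GB")
# Sliding-window inference keeps several such buffers alive at once
# (resampled image, aggregated logits, Gaussian weight map, export copy),
# so 24 GB of VRAM or even a large amount of RAM can be exhausted.
```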

@dongdongtong

Similar bug in nnunetv2 with PyTorch 2.5.0, CUDA 12.1, Python 3.12.8.

When running the nnunetv2 framework, I keep getting similar errors:

RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
  File "/home/ziyang/anaconda3/envs/nnunet/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 275, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/run/run_training.py", line 211, in run_training
    nnunet_trainer.run_training()
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1370, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-3 (results_loop):
Traceback (most recent call last):
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/home/ziyang/anaconda3/envs/nnunet/lib/python3.12/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

But I cannot figure out where the bug is. When I add nnUNet_n_proc_DA=0 to the training command, the error disappears, but at the cost of slower training:

nnUNet_n_proc_DA=0 nnUNetv2_train 162 3d_fullres 0
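
For what it's worth, this error only tells you that a data-augmentation worker process died, not why. Below is a minimal sketch with plain multiprocessing (not the actual batchgenerators code) of the pattern behind the message:

```python
import multiprocessing as mp
import os
import signal
import time

def worker(q):
    # Stand-in for a data-augmentation worker. If the OS kills it
    # (e.g. the Linux OOM killer sends SIGKILL), no Python traceback
    # is ever printed for this process.
    while True:
        q.put("batch")
        time.sleep(0.5)

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,), daemon=True)
    p.start()

    time.sleep(1)
    os.kill(p.pid, signal.SIGKILL)   # simulate the OOM killer
    time.sleep(0.5)

    # Essentially what batchgenerators checks before raising: the parent
    # only knows the worker is gone, not what killed it.
    if not p.is_alive():
        raise RuntimeError("One or more background workers are no longer alive.")
```

If the workers are being killed for running out of RAM, the real evidence is usually in the kernel log (e.g. dmesg) rather than in the Python output. nnUNet_n_proc_DA=0 avoids spawning worker processes altogether, which is why the error disappears at the cost of slower data augmentation.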
