Processes are terminated in multi-GPU setting when using multiple models and seeds #2519
Comments
Hi @KunzstBGR, this issue seems to come from PyTorch Lightning and not Darts. It might also arise from the fact that you use multi-GPU. Can you check if it persists when you use …? Have you tried to change …? Also, is it normal that you don't save checkpoints or generate any kind of forecasts in your code snippet?
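For context, saving checkpoints and generating a forecast with a Darts torch model could look roughly like the sketch below; the model class, `model_name`, dataset, and forecast horizon are placeholders, not taken from the original snippet.

```python
from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel

series = AirPassengersDataset().load()  # placeholder dataset

# save_checkpoints/model_name let the trained model be reloaded from disk later
model = NBEATSModel(
    input_chunk_length=24,
    output_chunk_length=12,
    n_epochs=5,
    save_checkpoints=True,
    model_name="nbeats_seed_0",  # hypothetical name
)
model.fit(series)

# reload the best checkpoint and generate a forecast
best_model = NBEATSModel.load_from_checkpoint(model_name="nbeats_seed_0", best=True)
forecast = best_model.predict(n=12)
```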
Hi @madtoinou,
Nice! I would not be able to tell why swapping the order of the loops fixed it, but as long as it works, that's great. All good if you save the checkpoints and perform the evaluation in a separate loop; I was just curious since it was not visible in the code snippet. It's indeed better to do it separately. If the issue is solved, can you please close it?
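For illustration, a minimal sketch of the swapped nesting (seeds in the outer loop, model classes in the inner loop). Which ordering actually resolved the terminations, as well as the model classes, seeds, and parameters, are assumptions here, since the updated snippet isn't shown in the thread; the original nesting appears with the issue description below.

```python
from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel, NHiTSModel

series = AirPassengersDataset().load()  # placeholder dataset

# seeds in the outer loop, model classes in the inner loop (assumed fix)
for seed in [0, 1, 2]:
    for model_cls in [NBEATSModel, NHiTSModel]:
        model = model_cls(
            input_chunk_length=24,
            output_chunk_length=12,
            n_epochs=5,
            random_state=seed,
            pl_trainer_kwargs={"accelerator": "gpu", "devices": -1},
        )
        model.fit(series)
```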
Hi,
When comparing multiple models and multiple seeds using a nested loop in a multi-GPU setting, all processes are terminated when the loop switches from one model class to the next. Does anyone have an idea why? Maybe I'm doing something wrong, or is this a pytorch-lightning issue?
Error message:
Child process with PID 652 terminated with code 1. Forcefully terminating all other processes to avoid zombies
Relevant code snippet:
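The original snippet is not reproduced above; as a stand-in, here is a minimal sketch of the setup as described (model classes in the outer loop, seeds in the inner loop, multi-GPU training via `pl_trainer_kwargs`). The model classes, parameters, seeds, and dataset are assumptions, not the author's actual code.

```python
from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel, NHiTSModel

series = AirPassengersDataset().load()  # placeholder dataset

model_classes = [NBEATSModel, NHiTSModel]  # assumed model classes
seeds = [0, 1, 2]                          # assumed seeds

# switching model class in the outer loop is where the terminations were reported
for model_cls in model_classes:
    for seed in seeds:
        model = model_cls(
            input_chunk_length=24,
            output_chunk_length=12,
            n_epochs=5,
            random_state=seed,
            # multi-GPU training through PyTorch Lightning trainer kwargs
            pl_trainer_kwargs={"accelerator": "gpu", "devices": -1, "strategy": "ddp"},
        )
        model.fit(series)
```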