Add support for gracefully exiting when preemptive instances are being retracted. #9567

tchaton · 2021-09-16T14:23:21Z

🚀 Feature

Right now, when the script is killed, a fault-tolerant checkpoint is saved.
However, Lightning cannot guarantee that the save always happens in a reproducible part of the codebase.

When running in the cloud, there is a window of 2-3 minutes before the script is killed.

The proposal is to add a mechanism that detects when a kill signal has been sent, so that Lightning terminates at the next reproducible point within this time window.
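To illustrate the idea, here is a minimal sketch in plain Python. It is not Lightning's implementation. `GracefulExit`, `train`, `save_checkpoint`, and `on_step` are hypothetical names, and the sketch assumes that a step boundary of the loop counts as a "reproducible part" and that the cloud provider announces termination with `SIGTERM`. The handler only sets a flag, and the loop decides when it is safe to save and stop.

```python
import signal


class GracefulExit:
    """Hypothetical helper: records that a termination signal arrived."""

    def __init__(self, signums=(signal.SIGTERM,)):
        self.requested = False
        self._previous = {}
        for signum in signums:
            # Keep the previous handler so it can be restored later.
            # signal.signal() must be called from the main thread.
            self._previous[signum] = signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Do no work here: only set a flag and let the loop react.
        self.requested = True

    def restore(self):
        # Put the original handlers back.
        for signum, handler in self._previous.items():
            signal.signal(signum, handler)


def train(num_steps, exit_flag, save_checkpoint, on_step=None):
    """Toy loop that stops at the first step boundary after a signal.

    Returns the index of the last step that ran.
    """
    last_step = -1
    for step in range(num_steps):
        if on_step is not None:
            on_step(step)  # stands in for the real training step
        last_step = step
        # Step boundary: assumed here to be a reproducible point.
        if exit_flag.requested:
            save_checkpoint(step)
            break
    return last_step
```

A caller would create `GracefulExit()` before starting the loop and call `restore()` afterwards. Deferring the save to the step boundary is what keeps the checkpoint consistent. The cost is that one step must finish within the 2-3 minute window.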

tchaton added the feature (Is an improvement or enhancement) label Sep 16, 2021

tchaton mentioned this issue Sep 16, 2021

[Feat] Add graceful detection of signal to exit + SignalConnector and merge SlurmConnector. #9566

Merged

12 tasks

justusschock assigned tchaton Sep 17, 2021

justusschock added this to the v1.5 milestone Sep 17, 2021

tchaton closed this as completed Oct 29, 2021
