🚀 Feature

Right now, when the script is killed, a fault-tolerant checkpoint is saved. However, Lightning cannot always guarantee that the kill arrives in a reproducible part of the codebase. When running in the cloud, there is typically a 2-3 minute window before the script is forcibly killed. The proposal is to add a mechanism that detects when a termination signal has been sent, so that Lightning can stop at the next reproducible point within this time window.
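A minimal sketch of what such a mechanism could look like (not Lightning's actual implementation; `run_training`, the step count, and the simulated `os.kill` are illustrative assumptions): a signal handler records that SIGTERM arrived, and the loop defers shutdown to the next safe step boundary, where a fault-tolerant checkpoint would be reproducible.

```python
import os
import signal

class GracefulKiller:
    """Record that a termination signal arrived; the training loop then
    stops at the next safe boundary instead of dying mid-step."""

    def __init__(self):
        self.kill_requested = False
        signal.signal(signal.SIGTERM, self._on_signal)

    def _on_signal(self, signum, frame):
        # Only set a flag here: shutdown is deferred to a safe point
        # in the loop, within the cloud provider's grace window.
        self.kill_requested = True

def run_training(killer, num_steps=100):
    completed = 0
    for step in range(num_steps):
        # ... forward / backward / optimizer step would go here ...
        completed += 1
        if step == 2:
            # Simulate the cluster sending SIGTERM mid-run.
            os.kill(os.getpid(), signal.SIGTERM)
        if killer.kill_requested:
            # Safe boundary: the step has fully finished, so a
            # fault-tolerant checkpoint saved here is reproducible.
            break
    return completed

killer = GracefulKiller()
steps_done = run_training(killer)
print(steps_done)  # → 3: the loop exits right after the step where SIGTERM arrived
```

In practice the handler would also track the time the signal arrived, so the loop can fall back to an immediate checkpoint if the next reproducible point does not come soon enough inside the 2-3 minute window.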
Motivation
Pitch
Alternatives
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning
Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.