-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid running on_training_end
after Keyboard Interrupt
#1099
Comments
Hi! thanks for your contribution!, great first issue! |
Good point, it makes sense to skip eval of the unfinished train also when a user is it he doesn't want to wait for another hour lol Mind send a PR? 🤖 |
Thanks for bringing this use-case to our attention @lezwon , definitely something we'd want to consider. What if the What do you think? |
@jeremyjordan that sounds great. We could definitely do that. |
Not sure if we really need to add complexity to the existing callbacks... |
@Borda are you suggesting that we add a new callback ( |
in this moment I was just thinking about adding Trainer status... |
Ok yes I agree, I think the Trainer status would be a simple solution that may also be useful in other situations as well :) |
Cool, does anyone send a PR with Trainer status? I may suggest to implement it as enum/numbering similar like |
Will this status also account for validation/test completed/interrupted? |
yes we could do something like:
I can work on getting this into a PR |
Hey @jeremyjordan, what if we would like to execute different actions if the trainer failed during a validation task? Can we find out which task it failed at from the status? Just wondering if this would be flexible in such a scenario. |
@lezwon do you have an example in mind of where you'd need that? i'd prefer to keep the implementation simple and generic enough such that we don't keep adding new status types. |
Don't really have an example yet. Was just considering a scenario like that though. I guess we could go ahead with the current proposal you mentioned. That should solve my issue for sure 😊 |
@jeremyjordan sounds like a good idea! also I was thinking that since we have TrainerStatus, can it be the main class while TrainerMode servers as a nested enum instead of putting them all under a single enum? something like: class TrainerMode(enum.Enum):
TRAINING = enum.auto()
VALIDATING = enum.auto()
TESTING = enum.auto()
class TrainerStatus(enum.Enum):
PENDING = enum.auto()
FAILED = enum.auto()
INTERRUPTED = enum.auto()
...
MODE = TrainerMode then user can access the current status and mode through: # Interrupted when validation is running
if ... is TrainerStatus.INTERRUPTED and ... is TrainerStatus.MODE.VALIDATING: |
if I understand it correctly the |
not too sure about that, but probably the order matters too when new status or mode is added. so I guess we should use exact numbers like you suggested in case any import or export is involved! 🙂 |
i went back and forth on whether the trainer status would be a valuable addition. ultimately, i decided to opt for a simple attribute denoted when a |
🚀 Feature
Right now, due to #795 ,
on_training_end
runs after aKeyboardInterrupt
. This ends up running code that's meant to be run only after successful training completion. This feature should either be reverted, or an alternative should be provided, so as to run some code only after successful training.Motivation
I am training a model on Sagemaker and have added a notebook shutdown code within the
on_training_end
method. There were times where I had to manually cancel my model training because some parameters were incorrect. However, If I do that, the notebook shuts down immediately. This is because theon_training_end
method runs even after aKeyboardInterrupt
. I don't want my notebook shutting down after a keyboard interrupt, only after successful training completion.Pitch
Maybe add an
on_training_completed
method for code that's meant to be run after successful training.The text was updated successfully, but these errors were encountered: