Add a callback to write huggingface checkpoints during the training run #594
Should this be in Composer?
Will defer to Evan on this one
Nice, looks pretty good. Definitely doable to make a SaveForInferenceCallback in the future, I think. Also some good nuggets here for refactoring CheckpointSaver.
At some point yes
Can we add a flag to only save at the end? I imagine the most common use case is that I just want to convert to HF once I'm done training.
@mvpatel2000 can't you just specify 1dur as the save interval?
🤯 ur so big brained
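To illustrate why a save interval of 1dur amounts to "save only at the end", here is a minimal sketch. It assumes an interval string is either a batch count (`ba`) or a fraction of the total training duration (`dur`). The names `Interval`, `parse_interval`, and `should_save` are hypothetical and are not the callback's or Composer's actual API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    value: float
    unit: str  # "ba" = batches, "dur" = fraction of the full training duration


def parse_interval(text: str) -> Interval:
    """Parse strings such as '1dur', '0.5dur', or '25ba' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ba|dur)", text)
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return Interval(float(match.group(1)), match.group(2))


def should_save(interval: Interval, batch: int, max_batches: int) -> bool:
    """Return True if a checkpoint is due after `batch` completed batches."""
    if interval.unit == "dur":
        # A duration fraction is converted to a period measured in batches.
        period = round(interval.value * max_batches)
    else:
        period = int(interval.value)
    return period > 0 and batch % period == 0


def save_points(text: str, max_batches: int) -> list:
    """List every batch at which a checkpoint would be written."""
    interval = parse_interval(text)
    return [b for b in range(1, max_batches + 1) if should_save(interval, b, max_batches)]
```

Under these assumptions, `save_points("1dur", 100)` contains only the final batch, so no separate "save at end" flag is needed.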
Aight LGTM
yes
This adds a callback that converts checkpoints to the Hugging Face format during the training job. This avoids a separate conversion step after the job is complete and takes advantage of backgrounded uploads while training is still running.
7b-mpt-hf-ckpt-4-jc3uSG
l7b-hf-ckpt-4-HpZUAg
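The general pattern described in the PR summary (write a checkpoint at an interval, then upload it in the background so training is not blocked) can be sketched roughly as follows. This is not the PR's implementation: `BackgroundCheckpointWriter` is a hypothetical stand-in, the "upload" is a local directory copy in place of a real object store, and the JSON file stands in for a Hugging Face checkpoint.

```python
import json
import shutil
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


class BackgroundCheckpointWriter:
    """Hypothetical sketch of an interval-based checkpoint callback."""

    def __init__(self, remote_dir: Path, save_every: int) -> None:
        self.remote_dir = remote_dir  # stand-in for a remote bucket
        self.save_every = save_every  # save period, in batches
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: List[Future] = []

    def batch_end(self, batch: int, weights: Dict[str, float]) -> None:
        """Called by the training loop after each batch."""
        if batch % self.save_every != 0:
            return
        # Serialize synchronously so the snapshot matches this batch exactly.
        local_dir = Path(tempfile.mkdtemp(prefix=f"ba{batch}-"))
        (local_dir / "weights.json").write_text(json.dumps(weights))
        # Hand the slow part (the upload) to a worker thread and keep training.
        self._pending.append(self._pool.submit(self._upload, local_dir, f"ba{batch}"))

    def _upload(self, local_dir: Path, name: str) -> Path:
        target = self.remote_dir / name
        shutil.copytree(local_dir, target)  # placeholder for a real upload
        shutil.rmtree(local_dir)            # drop the local staging copy
        return target

    def close(self) -> List[Path]:
        """Wait for outstanding uploads at the end of training."""
        uploaded = [future.result() for future in self._pending]
        self._pool.shutdown()
        return uploaded
```

A training loop would call `batch_end` after every step and `close` once at the end. With a single worker thread, uploads complete in the order they were submitted.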