[Unified Checkpoint] update async save logic #9274
Thanks for your contribution!
ddb768e to 992063c
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9274     +/-   ##
===========================================
+ Coverage    52.95%   53.08%   +0.13%
===========================================
  Files          657      657
  Lines       106478   106521     +43
===========================================
+ Hits         56383    56547    +164
+ Misses       50095    49974    -121
LGTM
* update async save signal * fix async save hang
* [Unified Checkpoint] Support expert parallel (#9055)
* update code
* [Unified Checkpoint] Fix generation config save (#9223)
* [Unified Checkpoint] update async_save_info in develop (#9173)
* [Unified Checkpoint] update async save logic (#9274)
* update async save signal
* fix async save hang
* bug fix
---------
Co-authored-by: Weiguo Zhu <DrownFish19@gmail.com>

* [Unified Checkpoint] update async save logic (#9274)
* update async save signal
* fix async save hang
* bug fix

* [Unified Checkpoint] Support expert parallel (#9055)
* update code
* [Unified Checkpoint] Fix generation config save (#9223)
* [Unified Checkpoint] update async_save_info in develop (#9173)
* [Unified Checkpoint] update async save logic (#9274)
* update async save signal
* fix async save hang
* bug fix
* bug fix
* [Trainer] fix save_model (#9286)
* bug fix
* bug fix
---------
Co-authored-by: Weiguo Zhu <DrownFish19@gmail.com>
PR types
Others
PR changes
Others
Description
output_signal_dir: the directory used to save the asynchronous-save signal.
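
For readers unfamiliar with the pattern, here is a minimal sketch of how a separate signal directory can be used with asynchronous saving. It is not the code from this PR: the functions async_save and save_finished, the ".done" marker naming, and the use of a thread are all hypothetical choices made for illustration. Only the name output_signal_dir comes from the description above.

```python
import os
import tempfile
import threading


def async_save(payload: bytes, output_dir: str, output_signal_dir: str, name: str) -> threading.Thread:
    """Hypothetical helper: write `payload` in a background thread, then
    create an empty marker file in `output_signal_dir`."""

    def _worker() -> None:
        os.makedirs(output_dir, exist_ok=True)
        os.makedirs(output_signal_dir, exist_ok=True)
        with open(os.path.join(output_dir, name), "wb") as f:
            f.write(payload)
        # The marker is created only after the data file has been closed,
        # so its presence implies the write above has completed.
        with open(os.path.join(output_signal_dir, f".{name}.done"), "w"):
            pass

    thread = threading.Thread(target=_worker)
    thread.start()
    return thread


def save_finished(output_signal_dir: str, name: str) -> bool:
    """Hypothetical check: has the marker for `name` been written?"""
    return os.path.exists(os.path.join(output_signal_dir, f".{name}.done"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ckpt_dir = os.path.join(root, "checkpoint")
        signal_dir = os.path.join(root, "signal")
        worker = async_save(b"weights", ckpt_dir, signal_dir, "model.bin")
        worker.join()
        print(save_finished(signal_dir, "model.bin"))
```

Keeping the markers in their own directory means a process that polls for completion does not need to list or read the checkpoint directory itself; whether that is the motivation in this PR is not stated in the description.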