Optimizer load gathered state and record delta feature are supported now #184
AdamOffloadOptimizer new feature
Description
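This PR adds two new interfaces to AdamOffload:
Passing record_delta=True to __init__(self) of AdamOffloadOptimizer records statistics about the update applied by Adam at each step.
Passing gather=True to state_dict(self) of AdamOffloadOptimizer gathers the optimizer state from all ranks into one complete grad ckpt (note that on large models this may exhaust CPU memory). The complete ckpt can be loaded directly with optimizer.load_state_dict(torch.load(filename)), on a single GPU or on multiple GPUs. In other words, several GPUs can load the same complete grad ckpt at once: a flag is added when the file is written to mark it as a complete ckpt, so that on load each rank slices out the part of the grad that belongs to it.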
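The gather-then-slice behaviour described above can be illustrated with a small, library-free sketch. Everything here is an assumption made for illustration: the flag key `is_full_ckpt`, the helper names, the flat list standing in for optimizer state, and the contiguous near-even partitioning are hypothetical and are not taken from the PR's actual implementation.

```python
# Hypothetical sketch of "gather to a flagged full checkpoint, slice on load".
# Not the PR's code: the flag name, layout, and partitioning are assumptions.
from typing import Dict, List, Tuple

FULL_FLAG = "is_full_ckpt"  # assumed marker key written into the checkpoint


def shard_bounds(total: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Contiguous, near-even partition of `total` elements across ranks."""
    per_rank = (total + world_size - 1) // world_size  # ceiling division
    start = min(rank * per_rank, total)
    end = min(start + per_rank, total)
    return start, end


def gather_state(shards: List[List[float]]) -> Dict:
    """Concatenate per-rank shards (in rank order) and mark the result as complete."""
    full = [value for shard in shards for value in shard]
    return {FULL_FLAG: True, "state": full}


def load_state(ckpt: Dict, world_size: int, rank: int) -> List[float]:
    """Return this rank's share: slice a flagged full checkpoint, else use it as-is."""
    state = ckpt["state"]
    if ckpt.get(FULL_FLAG, False):
        start, end = shard_bounds(len(state), world_size, rank)
        return state[start:end]
    return state


# Two ranks gather into one checkpoint, then each rank reloads its own slice.
ckpt = gather_state([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
rank0 = load_state(ckpt, world_size=2, rank=0)
rank1 = load_state(ckpt, world_size=2, rank=1)
```

Because the slice is computed from the world size at load time, the same file can in principle be loaded under a different number of ranks than it was saved with, including a single rank, which receives the whole state.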
Type of Change
Checklist
Additional Information