-
Notifications
You must be signed in to change notification settings - Fork 16
hardcode refmodel.checkpoint.folder to empty #414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
self.engine = ForgeEngine(ForgeJobConfig(**engine_config)) | ||
engine_config = ForgeJobConfig(**engine_config) | ||
engine_config.checkpoint.folder = ( | ||
"" # hardcode to empty to force load from initial_load_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does ref model get the initial weights after the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from initial_load_path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So maybe add a validator on the ref model's config and only allow fields that differs per model. for example, Also put enable: true
and initial_load_in_hf: true
in the override together with the folder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a todo to make this configurable in the future? Sometimes you do want to update your reference model just very infrequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JenniferWang that would have to be on the titan side right? Also, I think this module will be reusable for any "forward pass" model in the near future and not just for reference model. So I don't know if it's worth doing anything more than this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this not just be accomplished by setting this in the config?
checkpoint:
enable: true
initial_load_path: hf://${model}
folder: ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@allenwang28 Yes, it is the same. But I thought it would be a little confusing to the users.
@pbontrager If user want to change the reference model, they can just update the initial_load_path
in the config. Does it resolve some of your concern?
Solving #413
After this, the checkpoint for reference model will be load only from
initial_load_path
. User still have the flexibility to change ref model by modifyinginitial_load_path
.Tested on grpo/main.
ref_model
now will always load from theinitial_load_path
regardless of trainer's checkpoint config.