Base optimizer tracking #2126
Conversation
/cc @reedwm Can you give us feedback here on what we need to correctly support Keras saving? I see that in that case TensorFlow is only testing checkpoint restoration, not SavedModel. See the user-proposed tests in #2102.
I'm not very familiar with the Keras saving code. Note we test a SavedModel with LossScaleOptimizer here.
@reedwm Thank you for the feedback. I am going to check your test.
@reedwm I've tried to expand your test in the style of the user-proposed one, and it is passing in TensorFlow: tensorflow/tensorflow#42749. In our Lookahead case at #2102, instead, it is passing only with checkpoints. Any hint?
Hmm, I'm not sure. What is the error? Just a guess, but maybe the issue is that you return additional weights from `weights`:

addons/tensorflow_addons/optimizers/lookahead.py, lines 135 to 137 at ef8b0c9
AFAIK, no other optimizer does this. Maybe it's causing SavedModel to be confused. Also, last time I checked, SavedModel does not save slot variables, but checkpoints do. Make sure your SavedModel test is not relying on slot variables being restored. Also note LossScaleOptimizer used to use an approach similar to this PR. tensorflow/tensorflow@f9e99f4 changed it, but before that commit SavedModel still worked, so this approach should be possible. /cc @k-w-w any ideas what the issue could be?
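The guess about the extra `weights` can be illustrated with a plain-Python mock (no TensorFlow involved; `MockOptimizer`, `MockLookahead`, and the restore-by-position saver are all invented for illustration): an optimizer that reports variables it does not own breaks any consumer that assumes `weights` matches the variables the optimizer created itself.

```python
# Toy illustration (not real TF): a saver that restores weights by
# position gets confused when an optimizer reports extra weights.

class MockOptimizer:
    """Stands in for a normal OptimizerV2: reports only its own weights."""
    def __init__(self, n):
        self._weights = [float(i) for i in range(n)]

    @property
    def weights(self):
        return list(self._weights)

class MockLookahead(MockOptimizer):
    """Mimics Lookahead's `weights`, which also returns the inner
    optimizer's weights (lookahead.py lines 135-137 at ef8b0c9)."""
    def __init__(self, inner):
        super().__init__(2)
        self.inner = inner

    @property
    def weights(self):
        return list(self._weights) + self.inner.weights

def restore_by_position(optimizer, saved):
    # A naive restorer that assumes len(saved) == len(own weights).
    if len(saved) != len(optimizer._weights):
        raise ValueError("weight count mismatch: %d saved vs %d owned"
                         % (len(saved), len(optimizer._weights)))
    optimizer._weights = list(saved)

inner = MockOptimizer(3)
opt = MockLookahead(inner)
saved = opt.weights          # 2 own + 3 inner = 5 entries
try:
    restore_by_position(opt, saved)
    outcome = "ok"
except ValueError:
    outcome = "mismatch"
print(outcome)  # the extra inner weights break the restore
```

This is only an analogy for how SavedModel could get "confused"; the real failure mode inside TF's saving code may differ.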
As you can see, we have slots in Lookahead too.
I still see this @allenlavoie commit in the current master. Is it the motivation for this claim?
I don't think that's related; IIRC it was for a different reason. What's the error/issue if you try to save with SavedModel?
The optimizer, or the base/inner optimizer (or both), doesn't seem to be in the correct state after loading.
Hrm. Does tf.saved_model.save / tf.saved_model.load have the same issue? There's a bit of extra logic in Model.save/load_model that could be causing issues here.
With
@allenlavoie What exactly is the RestoredOptimizer for?
Because I don't understand this case: https://github.com/tensorflow/tensorflow/blame/master/tensorflow/python/keras/saving/saving_utils.py#L179
RestoredOptimizer is just a container for holding the restored slot variables. If nothing owned the tf.Variable objects, they would go out of scope immediately and be deleted. It doesn't implement any of the other Optimizer methods. Maybe this line is the issue: https://github.com/tensorflow/tensorflow/blob/be2870d0db035d12a207039922d76ae908b1101c/tensorflow/python/keras/optimizer_v2/optimizer_v2.py#L1391 When we tell SavedModel to load something back as a Python object, it takes a Python function to set attributes with. It's calling _set_hyper and passing the inner optimizer, and maybe that's not doing anything reasonable. You could try using a more complicated setter that e.g. typechecks for OptimizerV2s and uses setattr+_track_trackable or something.
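The suggested setter can be sketched in plain Python (mock classes; in real TF the typecheck would be against `optimizer_v2.OptimizerV2` and the real `_track_trackable`/`_set_hyper` would be called): route optimizer-valued attributes through dependency tracking instead of treating them as hyperparameters.

```python
# Mock sketch of the setter idea: during SavedModel revival, attributes
# are set through a callback; optimizer-valued attributes should become
# tracked checkpoint dependencies, not hyperparameters.

class MockOptimizerV2:
    def __init__(self):
        self._hyper = {}
        self._tracked = {}

    def _set_hyper(self, name, value):
        self._hyper[name] = value

    def _track_trackable(self, value, name, overwrite=False):
        self._tracked[name] = value

def revived_setter(obj, name, value):
    if isinstance(value, MockOptimizerV2):
        # Inner optimizers become tracked children, so their slot
        # variables stay reachable from the checkpoint dependency graph.
        obj._track_trackable(value, name=name, overwrite=True)
        setattr(obj, name, value)
    else:
        obj._set_hyper(name, value)

outer, inner = MockOptimizerV2(), MockOptimizerV2()
revived_setter(outer, "optimizer", inner)    # tracked, not a hyper
revived_setter(outer, "learning_rate", 0.1)  # plain hyperparameter
print(sorted(outer._tracked), sorted(outer._hyper))
```

Whether this alone fixes the Lookahead restore is not established in the thread; it only mirrors the shape of the proposed change.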
I don't find many entries for RestoredOptimizer. What is the logic of this condition?
/cc @omalleyt12 What is the use case of this condition that you introduced?
Ok, never mind. I've seen the logic of RestoredOptimizer.
I've just extracted a very small code snippet, removing everything unrelated.
@reedwm With Adam it is also failing the TensorFlow test; check on your test whether the change is correct: tensorflow/tensorflow#42749.
I commented in tensorflow/tensorflow#42749 (comment). Since slot variables are not restored and Adam has slot variables, the sample code you gave is expected to fail.
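Why the Adam snippet is expected to fail can be shown with a toy save/restore (mock code; `mock_saved_model_save`/`mock_saved_model_load` are invented and only copy what SavedModel is described above as keeping): model variables come back, slot variables do not.

```python
# Toy model of the described behavior: SavedModel keeps model variables
# but drops optimizer slot variables, so an Adam-like optimizer comes
# back with empty first/second-moment slots.

def mock_saved_model_save(model_vars, slot_vars):
    # Only model variables make it into the artifact; slot_vars are
    # intentionally ignored, mirroring "SavedModel does not save slots".
    return {"variables": dict(model_vars)}

def mock_saved_model_load(artifact):
    # Variables restore fine; slots are simply gone.
    return dict(artifact["variables"]), {}

model_vars = {"kernel": 1.0}
adam_slots = {"kernel/m": 0.5, "kernel/v": 0.25}

artifact = mock_saved_model_save(model_vars, adam_slots)
restored_vars, restored_slots = mock_saved_model_load(artifact)
print(restored_vars, restored_slots)  # slots dict comes back empty
```

Any test asserting Adam's `m`/`v` slots survive a SavedModel round trip would therefore fail by design, independent of the Lookahead change.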
Ok, I commented there as well.
I think this could be merged as is.
Thanks 😃
* Update lookahead.py: initial fix of tensorflow#2094, tensorflow#2102
* Fix linting
* Resolve name conflict with mixed precision
* Track baseline optimizer in avg
Description
Brief Description of the PR:
The inner optimizer is not tracked for checkpointing.
Fixes #2094
Type of change
Checklist:
How Has This Been Tested?
Tested in #2102
If you're adding a bugfix or new feature please describe the tests that you ran to verify your changes:
It still needs to pass the `save` and `load_model` case.