Add end_scale argument #975

Merged

Conversation

stefanocortinovis
Contributor

This PR adds an end_scale argument to contrib.reduce_on_plateau():

  • This argument mimics the behaviour of min_lr in torch.optim.ReduceLROnPlateau by allowing the user to specify a threshold for the scale at which to stop the learning rate decay.
  • The implementation handles both the case of factor < 1.0 (in which case end_scale is treated as a lower bound) and the case of factor > 1.0 (in which case end_scale is treated as an upper bound); a sketch of this clamping logic follows the list.
  • For consistency, the implementation follows the one for end_value in optax.exponential_decay.
  • A new unit test for this functionality is added.
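
A minimal sketch of the clamping behaviour, mirroring how end_value is handled in optax.exponential_decay; the names and structure here are illustrative, not the exact code added by the PR:

```python
import jax.numpy as jnp


def clip_scale(scale, factor, end_scale):
  """Illustrative helper: bounds the decayed scale at end_scale."""
  new_scale = scale * factor
  if factor < 1.0:
    # The scale shrinks over time, so end_scale acts as a lower bound.
    return jnp.maximum(new_scale, end_scale)
  # The scale grows over time, so end_scale acts as an upper bound.
  return jnp.minimum(new_scale, end_scale)
```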

@fabianp
Member

fabianp commented May 27, 2024

Thanks @stefanocortinovis for the contribution! The code looks good to me but I'm not so sure about the name.

What would you think about end_factor so it references the fact that it's related to the factor keyword?

@stefanocortinovis
Contributor Author

Thanks for the quick feedback, @fabianp! I named it end_scale because it is effectively an upper/lower bound for the scale attribute of the ReduceLROnPlateauState class. Anyway, I'm happy to change it to whatever name you think is best.

@fabianp
Member

fabianp commented May 27, 2024

ah, I see now. Thanks for the response, makes sense

@fabianp
Member

fabianp commented May 27, 2024

the reason I'm not convinced by end_scale is that scale relates more to the internal implementation than to the other parameters we expose. In other words, it could be confusing for a user just reading the doc what scale refers to.

Anyway, I also couldn't come up with a better name, so let's give this 24h and merge it if nobody comes up with a better suggestion.

@vroulet
Collaborator

vroulet commented May 27, 2024

Thanks @stefanocortinovis !
Why would the factor be greater than 1?
If we allow a factor greater than 1, we are not "reducing" the lr, and the name of the function may need to be changed.
If we agree that the factor should be lower than 1:

  • we may add an assertion to ensure that.
  • we may rename the new end_scale to min_lr, which is quite clear.

@stefanocortinovis
Contributor Author

Thanks for the suggestion, @vroulet!

Yeah, I was also wondering that. Personally, I have only seen reduce_on_plateau() used to do what its name suggests. I opted for allowing factor > 1.0 for consistency with e.g. exponential_decay, which explicitly allows rate > 1.0. It would probably make sense to keep things simple as you recommend, though.

In principle, I agree that min_lr would be ideal to communicate the meaning of the argument. However, since reduce_on_plateau() is used to scale the parameter updates directly, it's not exactly the learning rate applied to them. To make this concrete: standard gradient descent with lr = 1e-1 and end_scale = 1e-3 would imply scale >= 1e-3, and hence an effective step size >= 1e-4. That is, end_scale and the actual minimum step size might differ. Do you think that would be clear anyway?
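
A minimal sketch of that setup (illustrative only; it assumes the usual chaining pattern and uses the end_scale name still under discussion):

```python
import optax

# reduce_on_plateau multiplies the updates produced by sgd by its internal
# `scale`, so the effective step size is learning_rate * scale.
opt = optax.chain(
    optax.sgd(learning_rate=1e-1),
    optax.contrib.reduce_on_plateau(factor=0.1, patience=10),
)
# If `scale` is bounded below by end_scale = 1e-3, the effective step size is
# bounded below by 1e-1 * 1e-3 = 1e-4, not by 1e-3 itself.
```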

@vroulet
Collaborator

vroulet commented May 27, 2024

Then min_scale would be perfect no?

@fabianp
Member

fabianp commented May 27, 2024

Then min_scale would be perfect no?

+1, I like this

@stefanocortinovis
Contributor Author

Sounds good! I made the change from end_scale to min_scale and added an assertion to check the value of factor.
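
A minimal sketch of the kind of check described, assuming factor must lie strictly between 0 and 1 (the exact condition, mechanism, and message in the merged code may differ):

```python
def check_factor(factor: float) -> None:
  # Hypothetical validation helper; shown only to illustrate the check.
  if not 0.0 < factor < 1.0:
    raise ValueError(f"reduce_on_plateau requires 0 < factor < 1, got {factor}.")
```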

@copybara-service copybara-service bot merged commit 2d286b4 into google-deepmind:main Jun 3, 2024
7 checks passed