Add end_scale argument #975

Merged

Conversation

stefanocortinovis
Contributor

This PR adds an end_scale argument to contrib.reduce_on_plateau():

  • This argument mimics the behaviour of min_lr in torch.optim.ReduceLROnPlateau by allowing the user to specify a threshold for the scale at which to stop the learning rate decay.
  • The implementation handles both the case of factor < 1.0 (in which case end_scale is treated as a lower bound) and the case of factor > 1.0 (in which case end_scale is treated as an upper bound); a sketch of this clamping logic follows the list.
  • For consistency, the implementation follows the one for end_value in optax.exponential_decay.
  • A new unit test for this functionality is added.
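
A minimal sketch of the clamping behaviour, mirroring how end_value is handled in optax.exponential_decay; the names and structure here are illustrative, not the exact code added by the PR:

```python
import jax.numpy as jnp


def clip_scale(scale, factor, end_scale):
  """Illustrative helper: bounds the decayed scale at end_scale."""
  new_scale = scale * factor
  if factor < 1.0:
    # The scale shrinks over time, so end_scale acts as a lower bound.
    return jnp.maximum(new_scale, end_scale)
  # The scale grows over time, so end_scale acts as an upper bound.
  return jnp.minimum(new_scale, end_scale)
```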

@fabianp
Member

fabianp commented May 27, 2024

Thanks @stefanocortinovis for the contribution! The code looks good to me but I'm not so sure about the name.

What would you think about end_factor so it references the fact that it's related to the factor keyword?

@stefanocortinovis
Contributor Author

Thanks for the quick feedback, @fabianp! I named it end_scale because it is effectively an upper/lower bound for the scale attribute of the ReduceLROnPlateauState class. Anyway, I'm happy to change it to whatever name you think is best.

@fabianp
Member

fabianp commented May 27, 2024

ah, I see now. Thanks for the response, makes sense

@fabianp
Member

fabianp commented May 27, 2024

the reason I'm not convinced by end_scale is that scale relates more to the internal implementation than to the other parameters we expose. In other words, it could be confusing for a user just reading the doc what scale refers to.

Anyway, I also couldn't come up with a better name, so let's give this 24h and merge it if nobody comes up with a better suggestion.

@vroulet
Collaborator

vroulet commented May 27, 2024

Thanks @stefanocortinovis !
Why would the factor be greater than 1?
If we allow a factor greater than 1, we are not "reducing" the lr, and the name of the function may need to be changed.
If we agree that the factor should be lower than 1:

  • we may add an assertion to ensure that.
  • we may rename the new end_scale to min_lr, which is quite clear.

@stefanocortinovis
Contributor Author

Thanks for the suggestion, @vroulet!

Yeah, I was also wondering that. Personally, I have only seen reduce_on_plateau() used to do what its name suggests. I opted for allowing factor > 1.0 for consistency with e.g. exponential_decay, which explicitly allows rate > 1.0. It would probably make sense to keep things simple as you recommend, though.

In principle, I agree that min_lr would be ideal to communicate the meaning of the argument. However, since reduce_on_plateau() is used to scale the parameter updates directly, it's not exactly the learning rate applied to them. To make this concrete: standard gradient descent with lr = 1e-1 and end_scale = 1e-3 would imply scale >= 1e-3, and hence an effective step size >= 1e-4. That is, end_scale and the actual minimum step size might differ. Do you think that would be clear anyway?
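
A minimal sketch of that setup (illustrative only; it assumes the usual chaining pattern and uses the end_scale name still under discussion):

```python
import optax

# reduce_on_plateau multiplies the updates produced by sgd by its internal
# `scale`, so the effective step size is learning_rate * scale.
opt = optax.chain(
    optax.sgd(learning_rate=1e-1),
    optax.contrib.reduce_on_plateau(factor=0.1, patience=10),
)
# If `scale` is bounded below by end_scale = 1e-3, the effective step size is
# bounded below by 1e-1 * 1e-3 = 1e-4, not by 1e-3 itself.
```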

@vroulet
Collaborator

vroulet commented May 27, 2024

Then min_scale would be perfect no?

@fabianp
Member

fabianp commented May 27, 2024

Then min_scale would be perfect no?

+1, I like this

@stefanocortinovis
Contributor Author

Sounds good! I made the change from end_scale to min_scale and added an assertion to check the value of factor.
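
A minimal sketch of the kind of check described, assuming factor must lie strictly between 0 and 1 (the exact condition, mechanism, and message in the merged code may differ):

```python
def check_factor(factor: float) -> None:
  # Hypothetical validation helper; shown only to illustrate the check.
  if not 0.0 < factor < 1.0:
    raise ValueError(f"reduce_on_plateau requires 0 < factor < 1, got {factor}.")
```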

@copybara-service copybara-service bot merged commit 2d286b4 into google-deepmind:main Jun 3, 2024
7 checks passed