
Implementation question #5

Open
dhpollack opened this issue Jun 21, 2019 · 5 comments

@dhpollack

I noticed that in your implementation you've clamped the weight_norm to a min of 0 and a max of 10. I have seen this 10 in other implementations as well, and it comes from the first version of the LAMB paper. However, that number refers to the trust_ratio, not the weight_norm. Have you done any further experiments with this, or did you take the 10 from other implementations of the paper? I implemented LAMB with both the v1 and the latest version of the clipping and didn't notice a difference. I just wanted to know whether you did additional testing or were aware of this issue.
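To make the difference concrete, here's a minimal sketch of the two rules I'm comparing. The function name and the `clamp_weight_norm` flag are made up purely for illustration, not taken from this repo:

```python
import torch

def lamb_scale(weight: torch.Tensor, update: torch.Tensor,
               clamp_weight_norm: bool = True) -> float:
    """Layer-wise scale in the LAMB step: w <- w - lr * scale * update.

    clamp_weight_norm=True clamps ||w|| to [0, 10] before taking the ratio
    (what this repo does); clamp_weight_norm=False clips the ratio itself at
    10, which is what v1 of the paper describes for the trust ratio.
    """
    w_norm = weight.norm().item()
    u_norm = update.norm().item()

    if clamp_weight_norm:
        w_norm = max(0.0, min(w_norm, 10.0))

    # Most implementations fall back to 1.0 when either norm is zero.
    if w_norm == 0.0 or u_norm == 0.0:
        return 1.0

    ratio = w_norm / u_norm
    return ratio if clamp_weight_norm else min(ratio, 10.0)
```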

@8enmann
Contributor

8enmann commented Jun 21, 2019

I haven't tried different values, but yes I took 10 from v1 of the paper and am aware of the discrepancy. I have tested the algorithm on a large scale language model and it seems to scale well. I've also tracked the values of the weight norm of different layers and didn't see a clear reason to use a number other than 10. Let me know if you experiment and find a better value!
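If you want to repeat that check yourself, something like the snippet below is all I mean by tracking the norms; the tiny model here is a throwaway stand-in for whatever you're actually training:

```python
import torch
from torch import nn

# Throwaway model purely for demonstration; swap in your own network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Print the L2 norm of every parameter tensor to see whether a cap at 10
# would even bind for your layers.
for name, param in model.named_parameters():
    print(f"{name}: weight norm = {param.data.norm().item():.4f}")
```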

@Tony-Y

Tony-Y commented Aug 28, 2019

@8enmann
Contributor

8enmann commented Sep 2, 2019

@Tony-Y's link shows a comment from the original author saying that they use the identity function instead of clipping. Thanks Tony!

@hitvoice

hitvoice commented Apr 9, 2020

DeepSpeed trains BERT with LAMB and clips to [0.08, 0.5]: https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/bert-pretraining.md#reproducing-bert-training-results-with-deepspeed

It's quite interesting and confusing that such different values are used in different implementations.
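Treating all of these as different clamps in the same ratio computation, a toy comparison (all numbers made up, and assuming the DeepSpeed bounds are applied to the ratio itself, which the tutorial doesn't spell out) shows how far apart the resulting scale factors can end up:

```python
import torch

torch.manual_seed(0)
w = torch.randn(1024, 1024) * 0.02   # made-up small-init weight matrix
u = torch.randn(1024, 1024) * 1e-3   # made-up Adam-style update

w_norm, u_norm = w.norm().item(), u.norm().item()
raw_ratio = w_norm / u_norm

print("identity (no clipping):       ", raw_ratio)
print("clamp ||w|| to [0, 10]:       ", min(w_norm, 10.0) / u_norm)
print("clip ratio at 10 (paper v1):  ", min(raw_ratio, 10.0))
print("clamp ratio to [0.08, 0.5]:   ", max(0.08, min(raw_ratio, 0.5)))
```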

@8enmann
Contributor

8enmann commented Apr 24, 2020

The author open-sourced theirs!
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
