Opening this on behalf of @winglian
An optimizer that many folks have been interested in is Shampoo (https://arxiv.org/abs/1802.09568). Its fans say it converges faster because it uses (approximate) second-order information via per-dimension preconditioner matrices, while still managing to keep memory requirements in check.
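For context on where that optimizer state comes from: for a 2D weight of shape (m, n), Shampoo accumulates an m×m left and an n×n right preconditioner and applies their inverse fourth roots to the gradient. Below is a minimal, unoptimized sketch of that single-matrix update from the paper; the function names (`shampoo_step_2d`, `inv_fourth_root`) are illustrative only and this ignores blocking, grafting, and all the tricks a real implementation needs.

```python
import torch

def shampoo_step_2d(param, grad, state, lr=1e-3, eps=1e-6):
    """One simplified Shampoo update for a single 2D parameter.

    `state` holds the left/right second-moment statistics L (m x m)
    and R (n x n). This is a sketch of the idea in the paper, not a
    production (distributed / blocked) implementation.
    """
    m, n = grad.shape
    if "L" not in state:
        state["L"] = eps * torch.eye(m, device=grad.device)
        state["R"] = eps * torch.eye(n, device=grad.device)

    # Accumulate Kronecker-factored second-moment statistics.
    state["L"] += grad @ grad.T   # (m, m)
    state["R"] += grad.T @ grad   # (n, n)

    def inv_fourth_root(mat):
        # Inverse 4th root via eigendecomposition of a symmetric PSD matrix.
        eigvals, eigvecs = torch.linalg.eigh(mat)
        return eigvecs @ torch.diag(eigvals.clamp(min=eps) ** -0.25) @ eigvecs.T

    precond_grad = inv_fourth_root(state["L"]) @ grad @ inv_fourth_root(state["R"])
    param.data.add_(precond_grad, alpha=-lr)
```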
To keep the memory requirements even further in check, we can quantize it! There are existing papers out there with good recipes for how this could work for int4: https://arxiv.org/abs/2405.18144
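The core trick shared by these low-bit optimizer recipes is to store the optimizer state in a low-bit format with per-block scales and only dequantize it on the fly inside the step. A rough sketch of block-wise int8 quantization is below; function names are illustrative, not the torchao API, and int4 would additionally pack two values per byte.

```python
import torch

def quantize_blockwise(x: torch.Tensor, block_size: int = 256):
    """Quantize an fp32 tensor to int8 with one scale per block.

    Keeping only an int8 buffer plus a small per-block scale tensor is
    what cuts optimizer-state memory ~4x vs fp32 (more with int4 packing).
    """
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block_size)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127
    q = torch.clamp((blocks / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_blockwise(q, scale, numel, shape):
    """Reconstruct an fp32 tensor from int8 blocks + per-block scales."""
    return (q.float() * scale).flatten()[:numel].view(shape)
```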
As far as implementing the work goes, we have many reference examples for int8, int4, and fp8 Adam and AdamW at https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim, and we have an in-progress contribution here: #1231
Ideally the work above can be turned into a guide on how to implement a new low-bit optimizer, one that people can follow to get a new optimizer working in a day's worth of work if they already understand how the optimizer they're trying to implement works. A toy sketch of the overall pattern such a guide would walk through is below.
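The pattern the reference optimizers follow is: keep state quantized between steps, then dequantize, update, and requantize inside `step()`. Here is a toy SGD-with-momentum version of that pattern, reusing the `quantize_blockwise` / `dequantize_blockwise` sketch above; the class and argument names are made up and this is not how the torchao implementations are structured internally.

```python
import torch
from torch.optim import Optimizer

class LowBitSGDMomentum(Optimizer):
    """Toy SGD-with-momentum that keeps its momentum buffer in int8.

    Illustrates the low-bit optimizer pattern: state lives quantized,
    and each step dequantizes, updates, and requantizes it.
    """

    def __init__(self, params, lr=1e-2, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    # Initialize the quantized momentum buffer to zeros.
                    q, scale = quantize_blockwise(torch.zeros_like(p))
                    state["momentum_q"], state["momentum_scale"] = q, scale

                # Dequantize -> update -> requantize the momentum buffer.
                buf = dequantize_blockwise(
                    state["momentum_q"], state["momentum_scale"], p.numel(), p.shape
                )
                buf.mul_(group["momentum"]).add_(p.grad)
                p.add_(buf, alpha=-group["lr"])
                state["momentum_q"], state["momentum_scale"] = quantize_blockwise(buf)
```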
cc @gau-nernst @andrewor14 @vkuzo @janeyx99 @supriyar