Optimize DoRA computation when there is no dropout #2107

BenjaminBossan · 2024-09-27T14:10:16Z

Feature request

DoRA could be made faster and more memory-efficient if the result of the base layer were reused in the DoRA computation. However, this is only equivalent when there is no dropout, because dropout makes the input to the DoRA branch differ from the input that produced the base result. Therefore, the optimization could be applied when dropout=0 (i.e. when nn.Identity is used) or during eval mode.
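To illustrate the idea, here is a minimal sketch rather than the actual PEFT code. It assumes the DoRA output can be written as the base result plus (m/‖W + s·BA‖ − 1)·Wx + (m/‖W + s·BA‖)·s·BAx, and all names in it (dora_delta, can_reuse_base_result, scaling, and so on) are hypothetical. It checks that reusing the base result matches the recomputed value without dropout, and that it no longer matches once dropout is active.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_features, out_features, rank = 8, 6, 2
scaling = 0.5  # hypothetical LoRA scaling factor

base = torch.nn.Linear(in_features, out_features)
lora_A = torch.randn(rank, in_features)
lora_B = torch.randn(out_features, rank)
magnitude = torch.rand(out_features) + 0.5  # hypothetical DoRA magnitude vector


def can_reuse_base_result(dropout_module, training):
    # Reuse is only safe when dropout cannot change the input:
    # either it is nn.Identity (dropout=0) or the layer is in eval mode.
    return isinstance(dropout_module, torch.nn.Identity) or not training


def dora_delta(x, base_result=None):
    """Return the term that is added to the base layer output."""
    weight = base.weight
    # Per-output-row norm of the merged weight W + s * B @ A.
    weight_norm = torch.linalg.norm(weight + scaling * (lora_B @ lora_A), dim=1)
    mag_norm_scale = (magnitude / weight_norm).view(1, -1)
    lora_out = F.linear(F.linear(x, lora_A), lora_B) * scaling
    if base_result is None:
        base_no_bias = F.linear(x, weight)  # extra matmul with the base weight
    else:
        base_no_bias = base_result - base.bias  # reuse, only remove the bias
    return (mag_norm_scale - 1) * base_no_bias + mag_norm_scale * lora_out


x = torch.randn(4, in_features)
with torch.no_grad():
    base_result = base(x)

    # No dropout: both paths see the same input x.
    slow = base_result + dora_delta(x)
    fast = base_result + dora_delta(x, base_result=base_result)

    # Active dropout: the DoRA branch sees x_dropped, the base result saw x.
    x_dropped = torch.nn.Dropout(p=0.5)(x)  # new modules start in training mode
    reference = base_result + dora_delta(x_dropped)
    shortcut = base_result + dora_delta(x_dropped, base_result=base_result)

same_without_dropout = torch.allclose(slow, fast, atol=1e-5)
same_with_dropout = torch.allclose(reference, shortcut, atol=1e-5)
print(same_without_dropout, same_with_dropout)
```

In this sketch the reuse path saves one matrix multiplication with the base weight per forward pass, which is where the speed and memory gain would be expected to come from.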

Motivation

Faster and more memory-efficient DoRA when there is no dropout. Experimentally, dropout is not crucial for training DoRA, see this comment.

Your contribution

I can work on this when I have a bit of time, but contributions are very welcome.

ariG23498 · 2024-10-01T04:24:29Z

Hey @BenjaminBossan, I would love to work on this.

Should I create a PR and then have the rest of the conversation there?

BenjaminBossan · 2024-10-01T08:56:13Z

Thanks @ariG23498. Do as you like: if you have code, feel free to create a (draft) PR; otherwise, discussing here is also fine.

BenjaminBossan mentioned this issue Sep 27, 2024

[Add] DoRA Embedding #2006

Merged

2 tasks

BenjaminBossan added the contributions-welcome label Sep 27, 2024

ariG23498 mentioned this issue Oct 1, 2024

Optimize DoRA in eval and no dropout #2122

Merged

BenjaminBossan closed this as completed in #2122 Oct 16, 2024
