
ReMoRa + DoRa improves on ReMoRa #10

Open · catid opened this issue Jun 7, 2024 · 2 comments
catid commented Jun 7, 2024

Thank you for sharing your results. In return I will share my own:

If you reformulate the code so that the forward pass adds the decompressed MoRA weights into the nn.Linear weights, the number of multiplies drops back to that of a plain linear layer: a single matmul against the summed weight. It also becomes compatible with DoRA. In my testing, alternating between repeat and repeat_interleave (ReMoRA) improves on MoRA for continued training, and ReMoRA + DoRA improves on ReMoRA.
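A minimal sketch of this reformulation, assuming PyTorch. The class name `MoRALinear`, the flat repeat/repeat_interleave expansion of the r × r matrix, and the `interleave` flag are illustrative assumptions rather than code from either repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable r x r MoRA matrix."""

    def __init__(self, base: nn.Linear, r: int):
        super().__init__()
        assert base.out_features % r == 0 and base.in_features % r == 0
        self.base = base
        self.base.weight.requires_grad_(False)       # pretrained weight stays frozen
        self.mora = nn.Parameter(torch.zeros(r, r))  # trainable square matrix
        self.interleave = False  # ReMoRA: flip each time the delta is merged

    def decompressed(self) -> torch.Tensor:
        # Expand the r x r matrix to the full (out_features, in_features)
        # shape, alternating between repeat and repeat_interleave tilings.
        n_out = self.base.out_features // self.mora.shape[0]
        n_in = self.base.in_features // self.mora.shape[1]
        if self.interleave:
            return self.mora.repeat_interleave(n_out, dim=0).repeat_interleave(n_in, dim=1)
        return self.mora.repeat(n_out, n_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One matmul against the summed weight: the same multiply count as
        # the plain layer.
        w = self.base.weight + self.decompressed()
        return F.linear(x, w, self.base.bias)
```

Because the forward pass runs against the summed weight, a DoRA-style magnitude rescaling can be applied to that same weight, which is what makes the two methods compatible.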

kongds (Owner) commented Jun 7, 2024

Thanks for sharing the results and advice.

I have tested adding the decompressed MoRA weights to the base weight before, but it can be slow in large language models, since it needs to copy the entire weight during the forward pass. (Maybe this can be further optimized: to merge back, MoRA can copy its weight directly into the original linear layer, instead of first multiplying two matrices as LoRA does.)
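A rough sketch of that merge-back difference (the shapes and the tiled decompression are assumptions for illustration): LoRA must materialize B @ A with a matmul before adding it into the base weight, while an r × r MoRA block can be written straight in:

```python
import torch

d_out, d_in, r = 4096, 4096, 256
weight = torch.zeros(d_out, d_in)  # stands in for the frozen base weight

# LoRA merge-back: one (d_out, r) @ (r, d_in) matmul, then an add.
B, A = torch.randn(d_out, r), torch.randn(r, d_in)
weight += B @ A

# MoRA merge-back: no matmul; the r x r block is tiled directly in.
M = torch.randn(r, r)
weight += M.repeat(d_out // r, d_in // r)
```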

For ReMoRA + DoRA, are you adding both DoRA and MoRA to a single linear layer? That seems to use more trainable parameters than ReMoRA alone. Still, the idea of using both MoRA and LoRA in one linear layer seems interesting, since it might take advantage of both.
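For reference, a hedged sketch of what the combination could look like, with DoRA's per-row magnitude applied to the base weight plus the decompressed MoRA delta; the function name and the `m` parameter are assumptions, not code from either repository:

```python
import torch
import torch.nn.functional as F

def dora_mora_forward(x, base_weight, mora_delta, m, bias=None):
    # Merge the decompressed MoRA delta, then apply DoRA's
    # magnitude / direction decomposition to the merged weight.
    w = base_weight + mora_delta            # (out_features, in_features)
    row_norm = w.norm(dim=1, keepdim=True)  # DoRA's per-output-row norm
    return F.linear(x, m.unsqueeze(1) * w / row_norm, bias)
```

Under this formulation, the only trainable parameters beyond ReMoRA's are the `out_features` entries of the magnitude vector `m`.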

catid (Author) commented Aug 9, 2024

Example: https://github.com/catid/dora/blob/9b2055d0b8dd73890e6fbca585a0e52a6a87dde3/dora.py#L66
