Why use Hadamard product in Spatial Aggregation? #18

Open
jing-zhao9 opened this issue May 27, 2024 · 3 comments
Labels
question Further information is requested

Comments

@jing-zhao9

May I ask why concatenation is not used for feature aggregation in the Spatial Aggregation block?

@Lupin1998
Member

Hi @jing-zhao9, thanks for your question! The element-wise (Hadamard) product of the two branches is one of the efficient designs first proposed in MogaNet, and it is also used in Mamba and its recently proposed variants. We call it gating, following GLU, and found it more powerful than additive aggregation or the concatenation you mentioned. You can find an intuitive explanation of why gating operations are effective and efficient in StarNet. Feel free to discuss if you have more questions.
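
For intuition, here is a minimal PyTorch sketch of this kind of gated aggregation. The layer choices are illustrative assumptions, not the exact Moga module: the point is that the two branches are fused by an element-wise product, so the output keeps the input channel width, whereas concatenation would double it and need an extra projection.

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Sketch of GLU-style gating: two branch outputs are fused by an
    element-wise (Hadamard) product instead of concatenation."""

    def __init__(self, dim):
        super().__init__()
        # Gate branch: 1x1 conv + SiLU produces per-position, per-channel weights.
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.SiLU())
        # Value branch: depth-wise conv extracts the spatial features to be gated.
        self.value = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim), nn.SiLU()
        )
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        # Hadamard product: each value feature is re-weighted by its gate,
        # keeping the channel count at `dim` (concatenation would give 2 * dim).
        return self.proj(self.gate(x) * self.value(x))


x = torch.randn(2, 64, 56, 56)
print(GatedAggregation(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```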

@Lupin1998 Lupin1998 added the question Further information is requested label May 29, 2024
@Lupin1998 Lupin1998 self-assigned this May 29, 2024
@jing-zhao9
Author

Thank you for your careful explanation! I have another question: why do I encounter gradient explosion when I apply the dot product proposed in MogaNet to my baseline model during training?

@Lupin1998
Member

Sorry for the late reply. Gradient explosion can sometimes occur in MogaNet because of the gating branch in the Moga module. There are two possible workarounds: (1) check for NaN or Inf values during training, and if gradient explosion occurs, resume training from the previous checkpoint; (2) remove the SiLU activation in the branch with multiple DWConv layers. The two SiLU activations provide strong non-linearity with few extra parameters, but they increase the risk of instability, so you may need to trade off performance against training stability.
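
A minimal PyTorch sketch of option (1), with a toy model standing in for the actual baseline; the `grads_are_finite` helper and the in-memory checkpoint handling are illustrative assumptions, not part of the MogaNet codebase:

```python
import copy
import torch
import torch.nn as nn

def grads_are_finite(model: nn.Module) -> bool:
    """Return False if any gradient contains NaN or Inf."""
    return all(
        torch.isfinite(p.grad).all()
        for p in model.parameters()
        if p.grad is not None
    )

# Toy stand-in for the baseline model that shows the instability.
model = nn.Sequential(nn.Linear(16, 16), nn.SiLU(), nn.Linear(16, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Keep a copy of the last known-good weights (a real run would save/load files).
last_good = copy.deepcopy(model.state_dict())

x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

if torch.isfinite(loss) and grads_are_finite(model):
    optimizer.step()
    last_good = copy.deepcopy(model.state_dict())  # refresh the checkpoint
else:
    # Gradient explosion detected: skip this update and roll back.
    model.load_state_dict(last_good)
optimizer.zero_grad()
```

Option (2) would amount to dropping the `nn.SiLU()` from the value branch in the gating sketch above, keeping only one activation in the gated pair.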
