My interpretation of `get_custom_L2` is that L2 decay is applied not to the individual weights being trained, but to the deploy-equivalent weights.
If that is the motivation, shouldn't `eq_kernel` also incorporate the identity branch of the skip connection when `self.rbr_identity is not None`? Currently, the contribution of `rbr_identity` is missing from `eq_kernel` in `get_custom_L2`. Was this intentional? Is there a reference or ablation that explains why it is excluded?
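
To make the question concrete, here is a minimal sketch of what I would have expected the center tap of the deploy-equivalent kernel to look like. It is plain Python on nested lists, not the repository's PyTorch code. The function name `center_eq_kernel` and the arguments `t3`, `t1`, `t_id` are hypothetical names I introduce for illustration. I am assuming `groups=1` and that each `t` is the per-output-channel BN scale `gamma / sqrt(running_var + eps)` of the corresponding branch.

```python
def center_eq_kernel(k3_center, k1, t3, t1, t_id=None):
    """Center tap of the fused 3x3 kernel, as a C_out x C_in matrix.

    k3_center: center tap of the 3x3 branch kernel (C_out x C_in nested list).
    k1:        kernel of the 1x1 branch (C_out x C_in nested list).
    t3, t1:    per-output-channel BN scales of the two conv branches.
    t_id:      per-channel BN scale of the identity branch, or None when
               the block has no identity branch.
    """
    c_out, c_in = len(k1), len(k1[0])
    eq = [
        [k3_center[o][i] * t3[o] + k1[o][i] * t1[o] for i in range(c_in)]
        for o in range(c_out)
    ]
    if t_id is not None:
        # Assumption: the identity branch behaves like a 1x1 conv whose
        # kernel is the identity matrix, so after BN fusion it only adds
        # t_id[o] on the diagonal (requires C_out == C_in).
        assert c_out == c_in
        for o in range(c_out):
            eq[o][o] += t_id[o]
    return eq


# Toy values, chosen only to show the difference between the two variants.
k3_center = [[0.5, -0.25], [0.0, 1.0]]
k1 = [[0.25, 0.5], [-0.5, 0.0]]
t3, t1, t_id = [1.0, 2.0], [2.0, 1.0], [0.5, 0.5]

without_id = center_eq_kernel(k3_center, k1, t3, t1)
with_id = center_eq_kernel(k3_center, k1, t3, t1, t_id)
```

In this sketch the two results differ only on the diagonal, which is the term I believe the current `eq_kernel` leaves out.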