transformations in MiniViT paper #224
Comments
Hi @gudrb , thanks for your attention to our work! In Mini-DeiT, the transformation for the MLP is the relative position encoding.
In Mini-Swin, the transformation for the MLP is the depth-wise convolution layer.
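For illustration, here is a minimal sketch of what a depth-wise convolution transformation before the MLP could look like, assuming tokens are laid out on an H×W grid; the module and variable names are illustrative and not the actual Mini-Swin code:

```python
import torch
import torch.nn as nn

class DWConvTransform(nn.Module):
    """Illustrative depth-wise 3x3 convolution applied to the token sequence
    before the MLP; each channel is convolved independently (groups=dim)."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x, H, W):
        # x: (B, N, C) with N == H * W
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)  # back to a spatial grid
        x = self.dwconv(x)                         # per-channel spatial mixing
        x = x.flatten(2).transpose(1, 2)           # restore (B, N, C)
        return x

# Hypothetical wiring between weight-shared blocks, so each repeated block
# gets its own small, layer-specific transformation:
# x = x + dwconv_transform(x, H, W)
```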
From the MiniViT paper: "We make several modifications on DeiT: First, we remove the [class] token. The ..." -> Does this mean that in the Mini-DeiT model, iRPE is utilized (for the value) and the MLP transformation is removed, leaving only the attention transformation?
Yes. I correct my statement: there is no transformation for the FFN in Mini-DeiT. iRPE is utilized only for the key.
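As a rough illustration of applying relative position encoding only on the key side, here is a minimal sketch (a simplified additive relative bias, not the bucketed iRPE formulation from the paper; all names are illustrative):

```python
import torch

def attention_with_key_rpe(q, k, v, rel_bias):
    """q, k, v: (B, heads, N, head_dim); rel_bias: (heads, N, N) bias derived
    from the relative positions of the keys. Simplified stand-in for iRPE."""
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale  # content-content scores
    attn = attn + rel_bias                    # add the key-side positional term
    attn = attn.softmax(dim=-1)
    return attn @ v
```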
Hello, I have a question regarding the implementation of layer normalization in the MiniViT paper and the corresponding code, specifically how layer normalization is applied between transformer blocks. In the MiniViT paper, it is mentioned that layer normalization between transformer blocks is not shared, and I believe the code reflects this. However, I am confused about how RepeatedModuleList applies layer normalization multiple times and how it ensures that the normalizations are not shared; the relevant code is in the MiniBlock class.
Thank you.
Hi @gudrb , the following code creates the list of LayerNorm modules, whose length is set here:
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, lines 145 to 146 (commit 4a13c40)
RepeatedModuleList then selects the LayerNorm belonging to the current block index:
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, lines 28 to 29 (commit 4a13c40)
The blocks are assembled with these per-block LayerNorms in:
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, lines 174 to 180 (commit 4a13c40)
Because each repeated block indexes its own LayerNorm in the list, the normalizations are not shared.
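A minimal sketch of the idea (not the repository's exact code): the heavy weights are shared across the repeated blocks, while a list holds one LayerNorm per repetition and the block picks the entry matching its repetition index, so the normalizations stay unshared. Class and variable names below are illustrative.

```python
import torch.nn as nn

class RepeatedModuleList(nn.ModuleList):
    """Illustrative version: holds `repeats` independent copies of a module
    and returns the copy belonging to the given repetition index."""
    def __init__(self, repeats, module_cls, *args, **kwargs):
        super().__init__([module_cls(*args, **kwargs) for _ in range(repeats)])

    def forward(self, x, idx):
        return self[idx](x)  # idx selects the non-shared instance

# Inside a weight-shared block (sketch):
# self.norm1 = RepeatedModuleList(repeats, nn.LayerNorm, dim)
# def forward(self, x, idx):
#     x = x + self.attn(self.norm1(x, idx))  # shared attention, private LayerNorm
```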
Hi @gudrb , here is the application of the weight transformation:
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, lines 103 to 109 (commit 4a13c40)
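For readers without the repository open, a hedged sketch of what applying a per-layer weight transformation on top of shared attention weights could look like, assuming the transformation is a small linear mixing of the attention maps across heads; the class and variable names are illustrative and the referenced lines may differ in detail:

```python
import torch
import torch.nn as nn

class MiniAttentionSketch(nn.Module):
    """Shared QKV/projection weights plus a tiny per-layer transformation
    that linearly mixes the attention maps across heads (illustrative)."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)               # shared across repeated blocks
        self.proj = nn.Linear(dim, dim)                  # shared across repeated blocks
        self.head_mix = nn.Linear(num_heads, num_heads)  # per-layer transformation

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        # apply the layer-specific transformation across the head dimension
        attn = self.head_mix(attn.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(x)
```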
In Equation 7, we ignore the relative position encoding.
Hello, I have a question about the transformations in the MiniViT paper.
I could find the first transformation (implemented in the MiniAttention class) in the code:
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, line 104 (commit 4a13c40)
However, I couldn't find the second transformation in the code (which should be before or inside the MLP in the MiniBlock class):
Cream/MiniViT/Mini-DeiT/mini_vision_transformer.py, line 137 (commit 4a13c40)
Could you please let me know where the second transformation is?