Question about why the Add&Norm structure of the transformer network differs from the typical transformer one #24
Comments
centerformer/det3d/models/utils/transformer.py
Lines 218 to 238 in 96aa375
I see, but I wonder whether you have tried Add&Norm after each layer, i.e. where the input to the residual skip connection is the feature that has already passed through the Norm. Is it possible that the results of these two structures do not differ much?
Sorry, I haven't tried Add&Norm after each layer. Do you have prior experience with this, and would the results be better with that implementation?
centerformer/det3d/models/utils/transformer.py
Lines 267 to 279 in 96aa375
In the code, the residual in the transformer is only the input after the add and does not pass through the norm layer; Add and Norm are not treated as a whole. This differs from the typical transformer structure, where the result of Add followed by Norm in series becomes the input to the next level. Is there any special consideration behind this design?
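To make the distinction concrete, here is a minimal NumPy sketch (not code from the repository) contrasting the two arrangements being discussed. `sublayer` is a hypothetical stand-in for attention or an FFN, and the layer norm omits learned scale/shift for brevity. In the "typical" Post-LN block, Add and Norm act as one unit and the normalized sum is carried forward; in the Pre-LN variant described in the question, the residual path skips the norm and only the sublayer input is normalized.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the last dimension (no learned gamma/beta, for brevity)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sublayer(x):
    # hypothetical stand-in for self-attention or an FFN
    return 0.5 * x

def post_norm_block(x):
    # "typical" transformer (Post-LN): Add & Norm as one unit;
    # the normalized sum is what the next layer (and next residual) sees
    return layer_norm(x + sublayer(x))

def pre_norm_block(x):
    # Pre-LN variant: the residual skips the norm entirely;
    # only the sublayer input is normalized, and the raw sum is carried forward
    return x + sublayer(layer_norm(x))

np.random.seed(0)
x = np.random.randn(2, 4)
post = post_norm_block(x)   # output is itself normalized per row
pre = pre_norm_block(x)     # output keeps the un-normalized residual stream
```

The practical consequence is that in the Post-LN block every level's input is re-normalized, while in the Pre-LN block the residual stream accumulates un-normalized activations across depth, which is often cited as making Pre-LN easier to train without warmup.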