why out_l = torch.cat((out_l_1, out_l), dim=1) ? #11
Comments
Actually, we explicitly concatenate the features along the channel dimension to increase the embedding dimension of the sentence-like input. We think this operation improves the distinguishability of the spatial features, and our experimental results support this.
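For intuition, here is a minimal PyTorch sketch of the shape effect of the two fusion options being discussed; the tensor sizes are illustrative and not SeqOT's actual dimensions:

```python
import torch

# Illustrative stand-ins for the two branches being fused
# (batch, channels, sequence length); sizes are assumptions, not SeqOT's.
out_l_1 = torch.randn(4, 256, 100)
out_l = torch.randn(4, 256, 100)

# Concatenating along dim=1 doubles the channel (embedding) dimension,
# so downstream layers see a richer per-position descriptor.
cat_fused = torch.cat((out_l_1, out_l), dim=1)
print(cat_fused.shape)  # torch.Size([4, 512, 100])

# Element-wise addition preserves the original dimensionality instead.
add_fused = out_l_1 + out_l
print(add_fused.shape)  # torch.Size([4, 256, 100])
```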
Nice question. We chose not to use the concat-based skip connection there in order to optimize running efficiency. However, our experiments showed that the addition-based skip connection actually results in worse place recognition performance, which warrants further analysis. It's possible that the triplet loss provides a relatively "soft" constraint for training the place recognition network: although the addition-based skip connection can accelerate the reduction of the triplet loss during training, it may not lead to a corresponding increase in recall on the test set. Btw, thanks for your interest in our work. You can also follow our latest place recognition work, CVTNet, which provides even better recognition results.
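For reference, a minimal sketch of the margin-based triplet loss mentioned above, using PyTorch's built-in `nn.TripletMarginLoss`; the margin value and embedding size here are illustrative assumptions, not SeqOT's settings:

```python
import torch
import torch.nn as nn

# The margin-based triplet loss only penalizes a triplet until the positive
# is closer to the anchor than the negative by at least the margin; satisfied
# triplets contribute zero gradient, one sense in which the constraint is "soft".
triplet = nn.TripletMarginLoss(margin=0.3, p=2)
anchor, positive, negative = (torch.randn(8, 256) for _ in range(3))
loss = triplet(anchor, positive, negative)
print(loss.item())
```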
Thank you for your help with the features. I have learned a lot from your excellent SeqOT. May I ask three more questions?
The NetVLAD aggregation you use is more convenient than the original NetVLAD algorithm because it does not require caching features extracted from the model backbone, and therefore does not require a clustering step over those features. What is the mathematical basis for this approach?
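For context on what this question refers to, here is a minimal sketch of a learnable NetVLAD-style layer in the spirit of the original paper (Arandjelović et al., 2016); the layer sizes and names are illustrative assumptions, not SeqOT's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetVLADLayer(nn.Module):
    """Minimal learnable NetVLAD-style aggregation sketch.

    The cluster centers are ordinary learnable parameters trained end-to-end
    with the rest of the network, which is why no offline k-means clustering
    over cached backbone features is needed.
    """

    def __init__(self, num_clusters=64, dim=256):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))
        # 1x1 convolution producing per-descriptor soft-assignment logits
        self.assign = nn.Conv1d(dim, num_clusters, kernel_size=1)

    def forward(self, x):  # x: (batch, dim, num_descriptors)
        a = F.softmax(self.assign(x), dim=1)  # (batch, K, N) soft assignments
        # Residuals between every descriptor and every center: (batch, K, dim, N)
        residuals = x.unsqueeze(1) - self.centers.unsqueeze(0).unsqueeze(-1)
        vlad = (a.unsqueeze(2) * residuals).sum(dim=-1)  # (batch, K, dim)
        vlad = F.normalize(vlad, p=2, dim=2)  # intra-normalization per cluster
        return F.normalize(vlad.flatten(1), p=2, dim=1)  # final L2-normalized descriptor


# Usage sketch: aggregate 100 descriptors of dimension 256 into one vector.
layer = NetVLADLayer(num_clusters=64, dim=256)
descriptor = layer(torch.randn(2, 256, 100))
print(descriptor.shape)  # torch.Size([2, 16384])
```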
SeqOT/modules/seqTransformerCat.py
Line 73 in d940882
out_l = torch.cat((out_l_1, out_l), dim=1)
This operation, out_l = torch.cat((out_l_1, out_l), dim=1), uses a concept similar to ResNet's skip connections, where features from earlier layers are combined with features from later layers. Is that right?
Why not directly perform element-wise addition, which would also maintain the data dimensionality?
Why isn't there a similar skip connection for the output of transformer_encoder2?