Different claims for the paper and the code on attention regularization #18

Open
tao0420 opened this issue Dec 18, 2019 · 2 comments


tao0420 commented Dec 18, 2019

Hi there,

Thanks for the contribution! After reading the code, I am somewhat confused about the attention regularization part. Please correct me if I have misunderstood something.

From the code, my understanding of the center loss is that there is one center per class (label) for the pooled features, and those same features, multiplied by a scale of 100, are also used for the softmax classification. However, the paper claims that the center loss is used for attention regularization, which assigns each attention feature in the feature matrix its own center: the equation in the paper sums, over the M attention features, the distance between each attention feature and its corresponding center (note the distinguishing M in that equation).

Is there an explanation for this discrepancy?
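
To make the difference concrete, here is a minimal sketch of the two readings. This is a sketch only: the shapes, variable names, and the exact distance/loss functions are my assumptions, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes (my assumption, not from the repo):
# B = batch size, M = number of attention maps, C = feature dim.
B, M, C, num_classes = 8, 32, 768, 200
labels = torch.randint(0, num_classes, (B,))

# Variant 1 -- what the code seems to do:
# one center per class for the single pooled feature, and the same
# feature (scaled by 100) also feeds the softmax classifier.
pooled = torch.randn(B, C)                    # pooled feature per image
class_centers = torch.zeros(num_classes, C)   # one center per class
loss_code = F.mse_loss(pooled, class_centers[labels])
logits = (100.0 * pooled) @ torch.randn(C, num_classes)  # scale of 100

# Variant 2 -- what the paper describes:
# one center per attention part; each of the M attention features f_k
# is pulled toward its own center c_k, summed over k = 1..M.
feature_matrix = torch.randn(B, M, C)         # M attention features per image
part_centers = torch.zeros(M, C)              # one center per attention part
loss_paper = ((feature_matrix - part_centers) ** 2).sum(-1).mean()
```

If the code really implements Variant 1, it is a plain per-class center loss on the classification feature, rather than the per-part attention regularization described in the paper.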


LawrenceXia2008 commented Jun 7, 2020

I have the same question. Can anyone help explain this? Thanks in advance, future helpers~

@17314796423

Same question!
