
Possible code difference with paper #1

Open
Markin-Wang opened this issue Oct 15, 2021 · 1 comment

Comments

Markin-Wang commented Oct 15, 2021

Hi, thanks for sharing your work.

Could you share the GPU setup you used for the experiments?
Besides, I ran your code and was confused about the memory responses.
In the paper, you apply a linear transformation to the queried vector first and then obtain the responses.
However, in your code, it seems that no linear transformation is performed on the queried vector; instead, a linear transformation is applied to the responses, as shown below.

selected_value = torch.gather(dummy_value, 3, dummy_idx) 

p_attn = F.softmax(selected_scores, dim=-1)  

if dropout is not None:
    p_attn = dropout(p_attn)
return torch.matmul(p_attn.unsqueeze(3), selected_value).squeeze(3), p_attn


x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)

if layer_past is not None:
    return self.linears[-1](x), present
else:
    return self.linears[-1](x)
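To make the question concrete, here is a minimal toy sketch (not the repository's actual code; tensor names and the single `Linear` layer are illustrative) contrasting the two orderings: transforming the query before attending over the memory, versus attending with the raw query and transforming the aggregated response afterwards.

```python
# Hedged sketch with toy tensors; illustrates the two orderings only,
# not the repo's exact multi-head implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 4       # feature dimension
n_mem = 6   # number of memory slots

query = torch.randn(1, d)        # one queried vector
memory = torch.randn(n_mem, d)   # memory matrix
W = torch.nn.Linear(d, d)        # one linear layer, for illustration

# Ordering described in the paper (as the question reads it):
# linearly transform the query first, then attend over the memory.
q = W(query)
attn_paper = F.softmax(q @ memory.t(), dim=-1)
resp_paper = attn_paper @ memory

# Ordering apparently used in the code: attend with the raw query,
# then linearly transform the aggregated response.
attn_code = F.softmax(query @ memory.t(), dim=-1)
resp_code = W(attn_code @ memory)

# In general the two orderings produce different outputs,
# since the softmax is taken over different scores.
print(resp_paper.shape, resp_code.shape)
```

Both variants yield a response of the same shape, but they are not equivalent computations, which is the point of the question.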

Another question is about the encoder-decoder.
In the paper, the memory responses are fed into the encoder, but in your code it seems that you directly add the original att_feats and the responses, and then send the updated att_feats to the encoder:

att_feats = att_feats + responses  # in the _prepare_feature_forward function
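For clarity, the addition being asked about amounts to the following (a toy sketch with made-up shapes; variable names follow the snippet above, everything else is illustrative):

```python
# Hedged sketch of the residual-style addition described in the question;
# shapes are invented for illustration.
import torch

torch.manual_seed(0)
batch, n_regions, d = 2, 5, 4
att_feats = torch.randn(batch, n_regions, d)   # original visual features
responses = torch.randn(batch, n_regions, d)   # memory responses

# The features are updated by element-wise addition with the memory
# responses before being passed to the encoder.
att_feats = att_feats + responses
print(att_feats.shape)  # torch.Size([2, 5, 4])
```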

I would be grateful if you could provide further information.

Best Regards.
Jun

@wyh196646

I have the same problem. The implementation differs from the paper, which really confuses me. I would also like to know how the model updates the memory matrix.
