Hi, thanks for sharing your work.
Could you share the GPU configuration you used for the experiments?
Also, I ran your code and was confused by the memory responding step.
In the paper, you perform a linear transformation on the queried vector first and then obtain the responses.
However, in your code it seems that no linear transformation is performed on the queried vector; instead, a linear transformation is applied to the responses, as shown below.
```python
# Gather the selected memory slots and compute their attention-weighted response.
selected_value = torch.gather(dummy_value, 3, dummy_idx)
p_attn = F.softmax(selected_scores, dim=-1)
if dropout is not None:
    p_attn = dropout(p_attn)
return torch.matmul(p_attn.unsqueeze(3), selected_value).squeeze(3), p_attn
```

```python
# Merge the attention heads, then apply the final linear layer to the aggregated output.
x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)
if layer_past is not None:
    return self.linears[-1](x), present
else:
    return self.linears[-1](x)
```
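To make the comparison concrete, here is a minimal sketch of the two orderings I am contrasting. The names (`W`, `query`, `memory`) and shapes are made up for illustration and are not taken from your repository:

```python
import torch
import torch.nn as nn

d_model, n_slots = 512, 3                # hypothetical dimensions
W = nn.Linear(d_model, d_model)          # the linear transformation in question

query = torch.randn(1, d_model)          # queried vector
memory = torch.randn(n_slots, d_model)   # memory slots

# Ordering as described in the paper (as I read it): transform the queried
# vector first, then compute the attention and the response from it.
attn_paper = torch.softmax(W(query) @ memory.t(), dim=-1)
response_paper = attn_paper @ memory

# Ordering as it appears in the quoted code: compute the response from the
# raw query, then apply the linear layer to the aggregated response.
attn_code = torch.softmax(query @ memory.t(), dim=-1)
response_code = W(attn_code @ memory)
```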
Another problem is about the encoder-decoder.
In the paper, the memory responses are fed into the encoder, but in your code it seems that you directly add the original att_feats and the responses, and then send the updated att_feats to the encoder: `att_feats = att_feats + responses` in the `_prepare_feature_forward` function.
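To illustrate what I mean, here is a minimal sketch of that data flow; the shapes and the encoder call are made up and only schematic:

```python
import torch

batch, n_regions, d_model = 2, 49, 512               # hypothetical shapes
att_feats = torch.randn(batch, n_regions, d_model)   # original visual features
responses = torch.randn(batch, n_regions, d_model)   # memory responses

# What the code appears to do: add the responses back onto the visual
# features and feed the summed features to the encoder,
att_feats = att_feats + responses
# encoder_out = encoder(att_feats)
# rather than feeding the memory responses to the encoder directly.
```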
I would be grateful if you could provide further information.
Best Regards.
Jun
I have the same problem. The implementation differs from the paper, which really confuses me, and I would also like to know how the model updates the memory matrix.