Hi, thanks for sharing your work.
Could you share the GPU configuration you used for the experiments?
Also, I ran your code and was confused by the memory responding step.
In the paper, you perform a linear transformation on the queried vector first and then obtain the responses.
However, in your code it seems that no linear transformation is performed on the queried vector; instead, a linear transformation is applied to the responses, as shown below.
```python
# Gather the selected memory slots and compute their attention-weighted response.
selected_value = torch.gather(dummy_value, 3, dummy_idx)
p_attn = F.softmax(selected_scores, dim=-1)
if dropout is not None:
    p_attn = dropout(p_attn)
return torch.matmul(p_attn.unsqueeze(3), selected_value).squeeze(3), p_attn
```

```python
# Merge the attention heads, then apply the final linear layer to the aggregated output.
x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)
if layer_past is not None:
    return self.linears[-1](x), present
else:
    return self.linears[-1](x)
```
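To make the comparison concrete, here is a minimal sketch of the two orderings I am contrasting. The names (`W`, `query`, `memory`) and shapes are made up for illustration and are not taken from your repository:

```python
import torch
import torch.nn as nn

d_model, n_slots = 512, 3                # hypothetical dimensions
W = nn.Linear(d_model, d_model)          # the linear transformation in question

query = torch.randn(1, d_model)          # queried vector
memory = torch.randn(n_slots, d_model)   # memory slots

# Ordering as described in the paper (as I read it): transform the queried
# vector first, then compute the attention and the response from it.
attn_paper = torch.softmax(W(query) @ memory.t(), dim=-1)
response_paper = attn_paper @ memory

# Ordering as it appears in the quoted code: compute the response from the
# raw query, then apply the linear layer to the aggregated response.
attn_code = torch.softmax(query @ memory.t(), dim=-1)
response_code = W(attn_code @ memory)
```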
Another problem is about the encoder-decoder.
In the paper, the memory responses are fed into the encoder, but in your code it seems that you directly add the original att_feats and the responses, and then send the updated att_feats to the encoder: `att_feats = att_feats + responses` in the `_prepare_feature_forward` function.
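To illustrate what I mean, here is a minimal sketch of that data flow; the shapes and the encoder call are made up and only schematic:

```python
import torch

batch, n_regions, d_model = 2, 49, 512               # hypothetical shapes
att_feats = torch.randn(batch, n_regions, d_model)   # original visual features
responses = torch.randn(batch, n_regions, d_model)   # memory responses

# What the code appears to do: add the responses back onto the visual
# features and feed the summed features to the encoder,
att_feats = att_feats + responses
# encoder_out = encoder(att_feats)
# rather than feeding the memory responses to the encoder directly.
```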
I would be grateful if you could provide further information.
Best Regards.
Jun
I have the same problem. The implementation differs from the paper, which really confuses me, and I would also like to know how the model updates the memory matrix.