```python
Q = self.attention_layernorms[i](seqs)  # Why is Q calculated this way? Shouldn't it be seqs * w_q?
mha_outputs, _ = self.attention_layers[i](Q, seqs, seqs,
                                          attn_mask=attention_mask)
                                          # key_padding_mask=timeline_mask
                                          # need_weights=False)  <- does this argument not work?
seqs = Q + mha_outputs  # I don't quite understand this line
seqs = torch.transpose(seqs, 0, 1)
seqs = self.forward_layernorms[i](seqs)
seqs = self.forward_layers[i](seqs)
seqs *= ~timeline_mask.unsqueeze(-1)
```
It's just a layernorm layer; people use it pretty freely to normalize embedding values. I guess what you mean by seqs * w_q is already part of attention_layer, so it does not need to be coded separately in an explicit way.
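For context, here is a minimal sketch (not the repo's code; the dimensions are made up for illustration, assuming a recent PyTorch) of where that query projection actually lives: nn.MultiheadAttention stores W_q, W_k and W_v stacked in its in_proj_weight and applies them inside forward(), so the layernorm output is just the tensor that gets fed into that internal projection.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, batch = 8, 2, 5, 3
mha = nn.MultiheadAttention(embed_dim, num_heads)  # batch_first=False by default, (T, B, E) layout

# in_proj_weight has shape (3 * embed_dim, embed_dim): W_q, W_k, W_v stacked together.
print(mha.in_proj_weight.shape)  # torch.Size([24, 8])

seqs = torch.randn(seq_len, batch, embed_dim)   # same (T, B, E) layout as in the snippet above
layernorm = nn.LayerNorm(embed_dim)

Q = layernorm(seqs)                     # pre-attention layernorm; this is NOT the W_q projection
out, attn_weights = mha(Q, seqs, seqs)  # W_q @ Q, W_k @ seqs, W_v @ seqs all happen inside forward()
print(out.shape)                        # torch.Size([5, 3, 8]) -- same layout as the input
```

So the explicit seqs * w_q multiplication you are asking about is performed inside the attention module itself; the layernorm is only the normalization step applied before it.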