Calculation of Q in the code #35

mengyangz86 · 2023-04-18T09:10:18Z

        Q = self.attention_layernorms[i](seqs)  # Why is Q calculated this way? Shouldn't it be seqs * w_q?
        mha_outputs, _ = self.attention_layers[i](Q, seqs, seqs, 
                                        attn_mask=attention_mask)
                                        # key_padding_mask=timeline_mask
                                        # need_weights=False) this arg do not work?
        seqs = Q + mha_outputs  # This line is hard to understand.
        seqs = torch.transpose(seqs, 0, 1)

        seqs = self.forward_layernorms[i](seqs)
        seqs = self.forward_layers[i](seqs)
        seqs *=  ~timeline_mask.unsqueeze(-1)


pmixer · 2023-04-19T03:37:26Z

It's just a layernorm layer; people commonly use it to normalize embedding values. I guess what you mean by seqs * w_q is the query projection, which is already part of attention_layer (it is applied internally), so it does not need to be coded separately in an explicit way.
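
To make the point above concrete, here is a minimal sketch, assuming `attention_layers[i]` is a `torch.nn.MultiheadAttention` with the default packed `in_proj_weight` and bias, and that `seqs` is already in `(T, B, C)` layout. The sizes, the single head, and the variable names `w_q`, `manual`, `matches`, and `residual` are illustrative assumptions, not taken from the repository. It recomputes the attention output by hand to show that the `w_q` projection is applied inside the layer, and that `Q + mha_outputs` reads as a residual (skip) connection around the attention block.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, batch, hidden = 5, 2, 8

layernorm = nn.LayerNorm(hidden, eps=1e-8)
mha = nn.MultiheadAttention(hidden, num_heads=1, dropout=0.0)
mha.eval()

seqs = torch.randn(seq_len, batch, hidden)  # (T, B, C), i.e. batch_first=False
Q = layernorm(seqs)                         # normalized input, not yet projected

# Causal mask: True marks positions that may NOT be attended to.
attn_mask = ~torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

mha_out, _ = mha(Q, seqs, seqs, attn_mask=attn_mask)

# Manual recomputation: the layer stores W_q, W_k, W_v stacked in in_proj_weight.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = mha.in_proj_bias.chunk(3, dim=0)
q = Q @ w_q.T + b_q        # this is the "seqs * w_q" step, done inside the layer
k = seqs @ w_k.T + b_k
v = seqs @ w_v.T + b_v

# Move to (B, T, C) for the batched matrix products.
q, k, v = (t.transpose(0, 1) for t in (q, k, v))
scores = q @ k.transpose(-2, -1) / hidden ** 0.5   # one head, so head_dim == hidden
scores = scores.masked_fill(attn_mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
manual = mha.out_proj((attn @ v).transpose(0, 1))  # back to (T, B, C)

matches = torch.allclose(mha_out, manual, atol=1e-5)

# Residual connection: add the attention output back onto its (normalized) input.
residual = Q + mha_out
print(matches, residual.shape)
```

In this sketch the tensor passed as the query is only the layer-normalized input; the learned projection happens inside the attention layer, so multiplying by `w_q` beforehand would apply a query projection twice.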
