Replies: 1 comment
-
Hi, do you know how the attention_mask is passed to the middle pipeline-parallel (PP) stage now? I am facing this problem.
-
I know the hidden_states are the output of the previous stage, but I don't understand how the attention_mask is passed to the next transformer block.
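A minimal sketch of one common pattern (used, for example, in Megatron-LM-style pipeline parallelism): only hidden_states cross the pipeline boundary, while each stage rebuilds the attention_mask locally from the input batch, which is replicated on every rank. The function and stage names here are hypothetical, for illustration only, and numpy stands in for the actual send/recv between ranks:

```python
import numpy as np

def build_attention_mask(input_ids, pad_id=0):
    # Hypothetical helper: each pipeline stage can rebuild the mask
    # locally from the (replicated) input batch instead of receiving
    # it over the wire from the previous stage.
    return (input_ids != pad_id).astype(np.float32)

def stage0_forward(input_ids, embed):
    # First stage: embed tokens; the result is the hidden_states
    # tensor that gets sent to the next stage.
    return embed[input_ids]

def stage1_forward(hidden_states, input_ids):
    # Middle stage: hidden_states arrive from the previous stage,
    # but the attention_mask is recomputed here, not transmitted.
    mask = build_attention_mask(input_ids)
    # Zeroing padded positions is a stand-in for masked attention.
    return hidden_states * mask[..., None]

# Toy batch: two sequences of length 3; token id 0 is padding.
input_ids = np.array([[5, 7, 0], [3, 2, 9]])
embed = np.arange(40, dtype=np.float32).reshape(10, 4) + 1.0

h = stage0_forward(input_ids, embed)   # "sent" to the next stage
out = stage1_forward(h, input_ids)     # mask rebuilt locally
```

Under this assumption, the middle stage never needs the mask in its receive buffer; it only needs the same batch metadata (here, input_ids) that the first stage saw.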