attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool, device=self.dev))
seqs should have shape (batch_size, seq_len, embedding), so how does (tl, tl) guarantee that batch_size == seq_len?
seqs = torch.transpose(seqs, 0, 1)
Why is this transpose needed?
Looking forward to your reply.
@toyoululu Two things. First, nothing requires batch_size == seq_len, and (tl, tl) provides no such guarantee either; the mask is a (seq_len, seq_len) causal mask, independent of the batch dimension. Second, the transpose is done because torch's MHA layer expects the time dimension to be moved to the front. The root of the confusion is probably unfamiliarity with the multi-head attention (MHA) layer; I'd suggest watching https://www.bilibili.com/video/BV1J441137V6/
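A minimal sketch of both points, using made-up sizes (batch_size=4, tl=50, hidden=64, num_heads=2 are illustrative, not taken from the repo): the (tl, tl) boolean mask is broadcast across the batch by nn.MultiheadAttention, so batch_size never enters the mask shape, and the transpose is only there because the layer defaults to (seq_len, batch, embedding) input.

import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
batch_size, tl, hidden = 4, 50, 64

# seqs starts as (batch_size, seq_len, embedding).
seqs = torch.randn(batch_size, tl, hidden)

# Causal mask is (seq_len, seq_len); True marks positions that may NOT be attended to.
# It has nothing to do with batch_size and is shared across the whole batch.
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool))

# nn.MultiheadAttention (with the default batch_first=False) expects
# (seq_len, batch, embedding), hence the transpose moving time to the front.
seqs = torch.transpose(seqs, 0, 1)          # -> (tl, batch_size, hidden)

mha = nn.MultiheadAttention(embed_dim=hidden, num_heads=2)
out, _ = mha(seqs, seqs, seqs, attn_mask=attention_mask)
print(out.shape)                            # torch.Size([50, 4, 64])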
Thanks for the answer. I went through the API and the code carefully, and there is in fact no problem; I had simply misunderstood it before.