Swapped to_seq_len/from_seq_len in comment

I'm pretty sure this comment:

https://github.com/huggingface/pytorch-pretrained-BERT/blob/2c5d993ba48841575d9c58f0754bca00b288431c/modeling.py#L339-L343

should instead say:
```
# Sizes are [batch_size, 1, 1, to_seq_length] 
# So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length] 
```

When masking out tokens for attention, it doesn't matter what happens to attention *from* padding tokens, only that there is no attention *to* padding tokens.

I don't believe the code actually does what the comment currently describes, because that would be an implementation flaw. As far as I can tell, only the comment is wrong.
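To illustrate the shapes, here is a minimal sketch of the broadcast. It uses NumPy rather than PyTorch to stay lightweight (the broadcasting rules are the same), and the variable names, the toy sizes, and the `-10000.0` additive constant are my assumptions for illustration, not a copy of `modeling.py`.

```python
import numpy as np

batch_size, num_heads, seq_len = 2, 3, 4  # toy sizes, chosen for illustration

# 1 = real token, 0 = padding; the second sequence ends with two padding tokens.
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 0]], dtype=np.float32)

# [batch_size, seq_len] -> [batch_size, 1, 1, to_seq_length]
extended_mask = attention_mask[:, None, None, :]
# Additive mask: 0.0 where attention is allowed, -10000.0 where it is masked.
extended_mask = (1.0 - extended_mask) * -10000.0

# Dummy attention scores: [batch_size, num_heads, from_seq_length, to_seq_length]
scores = np.zeros((batch_size, num_heads, seq_len, seq_len), dtype=np.float32)

# The size-1 axes broadcast over num_heads and from_seq_length.
masked_scores = scores + extended_mask

# Softmax over the last axis, i.e. over the tokens being attended *to*.
exp = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)
```

In this sketch every query position in the second sequence, including the padding positions themselves, puts zero weight on the two padding columns. Rows *from* padding tokens are left unmasked, which matches the reading that the mask's last axis is `to_seq_length`.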
