Question for masking strategy #990

toriving · 2022-02-28T06:59:22Z


The paper describes packing the data during pretraining.
If so, I think every input sequence will be 512 tokens long.
In the currently implemented code, the masking strategy appears to be deterministic.
So I assume the number of masked tokens will always be the same for every 512-token input.
Is this right?
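
To make the question concrete, here is a minimal sketch of what I mean, not the repository's actual code. The function names `num_masked_tokens` and `sample_mask` and the noise density of 0.15 are my own assumptions for illustration. If the mask count is computed only from the sequence length, then every packed 512-token input gets the same number of masked tokens, even if the positions differ.

```python
import random

def num_masked_tokens(length, noise_density=0.15):
    """Hypothetical helper: the count depends only on length and density."""
    return max(1, int(round(length * noise_density)))

def sample_mask(length, noise_density=0.15, seed=None):
    """Hypothetical helper: fixed number of masked positions, random placement."""
    rng = random.Random(seed)
    n = num_masked_tokens(length, noise_density)
    positions = set(rng.sample(range(length), n))
    return [i in positions for i in range(length)]

# Two packed inputs of the same length, sampled with different seeds.
mask_a = sample_mask(512, seed=0)
mask_b = sample_mask(512, seed=1)

# The counts match because they are a function of the length alone.
print(sum(mask_a), sum(mask_b))
```

Under these assumptions only the positions vary between examples; the count stays constant as long as every input is packed to the same length.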


Replies: 0 comments
