Hi,

I have a question about the input construction. I get the basic idea from the clear figure in the paper, but I guess the "original text" is not actually a single sentence; rather, it is a run of consecutive tokens that presumably crosses sentence boundaries. One option described in the RoBERTa paper is to pack each input sequence with consecutive tokens up to max_seq_len (say, 512), which can cross sentence or document boundaries.

Could you explain how the input is actually constructed, or point to the relevant code? This self-contained library looks great, but it is hard for me to pinpoint where the input construction happens.

Replies: 1 comment

We do concatenate across documents. The code is here:
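As a rough illustration of what "concatenate across documents" can mean in practice, here is a minimal, hypothetical packing sketch, not the library's actual implementation: tokenized documents are joined into one token stream, with an assumed separator id marking document boundaries, and the stream is sliced into fixed-length blocks of max_seq_len tokens, so a single block can cross sentence and document boundaries. The function name `pack_sequences` and the `sep_id` and `drop_last` parameters are illustrative assumptions.

```python
# Hypothetical sketch of cross-document packing (not this library's actual code).
# Tokenized documents are concatenated into one stream, with a separator token
# between documents, then sliced into fixed-length blocks of max_seq_len tokens.
from typing import Iterable, List


def pack_sequences(
    tokenized_docs: Iterable[List[int]],
    max_seq_len: int = 512,
    sep_id: int = 2,          # assumed id of the separator token ([SEP] / </s>)
    drop_last: bool = True,   # whether to discard the final short block
) -> List[List[int]]:
    stream: List[int] = []
    for doc in tokenized_docs:
        stream.extend(doc)
        stream.append(sep_id)  # mark the document boundary, then keep packing

    blocks = [
        stream[i : i + max_seq_len]
        for i in range(0, len(stream), max_seq_len)
    ]
    if drop_last and blocks and len(blocks[-1]) < max_seq_len:
        blocks.pop()  # the trailing block is shorter than max_seq_len
    return blocks


# Example: two toy "documents" packed into blocks of 8 tokens.
docs = [[5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16]]
for block in pack_sequences(docs, max_seq_len=8, drop_last=False):
    print(block)
```

In this toy example the first printed block spans the boundary between the two documents (the separator id sits in the middle of the block), which is the behavior the question asks about: sequences are filled with consecutive tokens rather than aligned to sentence or document boundaries.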