Hello, I want to ask: why does overlap in the dataset cause overfitting? Isn't it better that way, so that the language model learns from each word present and its contextual relationship to the preceding words?
Hi there,

that's a good question, and it's actually a bit of a tricky topic. But even with a stride > 1, the LLM sees every word in the text. It's just that it doesn't see each word multiple times.

E.g., consider the following example:

Input sentence:

"Hello world, this is an example of a batch input sequence."

With stride=6 (and a context length of 6 words), consecutive windows don't overlap:

Batch inputs:

"Hello world, this is an example"
"of a batch input sequence."

Batch targets (inputs shifted by +1):

"world, this is an example of"
"a batch input sequence."

With stride=1, each window starts just one word after the previous one, so consecutive windows overlap in 5 of their 6 words:

Batch inputs:

"Hello world, this is an example"
"world, this is an example of"
"this is an example of a"
...

Batch targets:

"world, this is an example of"
"this is an example of a"
"is an example of a batch"
...

So with stride=1, most words appear in many training windows, and the model trains on essentially the same (input, target) pairs over and over within a single epoch. That's similar to training for many epochs on duplicated data, which encourages memorization and hence overfitting.

Please let me know in case you have any follow-up questions.
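To make the stride effect concrete, here is a minimal sketch of how such (input, target) windows can be generated. The function name `create_windows` and the word-level "tokenization" via `split()` are illustrative assumptions, not from any specific library; real dataloaders operate on token IDs and usually drop the incomplete final window.

```python
# Illustrative sketch: build (input, target) windows with a configurable stride.
# `create_windows` is a hypothetical helper name, not a library function.
def create_windows(tokens, context_length, stride):
    inputs, targets = [], []
    # Slide a window of `context_length` tokens over the text, advancing by
    # `stride` tokens each step; the target is the input shifted by +1.
    # (For simplicity, trailing windows shorter than `context_length` are kept;
    # real implementations typically discard them.)
    for i in range(0, len(tokens) - 1, stride):
        inputs.append(tokens[i:i + context_length])
        targets.append(tokens[i + 1:i + context_length + 1])
    return inputs, targets

# Word-level "tokens" for illustration only; an LLM would use subword token IDs.
words = "Hello world, this is an example of a batch input sequence.".split()

# stride == context_length: windows do not overlap, each word seen once
x6, y6 = create_windows(words, context_length=6, stride=6)

# stride == 1: consecutive windows overlap in 5 of their 6 words
x1, y1 = create_windows(words, context_length=6, stride=1)

print(len(x6), len(x1))  # far more (heavily overlapping) windows with stride=1
print(x1[0])
print(x1[1])
```

With stride=1 the same word pairs recur across many windows, which is exactly the repeated exposure described above.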