Most of the pretrained models available have a context length of 77. (There are a few exceptions, like the `mt5` models discussed here.) Is there a way to fine-tune a model that has an existing context length of 77 with a larger context?

Say I want to increase the context length to 200. Intuitively, I can think of two things to try (sketched below):

1. Keep the `positional_embedding` the same as the pretrained model for the first 77 positions, and start training with random init for positions 78-200.
2. Interpolate the `positional_embedding` from the existing context length to the desired context length.

Does either of those make sense? I'm curious if anyone's tried something like this and what would be a good starting point for experimentation.
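For reference, here's roughly what I mean by the two options, as a minimal PyTorch sketch. I'm assuming an open_clip / CLIP-style text tower where `positional_embedding` is a `[context_length, width]` parameter; the helper name and the init scale are my own guesses, not anything from the library.

```python
import torch
import torch.nn.functional as F


def extend_positional_embedding(old_pe: torch.Tensor,
                                new_len: int = 200,
                                mode: str = "interpolate") -> torch.nn.Parameter:
    """Build a [new_len, width] positional embedding from a pretrained
    [old_len, width] one (e.g. old_len == 77)."""
    old_len, width = old_pe.shape

    if mode == "random_init_tail":
        # Option 1: copy positions 0..old_len-1, randomly init positions old_len..new_len-1.
        new_pe = torch.empty(new_len, width, dtype=old_pe.dtype, device=old_pe.device)
        torch.nn.init.normal_(new_pe, std=0.01)  # roughly the scale CLIP uses at init, I believe
        new_pe[:old_len] = old_pe
    else:
        # Option 2: linearly interpolate the pretrained table up to new_len.
        # F.interpolate expects [batch, channels, length], so transpose first.
        new_pe = F.interpolate(
            old_pe.T.unsqueeze(0), size=new_len,
            mode="linear", align_corners=False,
        ).squeeze(0).T

    return torch.nn.Parameter(new_pe)


# Hypothetical usage with an open_clip-style model:
# model.positional_embedding = extend_positional_embedding(
#     model.positional_embedding.data, new_len=200)
```

Either way, I assume I'd also need to rebuild the causal attention mask and have the tokenizer pad/truncate to 200 tokens, since both are tied to the original context length.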