Most of the pretrained models available have a context length of 77. (There are a few exceptions, like the `mt5` models discussed here.) Is there a way to fine-tune a model that has an existing context length of 77 with a larger context?

Say I want to increase the context length to 200. Intuitively, I can think of two things to try (sketched below):

1. Keep the `positional_embedding` the same as the pretrained model for the first 77 positions, and start training with random init for positions 78-200.
2. Interpolate the `positional_embedding` from the existing context length to the desired context length.

Does either of those make sense? I'm curious if anyone's tried something like this and what would be a good starting point for experimentation.
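For reference, here's roughly what I mean by the two options, as a minimal PyTorch sketch. I'm assuming an open_clip / CLIP-style text tower where `positional_embedding` is a `[context_length, width]` parameter; the helper name and the init scale are my own guesses, not anything from the library.

```python
import torch
import torch.nn.functional as F


def extend_positional_embedding(old_pe: torch.Tensor,
                                new_len: int = 200,
                                mode: str = "interpolate") -> torch.nn.Parameter:
    """Build a [new_len, width] positional embedding from a pretrained
    [old_len, width] one (e.g. old_len == 77)."""
    old_len, width = old_pe.shape

    if mode == "random_init_tail":
        # Option 1: copy positions 0..old_len-1, randomly init positions old_len..new_len-1.
        new_pe = torch.empty(new_len, width, dtype=old_pe.dtype, device=old_pe.device)
        torch.nn.init.normal_(new_pe, std=0.01)  # roughly the scale CLIP uses at init, I believe
        new_pe[:old_len] = old_pe
    else:
        # Option 2: linearly interpolate the pretrained table up to new_len.
        # F.interpolate expects [batch, channels, length], so transpose first.
        new_pe = F.interpolate(
            old_pe.T.unsqueeze(0), size=new_len,
            mode="linear", align_corners=False,
        ).squeeze(0).T

    return torch.nn.Parameter(new_pe)


# Hypothetical usage with an open_clip-style model:
# model.positional_embedding = extend_positional_embedding(
#     model.positional_embedding.data, new_len=200)
```

Either way, I assume I'd also need to rebuild the causal attention mask and have the tokenizer pad/truncate to 200 tokens, since both are tied to the original context length.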