
More than 4096 context length? #22

Open
StoyanStAtanasov opened this issue Apr 19, 2023 · 8 comments

Comments

@StoyanStAtanasov

Is it possible to have a larger context? It would allow more complicated things to be done with smaller models.
A lot of the shortcomings of a smaller model can be compensated for by pushing more data into the context: for example help pages, datasheets, examples, thinking rules, longer conversations while trying to fix an issue, etc.

Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

Sure; you just need to fine-tune to a longer context :)

@StoyanStAtanasov
Author

@jon-tow You are joking right?

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

Nope; see https://github.com/kyleliang919/Long-context-transformers

@NPap0

NPap0 commented Apr 20, 2023

1. The training code has to change.
2. The data you fine-tune the model with after pre-training has to change.

So no, nothing can be done user-side to change the attention span. (But you could summarize blocks of text with the model and then feed the already-summarized text back in as a whole to do your thing; a rough sketch of that follows.)
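
Something like this, as an untested sketch of the chunk-then-summarize workaround; `generate(prompt)` and `count_tokens(text)` are placeholders for whatever inference stack and tokenizer you are running, and the chunk size is an assumption:

```python
CHUNK_TOKENS = 3000  # well under the 4096 window, leaving room for the instruction and the summary


def summarize_long_document(text, generate, count_tokens):
    """Split `text` into ~CHUNK_TOKENS pieces, summarize each piece, join the summaries."""
    words = text.split()
    chunks, current = [], []
    for word in words:
        current.append(word)
        if count_tokens(" ".join(current)) >= CHUNK_TOKENS:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))

    return "\n".join(
        generate(f"Summarize the following text:\n\n{chunk}\n\nSummary:")
        for chunk in chunks
    )


def answer_with_summaries(question, text, generate, count_tokens):
    """Answer `question` against the summarized document instead of the raw text."""
    context = summarize_long_document(text, generate, count_tokens)
    return generate(f"{context}\n\nQuestion: {question}\nAnswer:")
```

The summaries are lossy, of course, so this only helps when the task survives compression; it is a workaround, not a real context extension.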

@mallorbc

Pretty sure the answer is no due to how positional encoding is done.

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

@mallorbc The model uses RoPE (rotary position embeddings), a relative position encoding. See https://blog.eleuther.ai/rotary-embeddings/
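
For anyone following along, here is a rough NumPy sketch of the idea behind rotary embeddings (illustrative only, not the implementation used in this repo; it uses the split-half pairing convention): positions enter only as rotation angles, so attention scores depend on relative offsets rather than absolute positions, which is why longer contexts become a fine-tuning question rather than a hard architectural limit.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel i in the first half is paired with channel i + dim/2, and each pair
    is rotated by the angle position * base**(-i / (dim/2)). Because rotation
    angles add, the dot product between a rotated query at position m and a
    rotated key at position n depends only on the offset m - n.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # (dim/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


# Toy check of the relative property: shifting every position by the same
# amount leaves the query/key dot products (the attention logits) unchanged.
rng = np.random.default_rng(0)
q, k = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
pos = np.arange(8)
scores_a = rope(q, pos) @ rope(k, pos).T
scores_b = rope(q, pos + 100) @ rope(k, pos + 100).T
assert np.allclose(scores_a, scores_b)
```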

@mallorbc

@jon-tow So models like GPT-J can be fine-tuned to generate beyond their original sequence length? Whenever I try to generate sequences longer than that with GPT-J I run into issues, but maybe that is something else unrelated.

@Ph0rk0z

Ph0rk0z commented Apr 21, 2023

Would a LoRA not work for this? Or would you have to retrain the model from scratch on a long context, as you have done?
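
Attaching a LoRA is at least mechanically straightforward; whether it is enough to teach the model a longer context is exactly the open question here. A minimal sketch with Hugging Face `transformers` + `peft`, assuming a GPT-NeoX-style checkpoint; the model id, the 8192 target length, and the LoRA hyperparameters below are placeholder assumptions, not settings from this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

TARGET_LEN = 8192  # desired context, beyond the 4096 the model was trained on
MODEL_ID = "stabilityai/stablelm-base-alpha-7b"  # placeholder checkpoint

model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# RoPE has no learned position table to resize; this only updates the advertised
# window. The important part is that fine-tuning batches actually contain
# sequences of TARGET_LEN tokens.
model.config.max_position_embeddings = TARGET_LEN

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in GPT-NeoX-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# ...then run a standard causal-LM training loop on sequences packed to TARGET_LEN tokens.
```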

@twmmason reopened this Apr 25, 2023