
More than 4096 context length? #22

Open
StoyanStAtanasov opened this issue Apr 19, 2023 · 8 comments

Comments

@StoyanStAtanasov

Is it possible to have a larger context? It would allow more complicated things to be done with smaller models.
A lot of the shortcomings of a smaller model can be compensated for by pushing more data into the context: for example help pages, datasheets, examples, thinking rules, longer conversations while trying to fix an issue, etc.

Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

Sure; you just need to fine-tune to a longer context :)

@StoyanStAtanasov
Author

@jon-tow You are joking right?

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

Nope; see https://github.com/kyleliang919/Long-context-transformers

@NPap0

NPap0 commented Apr 20, 2023

1. The training code has to change.
2. The data you fine-tune the model with after pre-training has to change.

So no, nothing can be done user-side to change the attention span. (But you could summarize blocks of text with the model and then feed the already-summarized text back in as a whole to do your thing; a rough sketch of that follows.)
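
Something like this, as an untested sketch of the chunk-then-summarize workaround; `generate(prompt)` and `count_tokens(text)` are placeholders for whatever inference stack and tokenizer you are running, and the chunk size is an assumption:

```python
CHUNK_TOKENS = 3000  # well under the 4096 window, leaving room for the instruction and the summary


def summarize_long_document(text, generate, count_tokens):
    """Split `text` into ~CHUNK_TOKENS pieces, summarize each piece, join the summaries."""
    words = text.split()
    chunks, current = [], []
    for word in words:
        current.append(word)
        if count_tokens(" ".join(current)) >= CHUNK_TOKENS:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))

    return "\n".join(
        generate(f"Summarize the following text:\n\n{chunk}\n\nSummary:")
        for chunk in chunks
    )


def answer_with_summaries(question, text, generate, count_tokens):
    """Answer `question` against the summarized document instead of the raw text."""
    context = summarize_long_document(text, generate, count_tokens)
    return generate(f"{context}\n\nQuestion: {question}\nAnswer:")
```

The summaries are lossy, of course, so this only helps when the task survives compression; it is a workaround, not a real context extension.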

@mallorbc

Pretty sure the answer is no due to how positional encoding is done.

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

@mallorbc The model uses RoPE (rotary position embeddings), a relative position encoding. See https://blog.eleuther.ai/rotary-embeddings/
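
For anyone following along, here is a rough NumPy sketch of the idea behind rotary embeddings (illustrative only, not the implementation used in this repo; it uses the split-half pairing convention): positions enter only as rotation angles, so attention scores depend on relative offsets rather than absolute positions, which is why longer contexts become a fine-tuning question rather than a hard architectural limit.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel i in the first half is paired with channel i + dim/2, and each pair
    is rotated by the angle position * base**(-i / (dim/2)). Because rotation
    angles add, the dot product between a rotated query at position m and a
    rotated key at position n depends only on the offset m - n.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # (dim/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


# Toy check of the relative property: shifting every position by the same
# amount leaves the query/key dot products (the attention logits) unchanged.
rng = np.random.default_rng(0)
q, k = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
pos = np.arange(8)
scores_a = rope(q, pos) @ rope(k, pos).T
scores_b = rope(q, pos + 100) @ rope(k, pos + 100).T
assert np.allclose(scores_a, scores_b)
```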

@mallorbc

@jon-tow So models like GPT-J can be fine-tuned to generate beyond their original sequence length? Whenever I try to generate sequences longer than that with GPT-J I run into issues, but maybe that is something else unrelated.

@Ph0rk0z

Ph0rk0z commented Apr 21, 2023

Would a LoRA not work for this? Or would you have to retrain the model from scratch on a long context, as you have done?
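
Attaching a LoRA is at least mechanically straightforward; whether it is enough to teach the model a longer context is exactly the open question here. A minimal sketch with Hugging Face `transformers` + `peft`, assuming a GPT-NeoX-style checkpoint; the model id, the 8192 target length, and the LoRA hyperparameters below are placeholder assumptions, not settings from this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

TARGET_LEN = 8192  # desired context, beyond the 4096 the model was trained on
MODEL_ID = "stabilityai/stablelm-base-alpha-7b"  # placeholder checkpoint

model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# RoPE has no learned position table to resize; this only updates the advertised
# window. The important part is that fine-tuning batches actually contain
# sequences of TARGET_LEN tokens.
model.config.max_position_embeddings = TARGET_LEN

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in GPT-NeoX-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# ...then run a standard causal-LM training loop on sequences packed to TARGET_LEN tokens.
```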

@twmmason reopened this Apr 25, 2023