About the T5 Architecture #30
In my experiments, I have found that Chronos' inference time is strongly related to the prediction length, and not so much to the historical context length. I don't know much about NLP, so I'm curious: is T5 an autoregressive architecture similar to GPT, where it has to generate tokens sequentially one by one, or can it output all the values at once in parallel (with the help of a mask)? Thanks!
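A minimal sketch of how this observation could be reproduced, using the `ChronosPipeline` API from this repository's README; the model name, context size, and prediction lengths below are illustrative choices, not values from this issue:

```python
import time
import torch
from chronos import ChronosPipeline

# Illustrative benchmark: hold the context fixed and vary the prediction
# length. Model name, context size, and lengths are arbitrary choices.
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")
context = torch.randn(256)  # synthetic series, shorter than the 512 cap

for prediction_length in (8, 16, 32, 64):
    start = time.perf_counter()
    pipeline.predict(context, prediction_length)
    print(f"prediction_length={prediction_length}: "
          f"{time.perf_counter() - start:.2f}s")
```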
T5 is an encoder-decoder transformer while GPT is decoder-only, so the two differ in architecture. However, both models sample autoregressively, so it is expected that inference time scales with the prediction length.
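To make the mechanism concrete, here is a hand-written greedy-decoding sketch using a vanilla T5 from Hugging Face `transformers`; it illustrates the general encoder-decoder sampling loop, not Chronos' actual sampling code:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Greedy autoregressive decoding with an encoder-decoder model, written out
# by hand to show where the sequential cost comes from.
model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello there.", return_tensors="pt")
# The encoder processes the entire context once, in parallel.
encoder_outputs = model.get_encoder()(**inputs)

# The decoder, however, runs once per generated token, so inference time
# grows with the number of output tokens -- just like a decoder-only model.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(20):
    logits = model(
        encoder_outputs=encoder_outputs,
        decoder_input_ids=decoder_input_ids,
    ).logits
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
    if next_token.item() == model.config.eos_token_id:
        break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```

The causal mask only enables parallel scoring of all target positions during training; at inference, each token depends on the previously sampled one, so generation is inherently sequential in both architectures.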
That's because the provided context is capped at a pre-configured context length (512 in the current models), so anything longer than that won't affect inference speed.
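A tiny sketch of what such a cap implies; the variable names are illustrative, not Chronos internals:

```python
import torch

context_length = 512         # the pre-configured cap (illustrative)
history = torch.randn(2000)  # raw history longer than the cap

# Only the most recent `context_length` observations reach the encoder;
# older values are dropped, so they cannot affect inference speed.
context = history[-context_length:]
print(context.shape)  # torch.Size([512])
```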
My context is shorter than 512. I presume it's because the decoder is so slow that it dominates the time the encoder takes to process the context.
I got it. Thanks for your reply.
Were you using the latest version of the code? We used to have unnecessary padding to the full context length (512), which was removed in #25. After that fix, you should see faster inference as your context shrinks significantly below 512.
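A hypothetical sketch of the difference in terms of tensor shapes (this is not the actual code changed in #25):

```python
import torch
import torch.nn.functional as F

context = torch.randn(100)  # a context much shorter than 512

# Before the fix: inputs were padded to the full context length, so the
# encoder always processed 512 positions regardless of the actual history.
padded = F.pad(context, (512 - context.numel(), 0))
print(padded.shape)   # torch.Size([512]) -> constant encoder cost

# After the fix: the encoder sees only the real observations, so shorter
# contexts mean proportionally less encoder work.
print(context.shape)  # torch.Size([100])
```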