
About the T5 Architecture #30

Closed · ForestsKing opened this issue Mar 28, 2024 · 5 comments

@ForestsKing

In my experiments, I have found that Chronos' inference time depends strongly on the prediction length and much less on the historical context length. I don't know much about NLP. I'm curious whether T5 is an autoregressive architecture similar to GPT, where it has to generate outputs sequentially one by one, or whether it can output all the values at once in parallel (with the help of a mask). Thanks!

@abdulfatir (Contributor) commented Mar 28, 2024

T5 is an encoder-decoder transformer while GPT is decoder-only, so they differ in terms of their architecture. However, both models sample autoregressively, so it is expected that the inference time scales with the prediction length.
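
To illustrate, here is a minimal sketch of encoder-decoder autoregressive sampling (an illustration only, not the actual Chronos code, and without the key/value caching a real implementation would use): the encoder processes the context once, while the decoder must run once per generated token, which is why inference time grows with the prediction length.

```python
import torch
from transformers import T5ForConditionalGeneration

def autoregressive_sample(model: T5ForConditionalGeneration,
                          context_tokens: torch.Tensor,
                          prediction_length: int) -> torch.Tensor:
    # Encode the tokenized context a single time.
    encoder_outputs = model.encoder(input_ids=context_tokens)

    batch_size = context_tokens.shape[0]
    generated = torch.full((batch_size, 1),
                           model.config.decoder_start_token_id,
                           dtype=torch.long)
    for _ in range(prediction_length):
        # Every new token requires another decoder forward pass,
        # so the loop cost scales with prediction_length.
        logits = model(encoder_outputs=encoder_outputs,
                       decoder_input_ids=generated).logits[:, -1, :]
        next_token = torch.multinomial(torch.softmax(logits, dim=-1),
                                       num_samples=1)
        generated = torch.cat([generated, next_token], dim=-1)
    return generated[:, 1:]  # drop the decoder start token
```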

@lostella (Contributor)

> and not so much to the historical context length

That's because the provided context is capped at a pre-configured context length (in the current models this is 512), so anything longer than that won't impact inference speed.
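
Roughly, the idea is that only the most recent `context_length` observations are kept before encoding (a hypothetical sketch; the exact Chronos preprocessing may differ):

```python
import torch

def cap_context(context: torch.Tensor, context_length: int = 512) -> torch.Tensor:
    # Keep only the trailing `context_length` values of each series;
    # shorter series pass through unchanged.
    return context[..., -context_length:]

# cap_context(torch.arange(1000.0)).shape  -> torch.Size([512])
```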

@ForestsKing (Author)

> and not so much to the historical context length
>
> That's because the provided context is capped at a pre-configured context length (in the current models this is 512), so anything longer than that won't impact inference speed.

My context is shorter than 512. I presume it's because the decoder is slow enough that it dominates the runtime, masking the time the encoder takes to process the context.
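
A hypothetical way to check where the time goes (assuming a Hugging Face T5 checkpoint, not the actual Chronos pipeline) is to time the encoder pass and a full sampling call separately:

```python
import time
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
context = torch.randint(0, model.config.vocab_size, (1, 256))  # shorter than 512

with torch.no_grad():
    t0 = time.perf_counter()
    model.encoder(input_ids=context)  # context is processed once
    t1 = time.perf_counter()
    model.generate(context, max_new_tokens=64, min_new_tokens=64, do_sample=True)
    t2 = time.perf_counter()

print(f"encoder only: {t1 - t0:.3f}s, encode + 64 sampled tokens: {t2 - t1:.3f}s")
```

If the sampling call dwarfs the encoder pass, the decoder loop is indeed what dominates the runtime.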

@ForestsKing (Author)

> T5 is an encoder-decoder transformer while GPT is decoder-only, so they differ in terms of their architecture. However, both models sample autoregressively, so it is expected that the inference time scales with the prediction length.

I got it. Thanks for your reply.

@lostella (Contributor)

> My context is shorter than 512. I presume it's because the decoder is slow enough that it dominates the runtime, masking the time the encoder takes to process the context.

Were you using the latest version of the code? We used to have unnecessary padding to the full context length (512), which was removed in #25. After that fix, you should see faster inference when your context is significantly shorter than 512.
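
For reference, the idea behind the fix looks roughly like this (an illustrative sketch, not the actual change in #25): pad each batch only to its longest series instead of always padding to the full 512-step context, so a short context means a shorter encoder input. A real pipeline would also build an attention mask marking the padded positions.

```python
import torch

def left_pad_batch(series_list, pad_value=0.0) -> torch.Tensor:
    # Pad to the batch maximum, not to a fixed context length of 512.
    max_len = max(len(s) for s in series_list)
    padded = torch.full((len(series_list), max_len), pad_value)
    for i, s in enumerate(series_list):
        s = torch.as_tensor(s, dtype=padded.dtype)
        padded[i, max_len - len(s):] = s  # left padding keeps the most recent values aligned
    return padded
```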
