[PlayGround] Long generating cannot be done #1349

Open · Dorish opened this issue Feb 1, 2024 · 6 comments

Dorish commented Feb 1, 2024

Describe the bug
In the playground, when the generation is long, it cannot finish in a single pass. In that case I typed "continue" to make it keep going, but that didn't work: it either repeated from the beginning or returned an empty response.

Information about your version
tabby 0.7.0

Information about your GPU

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:02:00.0 Off |                  N/A |
| 30%   26C    P8              21W / 350W |  22253MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:03:00.0 Off |                  N/A |
| 30%   26C    P8              28W / 350W |  13576MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+


wsxiaoys (Member) commented Feb 1, 2024

For now, there is a hard limitation of 2048 input tokens and a maximum of 1920 output tokens. We might consider increasing these numbers in the future.
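
To make the budget concrete, here is a minimal sketch of how such a fixed token budget is typically enforced (the constant names, `clamp_request`, and the tail-truncation strategy are illustrative assumptions, not Tabby's actual implementation):

```python
# Hypothetical illustration of a fixed token budget; not Tabby's code.
MAX_INPUT_TOKENS = 2048   # hard cap on prompt tokens
MAX_OUTPUT_TOKENS = 1920  # hard cap on newly generated tokens

def clamp_request(prompt_tokens: list[int], requested_new_tokens: int) -> tuple[list[int], int]:
    """Clamp a generation request to the fixed budget."""
    # Keep only the most recent MAX_INPUT_TOKENS of the prompt;
    # anything earlier is silently dropped.
    clamped_prompt = prompt_tokens[-MAX_INPUT_TOKENS:]
    # Generation stops after MAX_OUTPUT_TOKENS regardless of the request,
    # which is why long answers appear to cut off mid-generation.
    max_new = min(requested_new_tokens, MAX_OUTPUT_TOKENS)
    return clamped_prompt, max_new
```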

Dorish (Author) commented Feb 2, 2024

> For now, there is a hard limitation of 2048 input tokens and a maximum of 1920 output tokens. We might consider increasing these numbers in the future.

If the output exceeds 1920 tokens (or a larger maximum in the future), is there any way to let it continue until the full output is produced?
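
One generic client-side workaround is to stitch several requests together, re-feeding the model its own partial output so the next round resumes mid-text instead of restarting. A sketch assuming an OpenAI-style chat endpoint; the URL path, payload shape, and `finish_reason` handling are assumptions, so adjust them for your server:

```python
import requests

# Assumed OpenAI-style chat endpoint; path and payload are illustrative.
URL = "http://localhost:8080/v1beta/chat/completions"

def generate_long(prompt: str, max_rounds: int = 4) -> str:
    messages = [{"role": "user", "content": prompt}]
    output = ""
    for _ in range(max_rounds):
        choice = requests.post(URL, json={"messages": messages}).json()["choices"][0]
        chunk = choice["message"]["content"]
        output += chunk
        # Stop once the model finished naturally rather than hitting the cap.
        if choice.get("finish_reason") == "stop":
            break
        # Re-feed the partial output so the next round resumes mid-text
        # instead of repeating from the beginning. Note that the 2048-token
        # input cap still applies, so a long history gets truncated.
        messages.append({"role": "assistant", "content": chunk})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return output
```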

carlosech commented

> For now, there is a hard limitation of 2048 input tokens and a maximum of 1920 output tokens. We might consider increasing these numbers in the future.

Is there a reason why the context is static for Tabby? @wsxiaoys

wsxiaoys (Member) commented Feb 5, 2024

Hey, could you elaborate a bit? A static context length is kind of an intrinsic property of transformer-based LLMs.

carlosech commented

> Hey, could you elaborate a bit? A static context length is kind of an intrinsic property of transformer-based LLMs.

Sorry. Depending on the model, the context can be increased up to a certain size. You state that there is a hard limit on input and output tokens; is that hard-coded in Tabby, or does it vary with the model being used?

wsxiaoys (Member) commented Feb 7, 2024

> Sorry. Depending on the model, the context can be increased up to a certain size.

Ah, I get your point. It does make sense to read this value from either the registry or from GGUF files directly. Filing #1402 to track.
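
For reference, GGUF stores the trained context length in its metadata under `{arch}.context_length` (for example `llama.context_length`). A minimal sketch of reading it with the `gguf` Python package that ships with llama.cpp; the `parts`/`data` decoding follows gguf-py's `ReaderField` layout and may differ across package versions:

```python
from gguf import GGUFReader  # pip install gguf

def trained_context_length(path: str) -> int:
    """Read {arch}.context_length from a GGUF file's metadata."""
    reader = GGUFReader(path)
    # general.architecture is a string field, e.g. "llama".
    arch_field = reader.fields["general.architecture"]
    arch = bytes(arch_field.parts[arch_field.data[0]]).decode("utf-8")
    # {arch}.context_length is an integer field holding the trained context size.
    ctx_field = reader.fields[f"{arch}.context_length"]
    return int(ctx_field.parts[ctx_field.data[0]][0])

print(trained_context_length("model.gguf"))  # e.g. 4096 for a Llama-2 model
```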
