Init prompts are truncated to --batch-size (max 512 tokens) #1403


Closed · MystCaster opened this issue May 11, 2023 · 6 comments


MystCaster commented May 11, 2023

The prompt should not have a maximum size of --batch-size.
`if ((int) embd.size() >= params.n_batch) { break; }` seems to serve no particular purpose and should be removed to allow prompts longer than 512 tokens.

A side effect is that, in the current version, initial prompts are truncated to 512 tokens regardless of the --ctx_size or --batch-size parameters.
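
For context, the quoted break sits in the loop that stages prompt tokens for the next batch. A simplified sketch (variable names are assumed from examples/main/main.cpp; the real code differs in detail):

```cpp
// Simplified sketch of the prompt-consuming loop in examples/main/main.cpp
// (names like embd_inp, embd, n_consumed follow that file; details differ).
while ((int) embd_inp.size() > n_consumed) {
    embd.push_back(embd_inp[n_consumed]);  // stage the next prompt token
    ++n_consumed;

    // Once a full batch has been staged, stop; the outer loop evaluates embd
    // and then returns here for the next chunk of the prompt.
    if ((int) embd.size() >= params.n_batch) {
        break;
    }
}
```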

MystCaster changed the title from "Batch_size parameter incorrectly handled" to "--batch-size parameter incorrectly handled" on May 11, 2023
MystCaster changed the title from "--batch-size parameter incorrectly handled" to "Init prompts are truncated to --batch-size (max 512 tokens)" on May 11, 2023

slaren (Member) commented May 11, 2023

This seems to be a misunderstanding of what the batch size means. The prompt isn't truncated, it is just processed in chunks of batch-size tokens. This is meant to improve performance, but it requires more memory, which is why it is limited to 512.
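
A minimal sketch of that chunked processing, assuming the llama_eval API from this period (the real loop in examples/main/main.cpp is more involved):

```cpp
// Evaluate the staged tokens in chunks of at most n_batch (sketch only).
for (int i = 0; i < (int) embd.size(); i += params.n_batch) {
    int n_eval = (int) embd.size() - i;
    if (n_eval > params.n_batch) {
        n_eval = params.n_batch;
    }
    if (llama_eval(ctx, &embd[i], n_eval, n_past, params.n_threads)) {
        fprintf(stderr, "%s: failed to eval\n", __func__);
        return 1;
    }
    n_past += n_eval;  // the whole prompt is consumed chunk by chunk, nothing is dropped
}
```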


MystCaster commented May 12, 2023

Removing that break does not interfere with llama_eval processing the prompt in batches of --batch-size tokens.

Currently, an initial prompt longer than --batch-size (which is capped at 512 in common.cpp, by the way; see the snippet below) gives control back to the user before the initial prompt is fully processed.

You can reproduce the issue by increasing the initial prompt of the Miku.sh example to more than 512 tokens, or by lowering its --batch-size parameter below the actual initial prompt token count.
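
For reference, the 512 cap comes from the argument parsing in common.cpp, roughly along these lines (paraphrased from memory; it may not match the exact revision):

```cpp
// In gpt_params_parse() (common.cpp), roughly:
} else if (arg == "-b" || arg == "--batch-size") {
    if (++i >= argc) { invalid_param = true; break; }
    params.n_batch = std::stoi(argv[i]);
    params.n_batch = std::min(512, params.n_batch);  // hard cap at 512 tokens per batch
}
```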


slaren (Member) commented May 12, 2023

The break is there to prevent using batch sizes larger than n_batch. Using batch sizes larger than 512 is currently not safe because the amount of memory allocated assumes a maximum batch size of 512.

I am not really sure I understand what issue you are seeing. If you have a reliable way of reproducing it, please post clear step-by-step instructions.

MystCaster (Author) commented

Yes: set the --batch-size of the Miku.sh example to 100. You'll get control back before the end of the initial prompt.


slaren (Member) commented May 13, 2023

I am not able to reproduce this issue.

MystCaster (Author) commented

> I am not able to reproduce this issue.

OK, I tested it again and I get it now. I just didn't have enough patience: I forgot that each batch takes time to process...
I can confirm that long initial prompts are not truncated; one just has to wait for every batch to be processed.
Note that I'm still not sure about the relevance of that "break", since it seems to only affect display and not evaluation, but that is another question.

Thanks for your patience. I'll close this issue, which turned out not to be one :-)
