Why is it needed to set max_batch_size to 1 under interactive mode? #143

Open
zhypku opened this issue Jun 12, 2023 · 0 comments

Hi there,

I'm new to the FasterTransformer backend, and I'm curious why max_batch_size has to be set to 1 when interactive mode is enabled.

The documentation says that this is to guarantee that requests belonging to the same session are directed exclusively to the same model instance. I understand that the requests must go to the same model instance, but why exclusively? If we use the Direct mode of the sequence batcher, requests from the same session would be routed to a dedicated batch slot. Isn't that sufficient to guarantee the correctness of the inference?
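
To make the setup concrete, here is a rough sketch of the kind of config.pbtxt I have in mind. The field values are only illustrative, and I've left out the backend-specific interactive-mode parameter and the input/output tensor definitions:

```
# Rough sketch of the relevant parts of a Triton config.pbtxt
# (illustrative values only; interactive-mode parameter and
# tensor definitions omitted).
name: "fastertransformer"
backend: "fastertransformer"

# The docs say this must be 1 when interactive mode is enabled.
max_batch_size: 1

# Direct scheduling pins every request carrying the same correlation ID
# (i.e. the same session) to a dedicated batch slot of one model instance.
sequence_batching {
  max_sequence_idle_microseconds: 60000000
  direct { }
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```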

I'd appreciate it if someone could give me a clue. :)
