Support continuous batching in sequence batch streaming case #3160
Left some comments. Please add continuous batching to the test to make sure this part works as well.
examples/stateful/sequence_continuous_batching/stateful_handler.py
frontend/server/src/main/java/org/pytorch/serve/util/messages/RequestInput.java
@Override
protected void pollInferJob() throws InterruptedException {
    // TBD: Temporarily hard code the continuous batch size is 2 * batchSize
    model.pollInferJob(jobs, model.getBatchSize() * 2 - jobs.size(), jobsQueue);
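For reference, the polling capacity passed in the call above can be worked through with a small example. This is a minimal Python sketch of the arithmetic only, not TorchServe code: `poll_capacity` and its parameters are hypothetical names, and the clamp at zero is an added assumption, since the excerpt does not show how a non-positive value is handled.

```python
def poll_capacity(batch_size: int, jobs_in_flight: int, factor: int = 2) -> int:
    """Number of additional jobs that may be polled in this round.

    Mirrors `model.getBatchSize() * 2 - jobs.size()` from the excerpt.
    The clamp at zero is an assumption added for illustration.
    """
    return max(factor * batch_size - jobs_in_flight, 0)


# With batch_size = 4 the continuous batch holds up to 8 jobs.
print(poll_capacity(batch_size=4, jobs_in_flight=3))  # 5
print(poll_capacity(batch_size=4, jobs_in_flight=8))  # 0
```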
We should document how the jobs are interleaved when they land in the handler and batch_size > 1. Is it [Job_1_1, Job_1_2, Job_2_1, Job_2_2] or [Job_1_1, Job_2_1, Job_1_2, Job_2_2]? Or will the order be scrambled, so that the handler needs to sort this out?
No, the backend handler does not need to sort, because its preprocessing can mark the previously running request as "cancel".
@lxning So you mean the preprocessing in the handler will need to sort this out, as there can be multiple requests from multiple sequences in no specific pattern? In the end, the preprocessing in the handler will need to go through the batch, check whether another cancel request is in the batch besides the previous request, and then cancel it and clean it up.
No, it is not necessary to sort. The frontend guarantees the order of the requests within a sequence and passes them to the backend in that order. Currently the use case is synchronized between the client and TorchServe (TS), and the continuous batch size is fixed at 2 for each sequence. This means a cancel command applies to all requests of this sequence at the backend, so the logic here is based on "num_requests".
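To make the ordering and cancel behavior discussed in this thread concrete, here is a minimal Python sketch. It is not the handler from this PR: the request layout (`sequence_id`, `cmd`, `data`) and the function `preprocess_batch` are hypothetical, and the sketch assumes only what is stated above, namely that the frontend preserves per-sequence order and that a cancel applies to every request of that sequence at the backend.

```python
from typing import Any, Dict, List, Set


def preprocess_batch(batch: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the requests that should still run, in arrival order.

    No sorting is done: the batch order is kept exactly as delivered.
    A request with cmd == "cancel" drops every request of the same
    sequence that is present in this batch (hypothetical request layout).
    """
    cancelled: Set[str] = {
        req["sequence_id"] for req in batch if req.get("cmd") == "cancel"
    }
    return [req for req in batch if req["sequence_id"] not in cancelled]


# Two sequences interleaved in one batch; seq_1 is cancelled mid-batch.
batch = [
    {"sequence_id": "seq_1", "cmd": "infer", "data": 1},
    {"sequence_id": "seq_2", "cmd": "infer", "data": 10},
    {"sequence_id": "seq_1", "cmd": "cancel", "data": None},
    {"sequence_id": "seq_2", "cmd": "infer", "data": 11},
]
remaining = preprocess_batch(batch)
print([(r["sequence_id"], r["data"]) for r in remaining])
# [('seq_2', 10), ('seq_2', 11)]
```

The sketch only covers requests inside a single batch. A real handler would also have to mark requests of the cancelled sequence that are already running, which is left out here.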
frontend/server/src/main/java/org/pytorch/serve/wlm/WorkLoadManager.java
test/pytest/test_example_stateful_sequence_continuous_batching_http.py
Please see comments.
frontend/server/src/main/java/org/pytorch/serve/wlm/WorkLoadManager.java
LGTM
Description
Please read our CONTRIBUTING.md prior to creating your first pull request.
Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Checklist: