feat: async + parallel spec dec trtllm example #2091
Signed-off-by: jain-ria <riajain@NVIDIA.com>
…dynamo into rjain/trtllm-spec-dec
draft_tokens.extend(chunk_data.get("token_ids", []))
if len(draft_tokens) >= self.max_draft_len:
    break
print(f"DRAFTER: {draft_tokens}\n")
Use logging instead of print.
components/backends/trtllm/src/dynamo/trtllm/utils/api_drafter.py
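The review asks for `logging` instead of `print`. A minimal sketch of the drafting loop with that change, assuming draft chunks arrive as an async stream of dicts carrying a `token_ids` list (as in the diff above). `collect_draft_tokens` and `fake_stream` are hypothetical names, and the final truncation to `max_draft_len` is an added assumption, since one chunk can carry several tokens and overshoot the limit.

```python
import asyncio
import logging
from typing import Any, AsyncIterator, Dict, List

logger = logging.getLogger(__name__)


async def collect_draft_tokens(
    stream: AsyncIterator[Dict[str, Any]], max_draft_len: int
) -> List[int]:
    """Accumulate drafted token ids until at least max_draft_len are collected."""
    draft_tokens: List[int] = []
    async for chunk_data in stream:
        draft_tokens.extend(chunk_data.get("token_ids", []))
        if len(draft_tokens) >= max_draft_len:
            break
    # Assumption: trim the overshoot so the verifier never sees more than max_draft_len tokens.
    draft_tokens = draft_tokens[:max_draft_len]
    # Lazy %-formatting: the list is only rendered if DEBUG logging is enabled.
    logger.debug("DRAFTER: %s", draft_tokens)
    return draft_tokens


async def fake_stream() -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for the drafter's streaming response."""
    for ids in ([1, 2], [3], [], [4, 5, 6], [7]):
        yield {"token_ids": ids}


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(asyncio.run(collect_draft_tokens(fake_stream(), max_draft_len=4)))
```

Using a module-level logger lets the deployment decide verbosity, whereas `print` always writes to stdout on every request.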
)
namespace, component, endpoint = parts

# create minimal runtime for client access only
Why is this necessary? Why not create an endpoint client like this?
next_client = (
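The diff unpacks `parts` into `namespace, component, endpoint`. A minimal sketch of that parsing step, assuming a `dyn://namespace.component.endpoint` path format (an assumption; the diff does not show how `parts` is built). `parse_endpoint_path` is a hypothetical helper: it only validates and splits the path and does not create a Dynamo runtime or client.

```python
from typing import Tuple


def parse_endpoint_path(path: str, scheme: str = "dyn://") -> Tuple[str, str, str]:
    """Split an endpoint path into (namespace, component, endpoint)."""
    # Accept the path with or without the (assumed) scheme prefix.
    if path.startswith(scheme):
        path = path[len(scheme):]
    parts = path.split(".")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Invalid endpoint path {path!r}; expected 'namespace.component.endpoint'"
        )
    namespace, component, endpoint = parts
    return namespace, component, endpoint
```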
export NUM_DRAFTERS=2
export DRAFTER_CUDA_VISIBLE_DEVICES=${DRAFTER_CUDA_VISIBLE_DEVICES:-"1,2"}
./launch/spec_dec.sh
```
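The launch snippet sets `NUM_DRAFTERS` and a comma-separated `DRAFTER_CUDA_VISIBLE_DEVICES`. The following is a hypothetical sketch of how those two variables could map to one GPU per drafter process; it does not reproduce what `launch/spec_dec.sh` actually does. `drafter_gpu_assignments` is a hypothetical name, and the fallback of one drafter is an assumption.

```python
import os
from typing import Dict, List, Mapping


def drafter_gpu_assignments(env: Mapping[str, str]) -> List[Dict[str, str]]:
    """Return a per-drafter environment override that pins each drafter to one GPU."""
    num_drafters = int(env.get("NUM_DRAFTERS", "1"))
    raw = env.get("DRAFTER_CUDA_VISIBLE_DEVICES", "1,2")
    devices = [d.strip() for d in raw.split(",") if d.strip()]
    if num_drafters < 1:
        raise ValueError("NUM_DRAFTERS must be at least 1")
    if num_drafters > len(devices):
        raise ValueError(
            f"NUM_DRAFTERS={num_drafters} but only {len(devices)} device(s) listed: {raw!r}"
        )
    # Drafter i gets the i-th listed device; extra devices are left unused.
    return [{"CUDA_VISIBLE_DEVICES": devices[i]} for i in range(num_drafters)]


if __name__ == "__main__":
    for i, overrides in enumerate(drafter_gpu_assignments(os.environ)):
        print(f"drafter {i}: {overrides}")
```

Failing fast when there are fewer devices than drafters avoids silently stacking two drafters on one GPU.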
Can you describe the request flow in this setup as well?
Maybe a mermaid diagram?
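The PR's actual request flow is not described on this page, so the following is only a generic sketch of async, parallel drafting under stated assumptions: draft requests fan out to several drafters concurrently with `asyncio.gather`, and a toy greedy verifier keeps the draft with the longest prefix matching the target model's tokens, then appends the target's token at the first mismatch. The drafters, `speculative_step`, and the `target_next` list (a stand-in for the target model's verification pass) are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical drafter interface: prompt token ids in, proposed draft tokens out.
Drafter = Callable[[List[int]], Awaitable[List[int]]]


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int]) -> int:
    """Number of leading draft tokens that match the target's greedy tokens."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


async def speculative_step(
    drafters: Sequence[Drafter], prompt: List[int], target_next: Sequence[int]
) -> List[int]:
    # Fan out: all drafters run concurrently; gather preserves input order.
    drafts = await asyncio.gather(*(d(prompt) for d in drafters))
    # Keep the draft with the longest accepted prefix (first one wins ties).
    best = max(drafts, key=lambda d: accepted_prefix_len(d, target_next))
    n = accepted_prefix_len(best, target_next)
    accepted = list(best[:n])
    # Greedy speculative decoding also emits the target's own token at position n.
    if n < len(target_next):
        accepted.append(target_next[n])
    return accepted


async def drafter_a(prompt: List[int]) -> List[int]:
    await asyncio.sleep(0)  # simulate an awaitable network call
    return [5, 6, 9]


async def drafter_b(prompt: List[int]) -> List[int]:
    return [5, 6, 7]


if __name__ == "__main__":
    print(asyncio.run(speculative_step([drafter_a, drafter_b], [1, 2], [5, 6, 7, 8])))
```

Because the drafters are awaited together, the step's drafting latency is bounded by the slowest drafter rather than the sum of all of them.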
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Chores