Remove token budget from decode buckets #241

Merged: 1 commit, merged on Sep 5, 2024


kzawora-intel

This PR prevents `max_num_batched_tokens` from limiting decode buckets, since decode buckets should be limited by the number of blocks, not by `max_num_batched_tokens`.
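To illustrate why the two limits differ, here is a minimal Python sketch. It assumes, for illustration only, that a decode bucket is a `(batch_size, num_blocks)` pair where `num_blocks` counts the KV-cache blocks used by the whole batch, and that a token-budget filter treats the context covered by those blocks as if it were tokens processed in one step. `BLOCK_SIZE`, `filter_by_token_budget`, `filter_by_blocks`, and the candidate values are hypothetical and are not the fork's actual bucketing code.

```python
from itertools import product

BLOCK_SIZE = 128  # hypothetical number of tokens per KV-cache block


def filter_by_token_budget(buckets, max_num_batched_tokens):
    # Hypothetical token-budget pruning: counts num_blocks * BLOCK_SIZE as
    # tokens for the step. A decode step only adds one new token per
    # sequence, so this over-counts and discards usable buckets.
    return [(bs, blocks) for bs, blocks in buckets
            if blocks * BLOCK_SIZE <= max_num_batched_tokens]


def filter_by_blocks(buckets, max_blocks):
    # Block-based pruning: a bucket is valid if every sequence has at least
    # one block and the batch never needs more blocks than are available.
    return [(bs, blocks) for bs, blocks in buckets
            if bs <= blocks <= max_blocks]


# Hypothetical candidate grid: batch sizes x total block counts.
candidates = list(product([1, 2, 4, 8], [8, 16, 32, 64, 128]))
print(len(filter_by_token_budget(candidates, max_num_batched_tokens=2048)))  # 8
print(len(filter_by_blocks(candidates, max_blocks=128)))  # 20
```

With a 2048-token budget, only the 8- and 16-block buckets survive the token filter, even though the block filter shows that all 20 candidates fit within the available blocks.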


madamczykhabana left a comment


LGTM

kzawora-intel merged commit 7cd226c into habana_main on Sep 5, 2024
13 checks passed
kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) on Sep 5, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 13, 2024
zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024
kzawora-intel deleted the private/kzawora/rm_decode_token_budget branch on October 7, 2024 12:53
2 participants