Support string-based stopping conditions #92
WoosukKwon changed the title from "Support stopping conditions with multiple tokens." to "Support string-based stopping conditions" on May 10, 2023
yukavio pushed a commit to yukavio/vllm that referenced this issue on Jul 3, 2024
SUMMARY: Trigger minimal benchmarking on remote-push jobs.
TEST PLAN: Jobs on this PR
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this issue on Jul 22, 2024
* Reading the shapes csv only once and writing only if a new shape is discovered
* fix lint
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this issue on Aug 15, 2024
* Cleanup AttentionMetadata on HPU
* Flat PA - POC
* Decode warmup overhaul
* Debugging OOM
* Experimental profiling
* Fix input_hash calculation
* Block bucket size 32 -> 16
* Improve host time
* Skip UTs
* Add GQA/MQA
* Add mask instead of filling
* 2d block mapping
* Optional flipping in PA
* Runner updated for 2d block mapping
* Restore mark_step
* Eliminate physical transposes
* Disable warmup_mode
* Revert changes to test_attention.py
* POC: build block_bias on device
* Cleanup
* Fix seq_len calculation
* Experimental profiling
* Add missing call to kv_matmul_op
* Fix block_usage calculation
* Change default block bucket step for decode to 128
* Fix max decode block bucket calculation
* Fix block_usage calculations
* Cleanup
* Cleanup profiler code
* Print values for bucketing vars
* Pass block size to HpuModelAdapter
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
No description provided.