
Support string-based stopping conditions #92

Closed
WoosukKwon opened this issue May 10, 2023 · 0 comments · Fixed by #114

@WoosukKwon
Collaborator

No description provided.

WoosukKwon self-assigned this May 10, 2023
WoosukKwon changed the title from "Support stopping conditions with multiple tokens." to "Support string-based stopping conditions" May 10, 2023
WoosukKwon added the P1 label May 10, 2023
WoosukKwon added P0 and removed P1 labels May 17, 2023
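The issue has no description, but the title change from "stopping conditions with multiple tokens" to "string-based stopping conditions" points at the underlying problem: a stop sequence such as "\nUser:" can be split across several generated tokens, so it has to be matched against the decoded text rather than against individual token IDs. The Python below is a minimal sketch under that assumption. The helpers `check_stop_strings` and `generate_with_stop` are hypothetical, and this is not the implementation merged in #114.

```python
from typing import Iterable, List, Optional, Tuple


def check_stop_strings(
    text: str, stop: List[str], new_chars: int
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: look for a stop string completed by the latest chunk.

    Returns (text truncated before the stop string, matched stop string),
    or None if no stop string ends inside the last `new_chars` characters.
    Stop strings are checked in list order.
    """
    for s in stop:
        if not s:
            continue  # ignore empty stop strings
        # A match that ends in the new text can begin at most len(s) - 1
        # characters before it, so older text does not need rescanning.
        start = max(0, len(text) - new_chars - len(s) + 1)
        idx = text.find(s, start)
        if idx != -1:
            return text[:idx], s
    return None


def generate_with_stop(pieces: Iterable[str], stop: List[str]) -> str:
    """Append decoded token pieces one at a time; stop at the first stop string."""
    text = ""
    for piece in pieces:
        text += piece
        hit = check_stop_strings(text, stop, len(piece))
        if hit is not None:
            return hit[0]  # the stop string itself is excluded from the output
    return text


# "\nUser:" arrives split across the pieces "\nUs" and "er:".
print(repr(generate_with_stop(["Hello", " wor", "ld", "\n", "\nUs", "er:", " hi"], ["\nUser:"])))
# prints 'Hello world\n'
```

Limiting the search window to the new characters plus the stop-string length keeps each check from rescanning the whole output. A real engine would also have to deal with detokenization details and with streaming text that is only a partial match, which this sketch ignores.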
yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
SUMMARY:
Trigger minimal benchmarking on remote-push jobs. 

TEST PLAN:
Jobs on this PR

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this issue Jul 22, 2024
* Reading the shapes csv only once and writing only if a new shape is discovered

* fix lint

---------

Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this issue Aug 15, 2024
* Cleanup AttentionMetadata on HPU

* Flat PA - POC

* Decode warmup overhaul

* Debugging OOM

* Experimental profiling

* Fix input_hash calculation

* Block bucket size 32 -> 16

* Improve host time

* Skip UTs

* Add GQA/MQA

* Add mask instead of filling

* 2d block mapping

* Optional flipping in PA

* Runner updated for 2d block mapping

* Restore mark_step

* Eliminate physical transposes

* Disable warmup_mode

* Revert changes to test_attention.py

* POC: build block_bias on device

* Cleanup

* Fix seq_len calculation

* Experimental profiling

* Add missing call to kv_matmul_op

* Fix block_usage calculation

* Change default block bucket step for decode to 128

* Fix max decode block bucket calculation

* Fix block_usage calculations

* Cleanup

* Cleanup profiler code

* Print values for bucketing vars

* Pass block size to HpuModelAdapter

---------

Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>