
Conversation

@coolcloudcol (Contributor)

No description provided.

@zhuohan123 (Member) left a comment

LGTM! Thanks for your contribution!

@zhuohan123 zhuohan123 merged commit 7717d08 into vllm-project:main Jul 3, 2023
@coolcloudcol coolcloudcol deleted the fix-endless-loop branch July 4, 2023 01:37
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
This PR updates the benchmarking performed in remote-push and nightly
runs according to the first set of deliverables from our recent meeting:

* Only the `benchmark_serving.json` config is run
  * This is accomplished with a new list, `nm_benchmark_base_config_list.txt`; other lists are untouched
* The `benchmark_serving.json` config has various reductions (sketched below):
  * Model list reduced to `facebook/opt-350m` and `meta-llama/Meta-Llama-3-8B-Instruct`
  * `nr-qps` list reduced to `300,1`
  * Metric tracking reduced to mean TPOT and mean TTFT (other metrics are still recorded/logged as usual)
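
A hypothetical sketch of what the reduced config might look like; the field names here are assumptions for illustration only, not the actual nm-vllm benchmark schema:

```python
import json

reduced_config = {
    "models": [  # model list reduced to two entries
        "facebook/opt-350m",
        "meta-llama/Meta-Llama-3-8B-Instruct",
    ],
    "nr_qps_list": ["300,1"],  # single num-requests,qps pair
    "tracked_metrics": [  # only these two are tracked; others still logged
        "mean_tpot_ms",
        "mean_ttft_ms",
    ],
}

print(json.dumps(reduced_config, indent=2))
```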

There is also a small fix related to server startup: `localhost` was changed to `127.0.0.1`, because `localhost` on these machines is mapped to the IPv6 loopback `::1`, which something in the server stack does not handle.
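
A minimal stdlib sketch (not from the PR) of why this matters: on a host whose hosts file maps `localhost` to `::1`, name resolution may prefer IPv6, so a client can fail to reach a server that only listens on IPv4. Using the literal address bypasses hostname resolution entirely:

```python
import socket

# Show what `localhost` actually resolves to on this machine; on an
# IPv6-first host, ('::1', ...) entries appear before ('127.0.0.1', ...).
for family, _, _, _, sockaddr in socket.getaddrinfo(
    "localhost", 8000, proto=socket.IPPROTO_TCP
):
    print(family, sockaddr)

# Connecting to the literal IPv4 loopback sidesteps the lookup:
# socket.create_connection(("127.0.0.1", 8000), timeout=5)
```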

In a commit prior to opening the PR that contained all of the functional changes, the full `benchmark` job took under 30 minutes:

https://github.com/neuralmagic/nm-vllm/actions/runs/9669361155/job/26709082658
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Sep 30, 2024
…th LoRA (vllm-project#339)

This PR has the following fixes:
- Increase the size of the indices tensors used to maintain multi-LoRA state information from `max_num_batched_tokens` to `3 * max_num_batched_tokens`. This provides a buffer for the padding applied in the batch and sequence dimensions (see the sketch below).
- Move the logic that removes padding from `lora_logits` out of `execute_model()` and back into the `LogitsProcessorWithLoRA` class; this fixes a race condition caused by updating the multi-LoRA state information directly.

FIX HabanaAI#237
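
A minimal sketch (hypothetical variable names, not the actual vLLM code) of the sizing change described above: the index tensors that map each token slot to a LoRA adapter get 3x headroom so that padding added in the batch and sequence dimensions cannot overrun them.

```python
import torch

max_num_batched_tokens = 4096

# Before: one slot per batched token, with no room for padding.
# base_indices = torch.empty(max_num_batched_tokens, dtype=torch.long)

# After: 3 * max_num_batched_tokens slots absorb batch- and
# sequence-dimension padding without reallocating.
base_indices = torch.empty(3 * max_num_batched_tokens, dtype=torch.long)
sampler_indices = torch.empty(3 * max_num_batched_tokens, dtype=torch.long)
print(base_indices.shape, sampler_indices.shape)
```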
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Sep 19, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 25, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 26, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 26, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 28, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 30, 2025
yma11 added a commit to yma11/vllm that referenced this pull request Oct 31, 2025