
Commit e120ccb

Merge branch 'main' into pcp_pr
2 parents d09cda7 + 5550ff9 commit e120ccb

98 files changed (+793, −1711 lines)

Lines changed: 3 additions & 2 deletions
@@ -1,11 +1,12 @@
 # For hf script, without -t option (tensor parallel size).
-# bash .buildkite/lm-eval-harness/run-lm-eval-chartqa-vllm-vlm-baseline.sh -m meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -b 32 -l 100 -t 8
+# bash .buildkite/lm-eval-harness/run-lm-eval-chartqa-vllm-vlm-baseline.sh -m meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -l 100 -t 8
 model_name: "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
 backend: "vllm-vlm"
 tasks:
 - name: "chartqa"
   metrics:
   - name: "relaxed_accuracy,none"
-    value: 0.90
+    # TODO(zhewenl): model card is 0.90, but the actual score is 0.80.
+    value: 0.80
   limit: 100
   num_fewshot: 0

.buildkite/lm-eval-harness/configs/Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.yaml

Lines changed: 1 addition & 2 deletions
@@ -1,7 +1,6 @@
 # For hf script, without -t option (tensor parallel size).
-# bash .buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh -m meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -b 32 -l 250 -t 8 -f 5
+# bash .buildkite/lm-eval-harness/run-lm-eval-mmlupro-vllm-baseline.sh -m meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -l 250 -t 8 -f 5
 model_name: "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
-backend: "vllm-vlm"
 tasks:
 - name: "mmlu_pro"
   metrics:

.buildkite/scripts/hardware_ci/run-cpu-test.sh

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ function cpu_tests() {
   docker exec cpu-test-"$NUMA_NODE" bash -c "
     set -e
     pytest -x -s -v \
-      tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_logprobs[False-10-32-neuralmagic/Llama-3.2-1B-quantized.w8a8]"
+      tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_logprobs"
 
   # Note: disable it until supports V1
   # Run AWQ test

docs/configuration/conserving_memory.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ llm = LLM(model="ibm-granite/granite-3.1-8b-instruct", tensor_parallel_size=2)
 !!! note
     With tensor parallelism enabled, each process will read the whole model and split it into chunks, which makes the disk reading time even longer (proportional to the size of tensor parallelism).
 
-    You can convert the model checkpoint to a sharded checkpoint using <gh-file:examples/offline_inference/save_sharded_state.py>. The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
+    You can convert the model checkpoint to a sharded checkpoint using [examples/offline_inference/save_sharded_state.py](../../examples/offline_inference/save_sharded_state.py). The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
 
 ## Quantization
 
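As a companion to the updated note, here is a minimal sketch of the sharded-checkpoint workflow (the flags of `save_sharded_state.py` and the `sharded_state` load format are assumptions; check the script's `--help` and your vLLM version):

```bash
# Hypothetical sketch: convert a checkpoint to a sharded checkpoint once,
# then reuse it for faster loading (flag names are assumptions).
python examples/offline_inference/save_sharded_state.py \
    --model ibm-granite/granite-3.1-8b-instruct \
    --tensor-parallel-size 2 \
    --output /path/to/sharded-checkpoint

# Later, point vLLM at the sharded copy (assumes the sharded_state load format).
vllm serve /path/to/sharded-checkpoint \
    --tensor-parallel-size 2 \
    --load-format sharded_state
```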

docs/configuration/optimization.md

Lines changed: 8 additions & 8 deletions
@@ -174,14 +174,14 @@ Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to u
 
 Known supported models (with corresponding benchmarks):
 
-- dots_ocr (<gh-pr:25466>)
-- GLM-4.1V or above (<gh-pr:23168>)
-- InternVL (<gh-pr:23909>)
-- Kimi-VL (<gh-pr:23817>)
-- Llama4 (<gh-pr:18368>)
-- MiniCPM-V-2.5 or above (<gh-pr:23327>, <gh-pr:23948>)
-- Qwen2-VL or above (<gh-pr:22742>, <gh-pr:24955>, <gh-pr:25445>)
-- Step3 (<gh-pr:22697>)
+- dots_ocr (<https://github.com/vllm-project/vllm/pull/25466>)
+- GLM-4.1V or above (<https://github.com/vllm-project/vllm/pull/23168>)
+- InternVL (<https://github.com/vllm-project/vllm/pull/23909>)
+- Kimi-VL (<https://github.com/vllm-project/vllm/pull/23817>)
+- Llama4 (<https://github.com/vllm-project/vllm/pull/18368>)
+- MiniCPM-V-2.5 or above (<https://github.com/vllm-project/vllm/pull/23327>, <https://github.com/vllm-project/vllm/pull/23948>)
+- Qwen2-VL or above (<https://github.com/vllm-project/vllm/pull/22742>, <https://github.com/vllm-project/vllm/pull/24955>, <https://github.com/vllm-project/vllm/pull/25445>)
+- Step3 (<https://github.com/vllm-project/vllm/pull/22697>)
 
 ## Input Processing
 
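To make the encoder data-parallel setting above concrete, a hedged example invocation (the `--mm-encoder-tp-mode` flag is assumed to mirror the `mm_encoder_tp_mode` engine argument, and the model is only a placeholder drawn from the supported list):

```bash
# Sketch only: serve a supported VLM with the multimodal encoder run in
# data-parallel mode instead of tensor-parallel mode.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --tensor-parallel-size 2 \
    --mm-encoder-tp-mode data
```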

docs/configuration/tpu.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Although it’s common to do this with GPUs, don't try to fragment 2 or 8 differ
 
 ### Tune your workloads
 
-Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](gh-file:benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
+Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](../../benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
 
 ### Future Topics We'll Cover
 

docs/contributing/README.md

Lines changed: 5 additions & 5 deletions
@@ -22,7 +22,7 @@ Unsure on where to start? Check out the following links for tasks to work on:
 
 ## License
 
-See <gh-file:LICENSE>.
+See [LICENSE](../../LICENSE).
 
 ## Developing
 
@@ -54,7 +54,7 @@ For more details about installing from source and installing for other hardware,
 For an optimized workflow when iterating on C++/CUDA kernels, see the [Incremental Compilation Workflow](./incremental_build.md) for recommendations.
 
 !!! tip
-    vLLM is compatible with Python versions 3.10 to 3.13. However, vLLM's default [Dockerfile](gh-file:docker/Dockerfile) ships with Python 3.12 and tests in CI (except `mypy`) are run with Python 3.12.
+    vLLM is compatible with Python versions 3.10 to 3.13. However, vLLM's default [Dockerfile](../../docker/Dockerfile) ships with Python 3.12 and tests in CI (except `mypy`) are run with Python 3.12.
 
     Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
 
@@ -88,7 +88,7 @@ vLLM's `pre-commit` hooks will now run automatically every time you commit.
 
 ### Documentation
 
-MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, <gh-file:mkdocs.yaml>.
+MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, [mkdocs.yaml](../../mkdocs.yaml).
 
 Get started with:
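The hunk above ends at "Get started with:"; the repository's own instructions are authoritative, but a generic MkDocs workflow looks roughly like this (the package set is an assumption, not vLLM's exact steps):

```bash
# Generic MkDocs workflow (a sketch; the project's mkdocs.yaml may require
# additional plugins and themes beyond the base package).
pip install mkdocs
mkdocs serve   # build the docs and serve them locally with live reload
```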

@@ -152,7 +152,7 @@ pytest -s -v tests/test_logger.py
 If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
 
 !!! important
-    If you discover a security vulnerability, please follow the instructions [here](gh-file:SECURITY.md#reporting-a-vulnerability).
+    If you discover a security vulnerability, please follow the instructions [here](../../SECURITY.md).
 
 ## Pull Requests & Code Reviews
 
@@ -162,7 +162,7 @@ code quality and improve the efficiency of the review process.
 
 ### DCO and Signed-off-by
 
-When contributing changes to this project, you must agree to the <gh-file:DCO>.
+When contributing changes to this project, you must agree to the [DCO](../../DCO).
 Commits must include a `Signed-off-by:` header which certifies agreement with
 the terms of the DCO.
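As a quick illustration of the DCO requirement in the last hunk, `git commit -s` adds the `Signed-off-by:` trailer automatically (the commit message is just an example):

```bash
# Add a Signed-off-by trailer using your configured git user.name/user.email.
git commit -s -m "Fix tokenizer initialization"

# Forgot to sign off? Amend the last commit with a sign-off.
git commit --amend -s --no-edit
```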

docs/contributing/benchmarks.md

Lines changed: 3 additions & 3 deletions
@@ -822,7 +822,7 @@ you should set `--endpoint /v1/embeddings` to use the Embeddings API. The backen
 - CLIP: `--backend openai-embeddings-clip`
 - VLM2Vec: `--backend openai-embeddings-vlm2vec`
 
-For other models, please add your own implementation inside <gh-file:vllm/benchmarks/lib/endpoint_request_func.py> to match the expected instruction format.
+For other models, please add your own implementation inside [vllm/benchmarks/lib/endpoint_request_func.py](../../vllm/benchmarks/lib/endpoint_request_func.py) to match the expected instruction format.
 
 You can use any text or multi-modal dataset to benchmark the model, as long as the model supports it.
 For example, you can use ShareGPT and VisionArena to benchmark vision-language embeddings.
@@ -962,7 +962,7 @@ For more results visualization, check the [visualizing the results](https://gith
 
 The latest performance results are hosted on the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
 
-More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
+More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](../../.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
 
 ### Continuous Benchmarking
 
@@ -996,4 +996,4 @@ These compare vLLM's performance against alternatives (`tgi`, `trt-llm`, and `lm
 
 The latest nightly benchmark results are shared in major release blog posts such as [vLLM v0.6.0](https://blog.vllm.ai/2024/09/05/perf-update.html).
 
-More information on the nightly benchmarks and their parameters can be found [here](gh-file:.buildkite/nightly-benchmarks/nightly-descriptions.md).
+More information on the nightly benchmarks and their parameters can be found [here](../../.buildkite/nightly-benchmarks/nightly-descriptions.md).
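To make the embeddings-benchmark hunk concrete, a hypothetical invocation (the `vllm bench serve` entry point, dataset flags, and model name are assumptions; check the benchmark CLI's `--help` for the exact interface):

```bash
# Assumption-heavy sketch: benchmark a CLIP embedding deployment through the
# Embeddings API using the backend named in the docs above.
vllm bench serve \
    --backend openai-embeddings-clip \
    --endpoint /v1/embeddings \
    --model openai/clip-vit-base-patch32 \
    --dataset-name random \
    --num-prompts 200
```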

docs/contributing/ci/failures.md

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ Download the full log file from Buildkite locally.
 
 Strip timestamps and colorization:
 
-<gh-file:.buildkite/scripts/ci-clean-log.sh>
+[.buildkite/scripts/ci-clean-log.sh](../../../.buildkite/scripts/ci-clean-log.sh)
 
 ```bash
 ./ci-clean-log.sh ci.log
@@ -87,7 +87,7 @@ tail -525 ci_build.log | wl-copy
 
 CI test failures may be flaky. Use a bash loop to run repeatedly:
 
-<gh-file:.buildkite/scripts/rerun-test.sh>
+[.buildkite/scripts/rerun-test.sh](../../../.buildkite/scripts/rerun-test.sh)
 
 ```bash
 ./rerun-test.sh tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]
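For context on the "bash loop" mentioned above, a minimal stand-in for what `rerun-test.sh` does (the real script may differ):

```bash
# Re-run one flaky test repeatedly and stop at the first failure.
TEST='tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]'
for i in $(seq 1 10); do
    echo "=== attempt $i ==="
    pytest -x -s -v "$TEST" || { echo "failed on attempt $i"; break; }
done
```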

docs/contributing/ci/update_pytorch_version.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ release in CI/CD. It is standard practice to submit a PR to update the
 PyTorch version as early as possible when a new [PyTorch stable
 release](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence) becomes available.
 This process is non-trivial due to the gap between PyTorch
-releases. Using <gh-pr:16859> as an example, this document outlines common steps to achieve this
+releases. Using <https://github.com/vllm-project/vllm/pull/16859> as an example, this document outlines common steps to achieve this
 update along with a list of potential issues and how to address them.
 
 ## Test PyTorch release candidates (RCs)
@@ -85,7 +85,7 @@ and timeout. Additionally, since vLLM's fastcheck pipeline runs in read-only mod
 it doesn't populate the cache, so re-running it to warm up the cache
 is ineffective.
 
-While ongoing efforts like [#17419](gh-issue:17419)
+While ongoing efforts like <https://github.com/vllm-project/vllm/issues/17419>
 address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
 to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
 when manually triggering a build on Buildkite. This branch accomplishes two things:
@@ -138,5 +138,5 @@ to handle some platforms separately. The separation of requirements and Dockerfi
 for different platforms in vLLM CI/CD allows us to selectively choose
 which platforms to update. For instance, updating XPU requires the corresponding
 release from [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) by Intel.
-While <gh-pr:16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
-<gh-pr:17444> completed the update for XPU.
+While <https://github.com/vllm-project/vllm/pull/16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
+<https://github.com/vllm-project/vllm/pull/17444> completed the update for XPU.
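For the "Test PyTorch release candidates (RCs)" step referenced in the first hunk, a hedged sketch of installing an RC build from PyTorch's test index before building vLLM against it (version and CUDA suffix are placeholders):

```bash
# Install a PyTorch release candidate from the official test index
# (adjust the version and CUDA suffix to the RC being validated).
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/test/cu128
```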
