
Commit 8231fd9

Merge branch 'main' into extendmediaconnector
2 parents: 923a964 + a9fe079

78 files changed (+1750, -478 lines)


.buildkite/test-amd.yaml

Lines changed: 3 additions & 3 deletions
@@ -38,7 +38,7 @@ steps:
 - label: Pytorch Nightly Dependency Override Check # 2min
   # if this test fails, it means the nightly torch version is not compatible with some
   # of the dependencies. Please check the error message and add the package to whitelist
-  # in /vllm/tools/generate_nightly_torch_test.py
+  # in /vllm/tools/pre_commit/generate_nightly_torch_test.py
   mirror_hardwares: [amdexperimental]
   agent_pool: mi325_1
   # grade: Blocking
@@ -286,7 +286,7 @@ steps:

 - label: Engine Test # 25min
   timeout_in_minutes: 40
-  mirror_hardwares: [amdexperimental]
+  mirror_hardwares: [amdexperimental, amdproduction]
   agent_pool: mi325_1
   #grade: Blocking
   source_file_dependencies:
@@ -908,7 +908,7 @@ steps:

 - label: Quantized Models Test # 45 min
   timeout_in_minutes: 60
-  mirror_hardwares: [amdexperimental]
+  mirror_hardwares: [amdexperimental, amdproduction]
   agent_pool: mi325_1
   # grade: Blocking
   source_file_dependencies:

.buildkite/test-pipeline.yaml

Lines changed: 3 additions & 1 deletion
@@ -38,7 +38,7 @@ steps:
 - label: Pytorch Nightly Dependency Override Check # 2min
   # if this test fails, it means the nightly torch version is not compatible with some
   # of the dependencies. Please check the error message and add the package to whitelist
-  # in /vllm/tools/generate_nightly_torch_test.py
+  # in /vllm/tools/pre_commit/generate_nightly_torch_test.py
   soft_fail: true
   source_file_dependencies:
   - requirements/nightly_torch_test.txt
@@ -498,6 +498,8 @@ steps:
   - tests/kernels/moe
   - vllm/model_executor/layers/fused_moe/
   - vllm/distributed/device_communicators/
+  - vllm/envs.py
+  - vllm/config
   commands:
   - pytest -v -s kernels/moe --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
   parallelism: 2
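The `--shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT` pair above splits the moe kernel tests across the `parallelism: 2` agents. A minimal sketch of the underlying idea, round-robin sharding by index (our illustration, not the vLLM or pytest-shard implementation):

```python
def shard(items, shard_id, num_shards):
    """Assign every item to exactly one shard, round-robin by index."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return [item for i, item in enumerate(items) if i % num_shards == shard_id]

# With parallelism: 2, each agent runs a disjoint half of the tests.
tests = [f"test_moe_{i}" for i in range(5)]
agent0 = shard(tests, 0, 2)  # indices 0, 2, 4
agent1 = shard(tests, 1, 2)  # indices 1, 3
assert sorted(agent0 + agent1) == sorted(tests)  # full coverage, no overlap
```

Because the assignment depends only on the index, every agent computes the same partition independently, with no coordination needed beyond the two environment variables.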

.pre-commit-config.yaml

Lines changed: 9 additions & 9 deletions
@@ -45,7 +45,7 @@ repos:
   - id: format-torch-nightly-test
     name: reformat nightly_torch_test.txt to be in sync with test.in
     language: python
-    entry: python tools/generate_nightly_torch_test.py
+    entry: python tools/pre_commit/generate_nightly_torch_test.py
     files: ^requirements/test\.(in|txt)$
   - id: mypy-local
     name: Run mypy locally for lowest supported Python version
@@ -78,12 +78,12 @@ repos:
     stages: [manual] # Only run in CI
   - id: shellcheck
     name: Lint shell scripts
-    entry: tools/shellcheck.sh
+    entry: tools/pre_commit/shellcheck.sh
     language: script
     types: [shell]
   - id: png-lint
     name: Lint PNG exports from excalidraw
-    entry: tools/png-lint.sh
+    entry: tools/pre_commit/png-lint.sh
     language: script
     types: [png]
   - id: signoff-commit
@@ -100,12 +100,12 @@ repos:
     stages: [commit-msg]
   - id: check-spdx-header
     name: Check SPDX headers
-    entry: python tools/check_spdx_header.py
+    entry: python tools/pre_commit/check_spdx_header.py
     language: python
     types: [python]
   - id: check-root-lazy-imports
     name: Check root lazy imports
-    entry: python tools/check_init_lazy_imports.py
+    entry: python tools/pre_commit/check_init_lazy_imports.py
     language: python
     types: [python]
   - id: check-filenames
@@ -119,19 +119,19 @@ repos:
     pass_filenames: false
   - id: update-dockerfile-graph
     name: Update Dockerfile dependency graph
-    entry: tools/update-dockerfile-graph.sh
+    entry: tools/pre_commit/update-dockerfile-graph.sh
     language: script
   - id: enforce-import-regex-instead-of-re
     name: Enforce import regex as re
-    entry: python tools/enforce_regex_import.py
+    entry: python tools/pre_commit/enforce_regex_import.py
     language: python
     types: [python]
     pass_filenames: false
     additional_dependencies: [regex]
   # forbid directly import triton
   - id: forbid-direct-triton-import
     name: "Forbid direct 'import triton'"
-    entry: python tools/check_triton_import.py
+    entry: python tools/pre_commit/check_triton_import.py
     language: python
     types: [python]
     pass_filenames: false
@@ -144,7 +144,7 @@ repos:
     additional_dependencies: [regex]
   - id: validate-config
     name: Validate configuration has default values and that each field has a docstring
-    entry: python tools/validate_config.py
+    entry: python tools/pre_commit/validate_config.py
     language: python
     additional_dependencies: [regex]
   # Keep `suggestion` last

docs/cli/.nav.yml

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ nav:
   - complete.md
   - run-batch.md
   - vllm bench:
-    - bench/*.md
+    - bench/**/*.md
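This one-character-class change is what lets the nav pick up the new nested `bench/sweep/` pages: a single `*` does not cross directory separators, while `**` matches any number of nested directories, including zero. Python's `pathlib` globbing has the analogous semantics, which makes the difference easy to demonstrate:

```python
import tempfile
from pathlib import Path

# Recreate the docs layout from this commit: one page directly under
# bench/ plus the new nested sweep/ pages.
root = Path(tempfile.mkdtemp())
(root / "bench" / "sweep").mkdir(parents=True)
(root / "bench" / "latency.md").touch()
(root / "bench" / "sweep" / "plot.md").touch()

flat = sorted(p.relative_to(root).as_posix() for p in root.glob("bench/*.md"))
deep = sorted(p.relative_to(root).as_posix() for p in root.glob("bench/**/*.md"))

# The old pattern misses the nested page; the new one finds both:
# flat == ["bench/latency.md"]
# deep == ["bench/latency.md", "bench/sweep/plot.md"]
```

(The nav file is interpreted by the docs tooling rather than `pathlib`, but the `*` vs `**` distinction is the same.)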

docs/cli/bench/sweep/plot.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench sweep plot
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_sweep_plot.md"

docs/cli/bench/sweep/serve.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench sweep serve
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_sweep_serve.md"

docs/cli/bench/sweep/serve_sla.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench sweep serve_sla
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_sweep_serve_sla.md"

docs/contributing/benchmarks.md

Lines changed: 3 additions & 3 deletions
@@ -1061,7 +1061,7 @@ Follow these steps to run the script:
 Example command:

 ```bash
-python -m vllm.benchmarks.sweep.serve \
+vllm bench sweep serve \
     --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf' \
     --bench-cmd 'vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --backend vllm --endpoint /v1/completions --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
     --serve-params benchmarks/serve_hparams.json \
@@ -1109,7 +1109,7 @@ For example, to ensure E2E latency within different target values for 99% of req
 Example command:

 ```bash
-python -m vllm.benchmarks.sweep.serve_sla \
+vllm bench sweep serve_sla \
     --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf' \
     --bench-cmd 'vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --backend vllm --endpoint /v1/completions --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
     --serve-params benchmarks/serve_hparams.json \
@@ -1138,7 +1138,7 @@ The algorithm for adjusting the SLA variable is as follows:
 Example command:

 ```bash
-python -m vllm.benchmarks.sweep.plot benchmarks/results/<timestamp> \
+vllm bench sweep plot benchmarks/results/<timestamp> \
     --var-x max_concurrency \
     --row-by random_input_len \
     --col-by random_output_len \

docs/mkdocs/hooks/generate_argparse.py

Lines changed: 24 additions & 10 deletions
@@ -56,15 +56,20 @@ def auto_mock(module, attr, max_mocks=50):
     )


-latency = auto_mock("vllm.benchmarks", "latency")
-serve = auto_mock("vllm.benchmarks", "serve")
-throughput = auto_mock("vllm.benchmarks", "throughput")
+bench_latency = auto_mock("vllm.benchmarks", "latency")
+bench_serve = auto_mock("vllm.benchmarks", "serve")
+bench_sweep_plot = auto_mock("vllm.benchmarks.sweep.plot", "SweepPlotArgs")
+bench_sweep_serve = auto_mock("vllm.benchmarks.sweep.serve", "SweepServeArgs")
+bench_sweep_serve_sla = auto_mock(
+    "vllm.benchmarks.sweep.serve_sla", "SweepServeSLAArgs"
+)
+bench_throughput = auto_mock("vllm.benchmarks", "throughput")
 AsyncEngineArgs = auto_mock("vllm.engine.arg_utils", "AsyncEngineArgs")
 EngineArgs = auto_mock("vllm.engine.arg_utils", "EngineArgs")
 ChatCommand = auto_mock("vllm.entrypoints.cli.openai", "ChatCommand")
 CompleteCommand = auto_mock("vllm.entrypoints.cli.openai", "CompleteCommand")
-cli_args = auto_mock("vllm.entrypoints.openai", "cli_args")
-run_batch = auto_mock("vllm.entrypoints.openai", "run_batch")
+openai_cli_args = auto_mock("vllm.entrypoints.openai", "cli_args")
+openai_run_batch = auto_mock("vllm.entrypoints.openai", "run_batch")
 FlexibleArgumentParser = auto_mock(
     "vllm.utils.argparse_utils", "FlexibleArgumentParser"
 )
@@ -114,6 +119,9 @@ def add_arguments(self, actions):
             self._markdown_output.append(f"{action.help}\n\n")

             if (default := action.default) != SUPPRESS:
+                # Make empty string defaults visible
+                if default == "":
+                    default = '""'
                 self._markdown_output.append(f"Default: `{default}`\n\n")

     def format_help(self):
@@ -150,17 +158,23 @@ def on_startup(command: Literal["build", "gh-deploy", "serve"], dirty: bool):

 # Create parsers to document
 parsers = {
+    # Engine args
     "engine_args": create_parser(EngineArgs.add_cli_args),
     "async_engine_args": create_parser(
         AsyncEngineArgs.add_cli_args, async_args_only=True
     ),
-    "serve": create_parser(cli_args.make_arg_parser),
+    # CLI
+    "serve": create_parser(openai_cli_args.make_arg_parser),
     "chat": create_parser(ChatCommand.add_cli_args),
     "complete": create_parser(CompleteCommand.add_cli_args),
-    "bench_latency": create_parser(latency.add_cli_args),
-    "bench_throughput": create_parser(throughput.add_cli_args),
-    "bench_serve": create_parser(serve.add_cli_args),
-    "run-batch": create_parser(run_batch.make_arg_parser),
+    "run-batch": create_parser(openai_run_batch.make_arg_parser),
+    # Benchmark CLI
+    "bench_latency": create_parser(bench_latency.add_cli_args),
+    "bench_serve": create_parser(bench_serve.add_cli_args),
+    "bench_sweep_plot": create_parser(bench_sweep_plot.add_cli_args),
+    "bench_sweep_serve": create_parser(bench_sweep_serve.add_cli_args),
+    "bench_sweep_serve_sla": create_parser(bench_sweep_serve_sla.add_cli_args),
+    "bench_throughput": create_parser(bench_throughput.add_cli_args),
 }

 # Generate documentation for each parser
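The new branch in `add_arguments` above exists because an argument whose default is the empty string would otherwise render as a blank `Default:` value in the generated Markdown. A standalone sketch of that rendering rule (the `render_default` helper is ours, for illustration):

```python
from argparse import SUPPRESS

def render_default(default):
    """Format an argparse default for Markdown docs, mirroring the hook above."""
    if default == SUPPRESS:
        return None  # suppressed defaults are not documented at all
    if default == "":
        default = '""'  # make empty-string defaults visible
    return f"Default: `{default}`"

# An option declared with default="" now documents as `""` instead of
# an empty code span:
# render_default("")   == 'Default: `""`'
# render_default(None) == 'Default: `None`'
```

Note that only the exact empty string is rewritten; falsy defaults like `0`, `None`, or `False` still render literally.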

requirements/cuda.txt

Lines changed: 0 additions & 2 deletions
@@ -13,5 +13,3 @@ torchvision==0.24.0 # Required for phi3v processor. See https://github.com/pytor
 # xformers==0.0.32.post1; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.8
 # FlashInfer should be updated together with the Dockerfile
 flashinfer-python==0.4.1
-# Triton Kernels are needed for mxfp4 fused moe. (Should be updated alongside torch)
-triton_kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels
