Commit ef9676a

[Doc] ruff format some Python examples (#26767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
1 parent 70b1b33 commit ef9676a

20 files changed: +341 additions, -290 deletions
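
The diffs below apply ruff's formatter to the Python snippets embedded in these docs: short calls are joined onto one line, and exploded calls get one argument per line with trailing commas. For readers who want to check a snippet against the same style locally, here is a minimal sketch; it assumes `ruff` is installed on PATH and writes to a throwaway temp file rather than the real docs paths, so it is illustrative only and not part of this commit.

```python
# Illustrative only: check a snippet against ruff's formatting.
# Assumes `ruff` is installed and available on PATH (not part of this commit).
import subprocess
import tempfile

snippet = '''llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    limit_mm_per_prompt={"image": 3, "video": 1},
)
'''

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(snippet)
    path = f.name

# `ruff format --diff` prints a diff (and exits non-zero) if the file would be reformatted.
result = subprocess.run(["ruff", "format", "--diff", path], capture_output=True, text=True)
print(result.stdout or "snippet already matches ruff's formatting")
```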

docs/configuration/conserving_memory.md

Lines changed: 23 additions & 21 deletions
@@ -11,8 +11,7 @@ The following code splits the model across 2 GPUs.
 ```python
 from vllm import LLM
 
-llm = LLM(model="ibm-granite/granite-3.1-8b-instruct",
-          tensor_parallel_size=2)
+llm = LLM(model="ibm-granite/granite-3.1-8b-instruct", tensor_parallel_size=2)
 ```
 
 !!! warning
@@ -43,9 +42,7 @@ and the maximum batch size (`max_num_seqs` option).
 ```python
 from vllm import LLM
 
-llm = LLM(model="adept/fuyu-8b",
-          max_model_len=2048,
-          max_num_seqs=2)
+llm = LLM(model="adept/fuyu-8b", max_model_len=2048, max_num_seqs=2)
 ```
 
 ## Reduce CUDA Graphs
@@ -78,8 +75,7 @@ You can disable graph capturing completely via the `enforce_eager` flag:
 ```python
 from vllm import LLM
 
-llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",
-          enforce_eager=True)
+llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enforce_eager=True)
 ```
 
 ## Adjust cache size
@@ -97,8 +93,10 @@ You can allow a smaller number of multi-modal items per prompt to reduce the mem
 from vllm import LLM
 
 # Accept up to 3 images and 1 video per prompt
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          limit_mm_per_prompt={"image": 3, "video": 1})
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    limit_mm_per_prompt={"image": 3, "video": 1},
+)
 ```
 
 You can go a step further and disable unused modalities completely by setting its limit to zero.
@@ -108,8 +106,10 @@ For example, if your application only accepts image input, there is no need to a
 from vllm import LLM
 
 # Accept any number of images but no videos
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          limit_mm_per_prompt={"video": 0})
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    limit_mm_per_prompt={"video": 0},
+)
 ```
 
 You can even run a multi-modal model for text-only inference:
@@ -118,8 +118,10 @@ You can even run a multi-modal model for text-only inference:
 from vllm import LLM
 
 # Don't accept images. Just text.
-llm = LLM(model="google/gemma-3-27b-it",
-          limit_mm_per_prompt={"image": 0})
+llm = LLM(
+    model="google/gemma-3-27b-it",
+    limit_mm_per_prompt={"image": 0},
+)
 ```
 
 ### Configurable options
@@ -173,14 +175,14 @@ Here are some examples:
 from vllm import LLM
 
 # Available for Qwen2-VL series models
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          mm_processor_kwargs={
-              "max_pixels": 768 * 768,  # Default is 1280 * 28 * 28
-          })
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    mm_processor_kwargs={"max_pixels": 768 * 768},  # Default is 1280 * 28 * 28
+)
 
 # Available for InternVL series models
-llm = LLM(model="OpenGVLab/InternVL2-2B",
-          mm_processor_kwargs={
-              "max_dynamic_patch": 4,  # Default is 12
-          })
+llm = LLM(
+    model="OpenGVLab/InternVL2-2B",
+    mm_processor_kwargs={"max_dynamic_patch": 4},  # Default is 12
+)
 ```
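
For reference, the options touched in this file can be combined in one constructor call, written in the trailing-comma style the commit adopts; the model name and values below are illustrative placeholders rather than recommendations from the docs.

```python
from vllm import LLM

# Illustrative combination of the memory-conserving options shown above
# (model and values are placeholders, not tuned recommendations).
llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    enforce_eager=True,
    limit_mm_per_prompt={"image": 3, "video": 0},
)
```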

docs/configuration/optimization.md

Lines changed: 15 additions & 9 deletions
@@ -100,7 +100,7 @@ from vllm import LLM
 llm = LLM(
     model="meta-llama/Llama-3.3-70B-Instruct",
     tensor_parallel_size=4,
-    pipeline_parallel_size=2
+    pipeline_parallel_size=2,
 )
 ```
 
@@ -257,18 +257,24 @@ Examples:
 
 ```python
 # Use a larger cache
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          mm_processor_cache_gb=8)
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    mm_processor_cache_gb=8,
+)
 
 # Use a shared-memory based IPC cache
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          tensor_parallel_size=2,
-          mm_processor_cache_type="shm",
-          mm_processor_cache_gb=8)
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    tensor_parallel_size=2,
+    mm_processor_cache_type="shm",
+    mm_processor_cache_gb=8,
+)
 
 # Disable the cache
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          mm_processor_cache_gb=0)
+llm = LLM(
+    model="Qwen/Qwen2.5-VL-3B-Instruct",
+    mm_processor_cache_gb=0,
+)
 ```
 
 ### Cache Placement

docs/contributing/model/basic.md

Lines changed: 2 additions & 2 deletions
@@ -73,8 +73,8 @@ def forward(
     self,
     input_ids: torch.Tensor,
     positions: torch.Tensor,
-    intermediate_tensors: Optional[IntermediateTensors] = None,
-    inputs_embeds: Optional[torch.Tensor] = None,
+    intermediate_tensors: IntermediateTensors | None = None,
+    inputs_embeds: torch.Tensor | None = None,
 ) -> torch.Tensor:
     ...
 ```
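
The hunk above swaps `typing.Optional[...]` annotations for PEP 604 unions (`X | None`). A tiny check of the equivalence, assuming CPython 3.10+ (older interpreters need `from __future__ import annotations` to use the new spelling inside annotations):

```python
from typing import Optional

# On Python 3.10+, the two spellings describe the same type and compare equal.
assert Optional[int] == (int | None)
```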

docs/contributing/model/multimodal.md

Lines changed: 23 additions & 19 deletions
@@ -16,7 +16,7 @@ Further update the model as follows:
     ...
 
     @classmethod
-    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
+    def get_placeholder_str(cls, modality: str, i: int) -> str | None:
         if modality.startswith("image"):
             return "<image>"
 
@@ -45,14 +45,14 @@ Further update the model as follows:
         ...
 
     def _process_image_input(self, image_input: YourModelImageInputs) -> torch.Tensor:
-
         assert self.vision_encoder is not None
         image_features = self.vision_encoder(image_input)
         return self.multi_modal_projector(image_features)
 
     def get_multimodal_embeddings(
-            self, **kwargs: object) -> Optional[MultiModalEmbeddings]:
-
+        self,
+        **kwargs: object,
+    ) -> MultiModalEmbeddings | None:
         # Validate the multimodal input keyword arguments
         image_input = self._parse_and_validate_image_input(**kwargs)
         if image_input is None:
@@ -110,7 +110,7 @@ to return the maximum number of input items for each modality supported by the m
 For example, if the model supports any number of images but only one video per prompt:
 
 ```python
-def get_supported_mm_limits(self) -> Mapping[str, Optional[int]]:
+def get_supported_mm_limits(self) -> Mapping[str, int | None]:
     return {"image": None, "video": 1}
 ```
 
@@ -258,7 +258,7 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
     self,
     seq_len: int,
     mm_counts: Mapping[str, int],
-    mm_options: Optional[Mapping[str, BaseDummyOptions]] = None,
+    mm_options: Mapping[str, BaseDummyOptions] | None = None,
 ) -> MultiModalDataDict:
     num_images = mm_counts.get("image", 0)
 
@@ -421,8 +421,10 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 ```python
 def get_image_size_with_most_features(self) -> ImageSize:
     image_processor = self.get_image_processor()
-    return ImageSize(width=image_processor.size["width"],
-                     height=image_processor.size["height"])
+    return ImageSize(
+        width=image_processor.size["width"],
+        height=image_processor.size["height"],
+    )
 ```
 
 Fuyu does not expect image placeholders in the inputs to HF processor, so
@@ -452,10 +454,12 @@ Assuming that the memory usage increases with the number of tokens, the dummy in
 
         return {
             "image":
-            self._get_dummy_images(width=target_width,
-                                   height=target_height,
-                                   num_images=num_images,
-                                   overrides=image_overrides)
+            self._get_dummy_images(
+                width=target_width,
+                height=target_height,
+                num_images=num_images,
+                overrides=image_overrides,
+            )
         }
 ```
@@ -744,8 +748,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
             image_width=image_size.width,
             image_height=image_size.height,
         )
-        image_tokens = ([_IMAGE_TOKEN_ID] * ncols +
-                        [_NEWLINE_TOKEN_ID]) * nrows
+        image_tokens = ([_IMAGE_TOKEN_ID] * ncols + [_NEWLINE_TOKEN_ID]) * nrows
 
         return PromptUpdateDetails.select_token_id(
             image_tokens + [bos_token_id],
@@ -781,8 +784,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
             image_width=image_size.width,
             image_height=image_size.height,
         )
-        image_tokens = ([_IMAGE_TOKEN_ID] * ncols +
-                        [_NEWLINE_TOKEN_ID]) * nrows
+        image_tokens = ([_IMAGE_TOKEN_ID] * ncols + [_NEWLINE_TOKEN_ID]) * nrows
 
         return PromptUpdateDetails.select_token_id(
             image_tokens + [bos_token_id],
@@ -810,9 +812,11 @@ to register them to the multi-modal registry:
   from vllm.model_executor.models.interfaces import SupportsMultiModal
 + from vllm.multimodal import MULTIMODAL_REGISTRY
 
-+ @MULTIMODAL_REGISTRY.register_processor(YourMultiModalProcessor,
-+                                         info=YourProcessingInfo,
-+                                         dummy_inputs=YourDummyInputsBuilder)
++ @MULTIMODAL_REGISTRY.register_processor(
++     YourMultiModalProcessor,
++     info=YourProcessingInfo,
++     dummy_inputs=YourDummyInputsBuilder,
++ )
   class YourModelForImage2Seq(nn.Module, SupportsMultiModal):
 ```

docs/contributing/model/registration.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ def register():
 
     ModelRegistry.register_model(
         "YourModelForCausalLM",
-        "your_code:YourModelForCausalLM"
+        "your_code:YourModelForCausalLM",
     )
 ```

docs/contributing/model/transcription.md

Lines changed: 11 additions & 1 deletion
@@ -15,6 +15,7 @@ Declare supported languages and capabilities:
 - Set `supports_transcription_only=True` if the model should not serve text generation (eg Whisper).
 
 ??? code "supported_languages and supports_transcription_only"
+
     ```python
     from typing import ClassVar, Mapping, Literal
     import numpy as np
@@ -43,6 +44,7 @@ Provide an ASR configuration via [get_speech_to_text_config][vllm.model_executor
 This is for controlling general behavior of the API when serving your model:
 
 ??? code "get_speech_to_text_config()"
+
     ```python
     class YourASRModel(nn.Module, SupportsTranscription):
         ...
@@ -71,6 +73,7 @@ Implement the prompt construction via [get_generation_prompt][vllm.model_executo
 Return a dict containing `multi_modal_data` with the audio, and either a `prompt` string or `prompt_token_ids`:
 
 ??? code "get_generation_prompt()"
+
     ```python
     class YourASRModel(nn.Module, SupportsTranscription):
         ...
@@ -107,6 +110,7 @@ Return a dict containing `multi_modal_data` with the audio, and either a `prompt
 Return a dict with separate `encoder_prompt` and `decoder_prompt` entries:
 
 ??? code "get_generation_prompt()"
+
     ```python
     class YourASRModel(nn.Module, SupportsTranscription):
         ...
@@ -148,12 +152,16 @@ Language validation via [validate_language][vllm.model_executor.models.interface
 If your model requires a language and you want a default, override this method (see Whisper):
 
 ??? code "validate_language()"
+
     ```python
     @classmethod
     def validate_language(cls, language: str | None) -> str | None:
         if language is None:
             logger.warning(
-                "Defaulting to language='en'. If you wish to transcribe audio in a different language, pass the `language` field.")
+                "Defaulting to language='en'. If you wish to transcribe "
+                "audio in a different language, pass the `language` field "
+                "in the TranscriptionRequest."
+            )
             language = "en"
         return super().validate_language(language)
     ```
@@ -165,6 +173,7 @@ Token accounting for streaming via [get_num_audio_tokens][vllm.model_executor.mo
 Provide a fast duration→token estimate to improve streaming usage statistics:
 
 ??? code "get_num_audio_tokens()"
+
     ```python
     class YourASRModel(nn.Module, SupportsTranscription):
         ...
@@ -191,6 +200,7 @@ The API server takes care of basic audio I/O and optional chunking before buildi
 Relevant server logic:
 
 ??? code "_preprocess_speech_to_text()"
+
     ```python
     # vllm/entrypoints/openai/speech_to_text.py
     async def _preprocess_speech_to_text(...):
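
As a usage note for the reworded warning above, this is roughly how a client supplies the `language` field; the sketch assumes a vLLM server exposing the OpenAI-compatible transcription endpoint, and the URL, audio file, and model name are placeholders.

```python
from openai import OpenAI

# Placeholders: adjust the base URL, model name, and audio file for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio,
        language="en",  # omit this to hit the default-language warning shown above
    )

print(transcription.text)
```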

docs/deployment/frameworks/cerebrium.md

Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,7 @@ If successful, you should be returned a CURL command that you can call inference
 
 ??? console "Command"
 
-    ```python
+    ```bash
     curl -X POST https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run \
        -H 'Content-Type: application/json' \
        -H 'Authorization: <JWT TOKEN>' \
@@ -81,7 +81,7 @@ You should get a response like:
 
 ??? console "Response"
 
-    ```python
+    ```json
     {
         "run_id": "52911756-3066-9ae8-bcc9-d9129d1bd262",
         "result": {

docs/deployment/frameworks/dstack.md

Lines changed: 2 additions & 2 deletions
@@ -83,7 +83,7 @@ After the provisioning, you can interact with the model by using the OpenAI SDK:
 
 client = OpenAI(
     base_url="https://gateway.<gateway domain>",
-    api_key="<YOUR-DSTACK-SERVER-ACCESS-TOKEN>"
+    api_key="<YOUR-DSTACK-SERVER-ACCESS-TOKEN>",
 )
 
 completion = client.chat.completions.create(
@@ -93,7 +93,7 @@ After the provisioning, you can interact with the model by using the OpenAI SDK:
             "role": "user",
             "content": "Compose a poem that explains the concept of recursion in programming.",
         }
-    ]
+    ],
 )
 
 print(completion.choices[0].message.content)
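
Assembled from the two hunks above, a complete version of the client snippet; the gateway URL and token placeholders come from the surrounding guide, and the model name is an assumption standing in for whatever model the dstack service deploys.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<YOUR-DSTACK-SERVER-ACCESS-TOKEN>",
)

completion = client.chat.completions.create(
    model="<model served by your dstack run>",  # placeholder, not from the diff
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ],
)

print(completion.choices[0].message.content)
```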

docs/deployment/frameworks/haystack.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ pip install vllm haystack-ai
     api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"),
     model="mistralai/Mistral-7B-Instruct-v0.1",
     api_base_url="http://{your-vLLM-host-ip}:{your-vLLM-host-port}/v1",
-    generation_kwargs = {"max_tokens": 512}
+    generation_kwargs={"max_tokens": 512},
 )
 
 response = generator.run(
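
For completeness, a runnable version of the generator configuration the hunk above touches; the host/port placeholders follow the guide, while the prompt text and response handling are illustrative assumptions rather than part of the original example.

```python
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(
    api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"),
    model="mistralai/Mistral-7B-Instruct-v0.1",
    api_base_url="http://{your-vLLM-host-ip}:{your-vLLM-host-port}/v1",
    generation_kwargs={"max_tokens": 512},
)

# Illustrative call: `run` returns a dict whose "replies" list holds the generated text.
response = generator.run(prompt="Briefly explain what vLLM is.")
print(response["replies"][0])
```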
