Skip to content

Commit 569c0aa

Browse files
authored
Disable exponential bucketing (vllm-project#1469)
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
1 parent 96393d5 commit 569c0aa

File tree

4 files changed

+5
-5
lines changed

4 files changed

+5
-5
lines changed

.jenkins/requirements-test-hpu.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
1-
lm_eval
1+
lm_eval==0.4.8
22
pytest
33
pytest_asyncio

README_GAUDI.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -129,7 +129,7 @@ $ python setup.py develop
129129
| vLLM v1 architecture (early release) | V1 architecture is now available for the HPU backend, and will gradually enable it for every use case we plan to support. | [Documentation](https://docs.vllm.ai/en/latest/serving/distributed_serving.html) |
130130
| Guided decode | vLLM HPU supports a guided decoding backend for generating structured outputs. | [Documentation](https://docs.vllm.ai/en/latest/features/structured_outputs.html) |
131131
| Delayed Sampling (experimental) | vLLM HPU supports delayed sampling scheduling for asynchronous execution, enabled by `VLLM_DELAYED_SAMPLING=true` environment variable. | N/A |
132-
| Exponential bucketing | vLLM HPU supports exponential bucketing spacing instead of linear to automate configuration of bucketing mechanism, enabled by default. It can be disabled via `VLLM_EXPONENTIAL_BUCKETING=false` environment variable. | N/A |
132+
| Exponential bucketing | vLLM HPU supports exponential bucketing spacing instead of linear to automate configuration of bucketing mechanism, enabled by `VLLM_EXPONENTIAL_BUCKETING=true` environment variable. | N/A |
133133

134134
> [!NOTE]
135135
> All specified features are also supported with the `-- enforce-eager` flag.
@@ -388,7 +388,7 @@ INFO 08-02 17:38:43 hpu_executor.py:91] init_cache_engine took 37.92 GiB of devi
388388
- `VLLM_GRAPH_PROMPT_RATIO`: percentage of reserved graph memory dedicated to prompt graphs. The default is `0.3`.
389389
- `VLLM_GRAPH_PROMPT_STRATEGY`: strategy determining order of prompt graph capture, `min_tokens` or `max_bs`. The default is `min_tokens`.
390390
- `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`. The default is `max_bs`.
391-
- `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`.
391+
- `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `false`.
392392
- `VLLM_{phase}_{dim}_BUCKET_{param}`: collection of 12 environment variables configuring ranges of bucketing mechanism (linear bucketing only).
393393
- `{phase}` is either `PROMPT` or `DECODE`
394394
- `{dim}` is either `BS`, `SEQ` or `BLOCK`

docs/source/getting_started/installation/ai_accelerator/hpu-gaudi.inc.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -337,7 +337,7 @@ INFO 08-02 17:38:43 hpu_executor.py:91] init_cache_engine took 37.92 GiB of devi
337337

338338
- `VLLM_GRAPH_DECODE_STRATEGY`: strategy determining order of decode graph capture, `min_tokens` or `max_bs`, `max_bs` by default.
339339

340-
- `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `true`.
340+
- `VLLM_EXPONENTIAL_BUCKETING`: if `true`, enables exponential bucket spacing instead of linear. The default is `false`.
341341

342342
- `VLLM_{phase}_{dim}_BUCKET_{param}` - collection of 12 environment variables configuring ranges of bucketing mechanism.
343343

requirements/hpu.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,4 +9,4 @@ numpy==1.26.4
99
tabulate
1010
setuptools>=77.0.3,<80.0.0
1111
setuptools-scm>=8
12-
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@75b612e
12+
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@29bc7a4

0 commit comments

Comments
 (0)