Commit 3422877

Load test setup from config files, enable plugin in V1 (#25)
This PR adds two major features:

1. Loading micro-benchmark configurations from files:
```
python3 /scripts/benchmark.py prefix -x /scripts/setups/prefix_correctness_rocm.conf
```
However, if the environment variables `MY_IUT`, `MY_METHODS`, or `TRITON_BACKEND_DEBUG` are set, they override the values from the config file.

2. Enabling the vllm-triton-backend plugin for vLLM V1 (and dropping support for vLLM V0). After installing ibm-triton-lib, the plugin can be used with just:
```
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

Additionally, it fixes a correctness issue with the prefix_prefill micro-benchmark and allows a finer-grained composition of batches for prefix prefill.

---------

Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
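The override rule described in the commit message (file values first, then selected environment variables on top) can be sketched as follows. This is only an illustration: the `KEY=VALUE` file format and the helper names `parse_conf` and `resolve_settings` are assumptions, not taken from `scripts/benchmark.py`, whose actual config format is not shown in this commit view.

```python
import os

# The three variables named in the commit message as taking precedence.
ENV_OVERRIDES = ("MY_IUT", "MY_METHODS", "TRITON_BACKEND_DEBUG")


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def resolve_settings(conf_text, environ=None):
    """Return the file settings with the named environment variables layered on top."""
    environ = os.environ if environ is None else environ
    settings = parse_conf(conf_text)
    for name in ENV_OVERRIDES:
        if name in environ:
            # An exported variable wins over the value from the file.
            settings[name] = environ[name]
    return settings
```

In this sketch only the three named variables override file values; any other key in the environment is ignored, which matches the wording of the commit message but may differ from the real script.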
1 parent 5fbc630 commit 3422877

16 files changed: +705 -1028 lines changed

Makefile

Lines changed: 5 additions & 1 deletion
@@ -3,7 +3,7 @@ MAX_JOBS := 64
 
 SHELL := /bin/bash
 
-.PHONY: all build clean format dev rocm rocm-upstream pyupdate nightly bm-rocm
+.PHONY: all build clean format dev rocm rocm-upstream pyupdate nightly bm-rocm spelling
 
 all: build
 
@@ -65,3 +65,7 @@ else
 format:
 	python -m black --check --verbose scripts ibm-triton-lib third_party
 endif
+
+spelling:
+	codespell ./ibm-triton-lib ./triton-dejavu ./scripts
+

ibm-triton-lib/ibm_triton_lib/backend/__init__.py

Lines changed: 1 addition & 8 deletions
@@ -19,11 +19,4 @@
 
 def register():
     """Register the triton attention platform."""
-
-    VLLM_USE_V1 = int(os.environ.get("VLLM_USE_V1", "0"))
-
-    # backend only works with v0 currently
-    if VLLM_USE_V1:
-        return None
-    else:
-        return "ibm_triton_lib.backend.platform.TritonPlatform"
+    return "ibm_triton_lib.backend.platform.TritonPlatform"

ibm-triton-lib/ibm_triton_lib/backend/platform.py

Lines changed: 2 additions & 0 deletions
@@ -61,4 +61,6 @@ def get_attn_backend_cls(
         use_v1,
         use_mla,
     ) -> str:
+        if not envs.VLLM_USE_V1:
+            raise RuntimeError("vllm-triton-backend plugin only supports vLLM V1")
         return "ibm_triton_lib.backend.triton_attn.TritonAttentionBackend"
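Taken together, the two Python hunks move the V0/V1 check from registration time to backend-selection time: `register()` now always returns the platform path, and the platform raises if V1 is not enabled. A minimal standalone sketch of that behavior, assuming a plain boolean argument `vllm_use_v1` in place of the `envs.VLLM_USE_V1` lookup used in the diff (the real method is part of the platform class and takes more parameters):

```python
PLATFORM = "ibm_triton_lib.backend.platform.TritonPlatform"
BACKEND = "ibm_triton_lib.backend.triton_attn.TritonAttentionBackend"


def register():
    """After this commit: always hand the platform class path to vLLM."""
    return PLATFORM


def get_attn_backend_cls(vllm_use_v1):
    """Stand-in for the platform method; the diff reads envs.VLLM_USE_V1 instead."""
    if not vllm_use_v1:
        raise RuntimeError("vllm-triton-backend plugin only supports vLLM V1")
    return BACKEND
```

With this arrangement a V0 setup fails with an explicit error instead of silently not loading the plugin, which was the previous behavior for V1.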
