
Commit de9fde5

andy-neuma authored
additions for bump to v0.3.2 (vllm-project#50)
SUMMARY:
* "remote push" job for multi-gpu runner.
* "remote push" job for single-gpu runner.
* patches for re-initialization of "ray": other places in `vllm` already pass `ignore_reinit_error=True`; a couple of call sites appear to have been missed.
* patch the "find" command so that only *.py files whose names start with "test" are collected.

TEST PLAN:
runs on remote push

---------

Co-authored-by: andy-neuma <andy@neuralmagic.com>
1 parent 757e48a commit de9fde5
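
The "ray" patch described in the summary targets the case where ray.init() is called in a process that already has a running Ray runtime; without the flag the second call raises a RuntimeError. A minimal illustrative sketch (only the ignore_reinit_error flag itself comes from this commit, the rest is for context):

import ray

ray.init()  # first initialization starts (or connects to) a local Ray runtime

# A second bare ray.init() in the same process would raise a RuntimeError.
# With ignore_reinit_error=True the call logs a warning and becomes a no-op,
# returning the already-running context instead of failing.
ray.init(ignore_reinit_error=True)

ray.shutdown()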

File tree

7 files changed: +37 additions, -8 deletions


.github/actions/nm-set-env/action.yml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ runs:
 NUM_THREADS=$(./.github/scripts/determine-threading -G ${{ inputs.Gi_per_thread }})
 echo "MAX_JOBS=${NUM_THREADS}" >> $GITHUB_ENV
 echo "VLLM_INSTALL_PUNICA_KERNELS=1" >> $GITHUB_ENV
+echo "NCCL_IGNORE_DISABLED_P2P=1" >> $GITHUB_ENV
 echo "PYENV_ROOT=/usr/local/apps/pyenv" >> $GITHUB_ENV
 echo "XDG_CONFIG_HOME=/usr/local/apps" >> $GITHUB_ENV
 WHOAMI=$(whoami)

.github/scripts/run-tests

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ if [ ! -d "${TEST_DIR}" ]; then
 fi
 
 # run tests serially
-TESTS_DOT_PY=$(find ${TEST_DIR} -not -name "__init__.py" -name "*.py")
+TESTS_DOT_PY=$(find ${TEST_DIR} -name "test*.py")
 TESTS_TO_RUN=($TESTS_DOT_PY)
 SUCCESS=0
 for TEST in "${TESTS_TO_RUN[@]}"
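
As a rough illustration of the new selection rule (a Python sketch over a hypothetical test directory; the actual runner uses the shell find command above): only files whose names start with "test" are collected, so helper modules such as conftest.py are no longer picked up as tests.

from pathlib import Path

# Hypothetical directory, used only for illustration.
test_dir = Path("tests/distributed")

# Old behaviour: every *.py file except __init__.py (helper modules included).
old_selection = [p for p in test_dir.rglob("*.py") if p.name != "__init__.py"]

# New behaviour: only files matching test*.py are treated as tests.
new_selection = list(test_dir.rglob("test*.py"))

print(f"old: {len(old_selection)} files, new: {len(new_selection)} files")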

.github/workflows/remote-push.yml

Lines changed: 17 additions & 4 deletions
@@ -11,15 +11,28 @@ concurrency:
 
 jobs:
 
-    # TODO: expand python matrix later, once CI system has
-    # matured.
+    # TODO: expand python matrix later, once CI system has matured.
 
-    # TODO: enable this later
-    AWS-AVX2-32G-A10G-24G:
+    # multi-gpu
+    AWS-AVX2-192G-4-A10G-96G:
         strategy:
             matrix:
                 python: [3.10.12]
         uses: ./.github/workflows/build-test.yml
+        with:
+            label: aws-avx2-192G-4-a10g-96G
+            timeout: 180
+            gitref: '${{ github.ref }}'
+            Gi_per_thread: 4
+            python: ${{ matrix.python }}
+        secrets: inherit
+
+    # single gpu
+    AWS-AVX2-32G-A10G-24G:
+        strategy:
+            matrix:
+                python: [3.11.4]
+        uses: ./.github/workflows/build-test.yml
         with:
             label: aws-avx2-32G-a10g-24G
             timeout: 180

tests/distributed/test_basic_distributed_correctness.py

Lines changed: 3 additions & 1 deletion
@@ -5,9 +5,11 @@
 import pytest
 import torch
 
+
+# TODO: just picking one, need to update test runner to selectively use "--forked"
 MODELS = [
     "facebook/opt-125m",
-    "meta-llama/Llama-2-7b-hf",
+    # "meta-llama/Llama-2-7b-hf",
 ]
 
 
tests/distributed/test_custom_all_reduce.py

Lines changed: 13 additions & 0 deletions
@@ -73,6 +73,19 @@ def eager_allreduce(world_size, rank, distributed_init_port):
     assert torch.allclose(out, inp * world_size)
 
 
+def P2P_disabled():
+    num_gpus = torch.cuda.device_count()
+    for kk in range(num_gpus):
+        for jj in range(kk, num_gpus):
+            if torch.cuda.can_device_access_peer(
+                    device=torch.device(f"cuda:{kk}"),
+                    peer_device=torch.device(f"cuda:{jj}")):
+                return False
+    return True
+
+
+@pytest.mark.skipif(P2P_disabled(),
+                    reason="Cuda failure 'peer access is not supported between these two devices'")
 @pytest.mark.skipif(torch.cuda.device_count() < 2,
                     reason="Need at least 2 GPUs to run the test.")
 @pytest.mark.parametrize("tensor_parallel_size", [2])
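
For reference, the peer-access query that the new P2P_disabled() guard is built on can be exercised on its own; a minimal sketch (assuming at least one CUDA device is visible) that prints the same information the skip condition relies on:

import torch

if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    for src in range(num_gpus):
        for dst in range(num_gpus):
            if src == dst:
                continue
            # True if device `src` can directly access memory on device `dst`.
            ok = torch.cuda.can_device_access_peer(
                torch.device(f"cuda:{src}"), torch.device(f"cuda:{dst}"))
            print(f"cuda:{src} -> cuda:{dst} peer access: {ok}")
else:
    print("no CUDA devices visible")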

tests/entrypoints/test_openai_server.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ def zephyr_lora_files():
 
 @pytest.fixture(scope="session")
 def server(zephyr_lora_files):
-    ray.init()
+    ray.init(ignore_reinit_error=True)
     server_runner = ServerRunner.remote([
         "--model",
         MODEL_NAME,

vllm/test_utils.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ def multi_process_tensor_parallel(
 ) -> None:
     # Using ray helps debugging the error when it failed
     # as compared to multiprocessing.
-    ray.init()
+    ray.init(ignore_reinit_error=True)
 
     distributed_init_port = get_open_port()
     refs = []
