The distributed comm ops test failed with the stack trace below (Buildkite):
[2024-06-25T12:58:33Z] distributed/test_shm_broadcast.py:72:
[2024-06-25T12:58:33Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-06-25T12:58:33Z]
[2024-06-25T12:58:33Z] fn = <function worker_fn_wrapper.<locals>.wrapped_fn at 0x7f8cc92afa30>
[2024-06-25T12:58:33Z] world_size = 4
[2024-06-25T12:58:33Z]
[2024-06-25T12:58:33Z]     def distributed_run(fn, world_size):
[2024-06-25T12:58:33Z]         number_of_processes = world_size
[2024-06-25T12:58:33Z]         processes = []
[2024-06-25T12:58:33Z]         for i in range(number_of_processes):
[2024-06-25T12:58:33Z]             env = {}
[2024-06-25T12:58:33Z]             env['RANK'] = str(i)
[2024-06-25T12:58:33Z]             env['LOCAL_RANK'] = str(i)
[2024-06-25T12:58:33Z]             env['WORLD_SIZE'] = str(number_of_processes)
[2024-06-25T12:58:33Z]             env['LOCAL_WORLD_SIZE'] = str(number_of_processes)
[2024-06-25T12:58:33Z]             env['MASTER_ADDR'] = 'localhost'
[2024-06-25T12:58:33Z]             env['MASTER_PORT'] = '12345'
[2024-06-25T12:58:33Z]             p = multiprocessing.Process(target=fn, args=(env, ))
[2024-06-25T12:58:33Z]             processes.append(p)
[2024-06-25T12:58:33Z]             p.start()
[2024-06-25T12:58:33Z]
[2024-06-25T12:58:33Z]         for p in processes:
[2024-06-25T12:58:33Z]             p.join()
[2024-06-25T12:58:33Z]
[2024-06-25T12:58:33Z]         for p in processes:
[2024-06-25T12:58:33Z] >           assert p.exitcode == 0
[2024-06-25T12:58:33Z] E           AssertionError: assert 1 == 0
[2024-06-25T12:58:33Z] E            +  where 1 = <Process name='Process-1' pid=15885 parent=7 stopped exitcode=1>.exitcode
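Note that the final loop in the trace stops at the first non-zero exit code, so the log only shows `Process-1` and says nothing about the other three ranks. A minimal sketch of a check that reports every failing rank at once is below. It is not taken from the vLLM test suite; `describe_exit` and `assert_all_ranks_succeeded` are hypothetical names used only for illustration.

```python
import signal
from typing import Optional, Sequence


def describe_exit(rank: int, exitcode: Optional[int]) -> str:
    """Describe one worker's exit status in a human-readable way."""
    if exitcode is None:
        # multiprocessing.Process.exitcode is None until the process ends.
        return f"rank {rank}: still running"
    if exitcode < 0:
        # multiprocessing reports termination by signal N as exitcode -N.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = f"signal {-exitcode}"
        return f"rank {rank}: killed by {name}"
    return f"rank {rank}: exit code {exitcode}"


def assert_all_ranks_succeeded(exitcodes: Sequence[Optional[int]]) -> None:
    """Fail with one message that lists every rank whose exit code is not 0."""
    failures = [
        describe_exit(rank, code)
        for rank, code in enumerate(exitcodes)
        if code != 0
    ]
    assert not failures, "worker failures: " + "; ".join(failures)
```

Assuming the `processes` list from `distributed_run` is in rank order, the last loop could be swapped for `assert_all_ranks_succeeded([p.exitcode for p in processes])`, which might make it easier to tell a single bad rank from a failure on all ranks.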
FYI @youkaichao
@cadedaniel should be fixed in #5801
awesome :)