[Bug]: Distributed Tests (4 GPUs) failing in main branch CI #20138

@njhill

The Distributed Tests (4 GPUs) job is now consistently failing with a CUDA OOM error: https://buildkite.com/vllm/ci/builds/22221#01977f3a-71ea-41cb-bbeb-a43340a10124

I narrowed this down to #19572, which appears to have introduced the issue.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Status: Done