This is now consistently failing with CUDA OOM: https://buildkite.com/vllm/ci/builds/22221#01977f3a-71ea-41cb-bbeb-a43340a10124. I narrowed it down to https://github.com/vllm-project/vllm/pull/19572, which appears to have introduced the issue.