
[Feature][RL]: Fix Fp8 Weight Loading for RL #28425

@robertgshaw2-redhat

Description


🚀 The feature, motivation and pitch

Feedback from the RL community indicates that vLLM's fp8 weight loading is broken for RL workflows.

The cause is clear: in `fp8.py`, `process_weights_after_loading` does a lot of parameter wrapping that drops the `.weight_loader` attribute.
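A minimal sketch of the failure mode, with illustrative names rather than vLLM's actual code: rebuilding a quantized weight via `torch.nn.Parameter(...)` creates a new object, so Python attributes attached to the old parameter (like `weight_loader`) silently disappear, and a later `model.load_weights` call can no longer find its loader. Carrying the attribute over fixes it:

```python
import torch

def process_weights_after_loading(layer: torch.nn.Module) -> None:
    """Illustrative stand-in for the post-load repacking step."""
    old_weight = layer.weight
    # Repacking via torch.nn.Parameter() creates a *new* tensor object,
    # so attributes set on the old parameter are not carried over.
    new_weight = torch.nn.Parameter(old_weight.data.clone(),
                                    requires_grad=False)
    # Fix: preserve the loader so model.load_weights() still works
    # after the engine has started.
    if hasattr(old_weight, "weight_loader"):
        new_weight.weight_loader = old_weight.weight_loader
    layer.weight = new_weight
```

Without the `hasattr` block at the end, the repacked parameter has no `weight_loader`, which is exactly the breakage reported here.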

There's a patch from the Moonshot team that fixes this issue, and a PR carrying that patch never got any comments. The patch only works on top of v0.10.2rc1: shortly after that tag, this PR made fp8 weight updates even trickier by transposing the `weight_inv_scale` parameter for CUTLASS.
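To illustrate why the transposition makes live updates trickier (hypothetical loader names, not vLLM's actual API): once the engine stores the scale transposed for the CUTLASS kernel, a naive `copy_` from the checkpoint layout fails on a shape mismatch, so any loader has to detect and undo the transpose:

```python
import torch

def naive_scale_loader(param: torch.nn.Parameter,
                       loaded: torch.Tensor) -> None:
    # Assumes the in-memory layout matches the checkpoint layout;
    # breaks once the engine has transposed the scale for CUTLASS.
    param.data.copy_(loaded)

def transpose_aware_scale_loader(param: torch.nn.Parameter,
                                 loaded: torch.Tensor) -> None:
    # Accept either layout: copy directly if shapes match, otherwise
    # transpose the incoming checkpoint tensor to the stored layout.
    if param.shape == loaded.shape:
        param.data.copy_(loaded)
    elif param.shape == loaded.t().shape:
        param.data.copy_(loaded.t())
    else:
        raise ValueError(f"shape mismatch: {param.shape} vs {loaded.shape}")
```

Any post-v0.10.2rc1 fix would need this kind of layout awareness in (or around) the restored `weight_loader`.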

I don't know how to patch any vLLM version after this PR so that `model.load_weights` can be called once the engine has started. That's a shame, because DeepSeek wide-EP inference is quite a bit faster in v0.11.0.

We need to fix this ASAP.

Alternatives

No response

Additional context

No response

