🚀 The feature, motivation and pitch
Feedback from the RL community is that vLLM's fp8 weight loading works poorly for RL workflows, where weights must be updated in place after the engine has started.
The cause is clear: in fp8.py, process_weights_after_loading re-wraps layer parameters in fresh Parameter objects, which drops the .weight_loader attribute.
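For illustration, here is a minimal, self-contained sketch (not vLLM's actual code; the Linear layer and dummy_weight_loader are stand-ins) of how re-wrapping a parameter in a fresh torch.nn.Parameter silently loses the attribute, and how copying it forward when re-wrapping would avoid that:

```python
import torch


def dummy_weight_loader(param: torch.nn.Parameter, loaded_weight: torch.Tensor) -> None:
    # Stand-in for a vLLM-style loader that knows how to place a (possibly
    # sharded/quantized) checkpoint tensor into the parameter.
    param.data.copy_(loaded_weight)


layer = torch.nn.Linear(4, 4, bias=False)
layer.weight.weight_loader = dummy_weight_loader

# What process_weights_after_loading effectively does: wrap a new tensor in a
# fresh Parameter. The attached .weight_loader is silently lost.
old_param = layer.weight
layer.weight = torch.nn.Parameter(old_param.data.clone(), requires_grad=False)
print(hasattr(layer.weight, "weight_loader"))  # False -> later load_weights breaks

# A fix in the spirit of the patch discussed below: carry the loader over
# explicitly whenever the parameter is re-wrapped.
if hasattr(old_param, "weight_loader"):
    layer.weight.weight_loader = old_param.weight_loader
print(hasattr(layer.weight, "weight_loader"))  # True
```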
There's a patch from the Moonshot team that fixes this issue, and a PR with that patch that never received any comments. The patch only applies cleanly on top of v0.10.2rc1. Shortly after that tag, this PR made fp8 weight updates even trickier by transposing the weight_inv_scale parameter for CUTLASS.
I don't know how to patch any vLLM version after that PR so that model.load_weights can still be called after the engine has started. That is a bummer, because DeepSeek wide-EP inference is quite a bit faster in v0.11.0.
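To make the failure mode concrete, here is a hedged sketch of the update path; push_updated_weights is a hypothetical RL-side helper, but the getattr fallback mirrors the pattern vLLM model load_weights implementations typically use (exact details vary per model and version):

```python
from typing import Iterable, Tuple

import torch


def default_weight_loader(param: torch.nn.Parameter, loaded_weight: torch.Tensor) -> None:
    # Fallback loader: a plain copy. Fine for unquantized weights, but wrong
    # once fp8 weights/scales have been re-laid-out (e.g. transposed for
    # CUTLASS) by process_weights_after_loading.
    param.data.copy_(loaded_weight)


def push_updated_weights(model: torch.nn.Module,
                         weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
    # Hypothetical helper streaming updated trainer weights into the
    # already-initialized inference model, the way model.load_weights is used.
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        param = params_dict[name]
        # After fp8's process_weights_after_loading re-wraps the parameter,
        # this getattr silently falls back to default_weight_loader, and the
        # quantized weight's layout no longer matches the checkpoint tensor,
        # so the update either errors out or corrupts the weights.
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```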
We need to fix this ASAP.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.