[FlashRL 3/N] Add example for FP8 training with FlashRL #169
Merged
SumanthRH merged 11 commits into NovaSky-AI:main on Aug 20, 2025
Conversation
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Force-pushed from 04e753a to 7bb4e7c
SumanthRH commented on Aug 20, 2025
.. warning::

   FlashRL integration is experimental. While generation times can improve for large models with quantization, we've observed that the time spent in weight syncing is much higher with FlashRL for fp8. This negates most of the benefits of fp8 inference. The slowdown is primarily due to slow weight quantization in vLLM's ``process_weights_after_loading`` function. We are actively working on improving this.
Member (Author)
This is an important warning. I've already improved weight syncing with the batching implementation plus fixes for FlashRL's `patch_load_weights` method, but it is still not good enough. We will revisit the fp8 slowdown, and meanwhile also see if int8 can provide good overall step-time improvements.
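For context, here is a rough sketch of the batching idea: instead of pushing parameters into vLLM one tensor at a time, group them so each `load_weights` call (and any follow-on quantization work) covers many tensors at once. The helper name and batch size below are hypothetical, not the actual SkyRL/FlashRL implementation:

```python
# Hypothetical sketch of batched weight syncing; `sync_weights_batched` and the
# batch size are illustrative, not SkyRL's or FlashRL's actual API.
from typing import Iterable, Tuple

import torch


def sync_weights_batched(
    vllm_model,
    named_params: Iterable[Tuple[str, torch.Tensor]],
    batch_size: int = 64,
) -> None:
    """Group (name, tensor) pairs so per-call overhead, including any
    post-load quantization work, is amortized over many tensors."""
    batch = []
    for name, tensor in named_params:
        batch.append((name, tensor))
        if len(batch) >= batch_size:
            # vLLM model classes accept an iterable of (name, tensor) pairs here.
            vllm_model.load_weights(batch)
            batch = []
    if batch:
        vllm_model.load_weights(batch)
```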
erictang000 reviewed on Aug 20, 2025
| """ | ||
| from skyrl_train.utils import ray_noset_visible_devices, get_all_env_variables, get_ray_pg_ready_with_timeout | ||
|
|
||
| assert not async_engine, "`async_engine` is not supported for FlashRL" |
Collaborator
Just to confirm - we can only use the offline engine for FlashRL, so only single-turn rollouts?
Collaborator
Maybe worth a clarification in the docs - I didn't realize this until I hit this line of code.
Member (Author)
Yeah let me add a warning
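For illustration, a minimal sketch of the kind of guard being discussed, assuming the relevant setting lives at `generator.async_engine` (the exact config path is an assumption and may differ from SkyRL's actual schema):

```python
# Illustrative guard only; the config attribute path `generator.async_engine`
# is an assumption, not necessarily SkyRL's actual config schema.
def check_flashrl_config(cfg) -> None:
    if getattr(cfg.generator, "async_engine", False):
        raise ValueError(
            "FlashRL only supports the offline (synchronous) engine, "
            "i.e. single-turn rollouts. Set generator.async_engine=false."
        )
```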
| How does it work? | ||
| ~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| We pass `quantization=fp8` flag to the vLLM engine at initialization time. This means that the weights are loaded as usual in half precision and then quantized down to fp8. During training, generations are sampled as usual, and in this case, sampled from quantized weights. Since we use online quantization, the scale factor used for quantizing activations are computed on the fly by vLLM internally. |
Collaborator
Suggested change:
- We pass `quantization=fp8` flag to the vLLM engine at initialization time. This means that the weights are loaded as usual in half precision and then quantized down to fp8. During training, generations are sampled as usual, and in this case, sampled from quantized weights. Since we use online quantization, the scale factor used for quantizing activations are computed on the fly by vLLM internally.
+ We pass the `quantization=fp8` flag to the vLLM engine at initialization time. This means that the weights are loaded as usual in half precision and then quantized down to FP8. During training, generations are sampled as usual, but now from the quantized weights. Since vLLM uses online quantization, the scale factors used for quantizing activations are computed dynamically during runtime.
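For readers unfamiliar with vLLM's online quantization, here is a minimal, self-contained sketch of what the `quantization=fp8` path looks like when calling vLLM directly, outside SkyRL's engine wrapper; the model name and sampling settings are placeholders:

```python
# Minimal sketch of vLLM's online FP8 quantization, shown outside SkyRL's
# engine wrapper. The model name and sampling settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    quantization="fp8",  # weights load in half precision, then are quantized to FP8
)
outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

Because the quantization is online, no pre-quantized checkpoint or calibration data is needed; vLLM computes the activation scales at runtime.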
Force-pushed from 523f734 to 2e90cac
dzorlu referenced this pull request in fleet-ai/SkyRL on Feb 4, 2026
What does this PR do?
WIP PR to add FP8 training with FlashRL.
Note that we currently only support online FP8 quantization. Support for pre-quantized FP8 and int8 will follow soon - it's a bit more involved given that you need to calibrate the scaling factors.
This PR uses a custom vLLM wheel. We found this to be the simplest way to manage the custom vLLM patches in FlashRL. The wheel is a pre-packaged build from the branch https://github.com/SumanthRH/vllm/tree/flashrl. Specifying the git URL directly led to uv building the CPU-only version of vLLM for some reason, so we'll use this wheel for now.
TODO:
- [x] Verify E2E run on Deepspeed and FSDP
- [x] Verify training on Qwen3 14B and 32B
- [x] Upload the wheel to GitHub releases and use the link from GitHub
- [x] Add docs