One of the key design goals of slime is to stay compatible with the latest releases of sglang. However, to support certain reinforcement learning (RL) features, we occasionally introduce minor modifications to sglang itself.
This issue serves as a centralized place to track relevant pull requests—both ongoing and merged—that affect or are affected by slime. It aims to facilitate collaboration and maintain alignment between the slime and sglang communities.
Ultimately, our goal is to eliminate the need for maintaining a separate sglang.patch file by upstreaming necessary changes directly into sglang.
As of sglang 0.4.9, the core changes required by slime have been merged into sglang; the remaining pull requests are primarily enhancements to sglang itself.
We’ll keep this list updated as development progresses.
🔄 Open / Pending PRs
- [bugfix] Add add_special_tokens=False to TokenizerManager sgl-project/sglang#8291
- [RL] fix true_on_policy_target sgl-project/sglang#14300
✅ Merged PRs
- [router] support http2 in router sgl-project/sglang#6487
- [router] Add /list_workers endpoint to router sgl-project/sglang#6366
- [RL] allow weight updation with dp attention enabled sgl-project/sglang#6311
- [RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… sgl-project/sglang#6308
- Add endpoint /abort sgl-project/sglang#5966 or Fix request abortion sgl-project/sglang#6184
- [fix][RL] Remove the incorrect barrier in init_weights_update_group sgl-project/sglang#5914
- [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight sgl-project/sglang#6265
- [RL] support update_weights_from_distributed with different group and multiple weights sgl-project/sglang#7292
- [RL] add --skip-warmup sgl-project/sglang#7416
- [router] add --log-level to sgl-router sgl-project/sglang#6512
- [RL] support abort all and fix abort on waiting queue sgl-project/sglang#6855 or Support updating weights at once by stopping all requests sgl-project/sglang#6698
- [RL] Add --nccl-port to prevent port conflict sgl-project/sglang#7418
- [RL] add pause and continue generation for async rl training sgl-project/sglang#7419
- [router] make request_timeout_secs configurable sgl-project/sglang#8525 or [router] migrate router from actix to axum sgl-project/sglang#8479
- [RL] fix update weight for FusedMoE with EP sgl-project/sglang#8676
- [RL] fix skip_server_warmup and rl health_generate logic sgl-project/sglang#8757
- [WIP][RL] fix fp8 update weight sgl-project/sglang#7421
- [RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded sgl-project/sglang#10152
- Manually flip deepep_mode for cuda_graph sgl-project/sglang#11666
- [RL] support weight update with DP attention sgl-project/sglang#11669
- Add more statistics for spec decoding sgl-project/sglang#13317
- [RL] support update_weights_from_tensor for mtp sgl-project/sglang#7415
- [RL] support only do cpu backup on draft model sgl-project/sglang#13318
- [RL] Allow bypassing /health check sgl-project/sglang#13320
- [RL] enable offloading hybrid linear attn model sgl-project/sglang#13336
- Add SGLANG_ENABLE_REQ_POOL_LEAK_STRICT_CHECK to bypass mem leak check sgl-project/sglang#13339
- [RL] re-abort_request when model_update_lock is still locked sgl-project/sglang#13338
- [RL] Allow passing tensors of different dtypes for FlattenedTensorBucket sgl-project/sglang#13413
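For context, here is a minimal, illustrative sketch of how a training loop might call two of the control endpoints referenced in the merged PRs above (/list_workers from sgl-project/sglang#6366 and /abort from sgl-project/sglang#5966). The base URLs, exact route names, and response formats below are assumptions taken from the PR titles, not a confirmed API; check the merged PRs or the current sglang / sgl-router docs before relying on them.

```python
import requests

# Assumed addresses for a locally running sgl-router and sglang server;
# adjust to your deployment.
ROUTER_URL = "http://127.0.0.1:30000"
SERVER_URL = "http://127.0.0.1:30001"

# Ask the router which workers it currently knows about, e.g. before
# broadcasting updated weights to every replica.
# (Endpoint name taken from the title of sgl-project/sglang#6366.)
resp = requests.get(f"{ROUTER_URL}/list_workers", timeout=10)
print("registered workers:", resp.text)

# Abort in-flight generation before a weight update so no further rollouts
# are produced with the stale policy.
# (Endpoint name taken from the title of sgl-project/sglang#5966; the merged
# interface may differ.)
resp = requests.post(f"{SERVER_URL}/abort", timeout=10)
print("abort status:", resp.status_code)
```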