[ci] CI coverage tracking

As slime grows, we need more thorough CI coverage for an increasing number of backends, deployment patterns, and training features.  
This issue serves as a living index of our existing CI tests so that **both users and contributors** can quickly see:

- which backends are currently exercised by CI  
- which parallelism / optimizer / algorithmic features are being tested  
- where coverage gaps still exist

Over time this document should evolve into a compact “CI coverage map” of slime.

## Megatron Backend CI

All Megatron-backend CI tests currently check at least the following invariants:

- `kl_loss == 0` on step 0  
- `ppo_kl == 0` on step 0 of each rollout  

These are used as sanity checks that the initial policy/value setup and reference policy wiring are correct.
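As an illustration, the two invariants above could be asserted roughly as follows. This is a minimal sketch, not the code the tests use: the record layout (`rollout_id`, `step_in_rollout`, `kl_loss`, `ppo_kl`) and the helper name `check_step0_invariants` are hypothetical, and it assumes "step 0" for `kl_loss` means the first training step of the first rollout.

```python
from typing import Iterable, List, Mapping


def check_step0_invariants(
    records: Iterable[Mapping[str, float]], tol: float = 0.0
) -> List[str]:
    """Return a list of human-readable invariant violations (empty if none).

    Each record is a hypothetical per-step metrics dict with the keys
    `rollout_id`, `step_in_rollout`, `kl_loss`, and `ppo_kl`.
    """
    failures = []
    for rec in records:
        # Both invariants only apply to the first training step of a rollout.
        if rec["step_in_rollout"] != 0:
            continue
        # kl_loss must be zero on the very first step (assumed: rollout 0, step 0),
        # because the policy and the reference policy are still identical.
        if rec["rollout_id"] == 0 and abs(rec["kl_loss"]) > tol:
            failures.append(f"rollout 0: kl_loss={rec['kl_loss']} (expected 0)")
        # ppo_kl must be zero on the first step of every rollout, because no
        # parameter update has happened since the rollout was sampled.
        if abs(rec["ppo_kl"]) > tol:
            failures.append(
                f"rollout {rec['rollout_id']}: ppo_kl={rec['ppo_kl']} (expected 0)"
            )
    return failures
```

The default `tol=0.0` mirrors the exact `== 0` wording above; a small positive tolerance is an option this sketch adds, not something the issue specifies.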

[`tests/test_quick_start_glm4_9B.py`](https://github.com/THUDM/slime/blob/main/tests/test_quick_start_glm4_9B.py)
- tp2, cp2
- 3 steps, rollout 8 × 8, gbs=32
- disaggregated
- algo: **GRPO**
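
The `rollout 8 × 8, gbs=32` shorthand recurs in several entries below. Assuming it means 8 prompts × 8 samples per prompt with a global batch size of 32 (an interpretation, not something this issue states), each rollout would yield two optimizer steps, which is why "step 0 of each rollout" is a meaningful checkpoint. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def optimizer_steps_per_rollout(
    num_prompts: int, samples_per_prompt: int, global_batch_size: int
) -> int:
    """Number of optimizer steps one rollout yields (assumed interpretation)."""
    total_samples = num_prompts * samples_per_prompt
    # Refuse silently-dropped samples: the rollout must split into whole batches.
    if total_samples % global_batch_size != 0:
        raise ValueError(
            f"{total_samples} samples do not divide evenly into "
            f"batches of {global_batch_size}"
        )
    return total_samples // global_batch_size


# 8 prompts x 8 samples = 64 samples; 64 / 32 = 2 optimizer steps per rollout.
steps = optimizer_steps_per_rollout(8, 8, 32)
```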

[`tests/test_qwen3_30B_A3B.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen3_30B_A3B.py)
- tp4, pp, cp, ep, cpu adam
- 3 steps, rollout 8 × 8, gbs=32
- colocated
- algo:
  - **GSPO**
  - **Routing replay**
  - **TIS**

[`tests/test_moonlight_16B_A3B.py`](https://github.com/THUDM/slime/blob/main/tests/test_moonlight_16B_A3B.py)
- mla, tp4, pp, cp, ep, cpu adam
- 3 steps, rollout 8 × 8, gbs=32
- colocated

[`tests/test_qwen3_4B_ppo.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen3_4B_ppo.py)
- tp4, pp, cp, ep, cpu adam
- 3 steps, rollout 8 × 8, gbs=32
- colocated
- algo:
  - **PPO**

[`tests/test_qwen2.5_0.5B_gsm8k.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen2.5_0.5B_gsm8k.py)
- no parallelism
- algo: **GRPO**

[`tests/test_qwen2.5_0.5B_gsm8k_async.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen2.5_0.5B_gsm8k_async.py)
- no parallelism
- algo:
  - **GSPO**
  - **true on-policy**

## FSDP Backend CI

All FSDP-backend CI tests share a common set of checks, such as basic training sanity and a decreasing loss.  
The exact invariants will be documented here once they are standardized across tests.

> **TODO:** enumerate and unify the exact common assertions for FSDP tests.

[`tests/test_qwen3_4B_fsdp_true_on_policy.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen3_4B_fsdp_true_on_policy.py)
- algo: **GRPO**

[`tests/test_qwen3_0.6B_fsdp_distributed.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen3_0.6B_fsdp_distributed.py)


[`tests/test_qwen3_0.6B_fsdp_colocated_2xGPU.py`](https://github.com/THUDM/slime/blob/main/tests/test_qwen3_0.6B_fsdp_colocated_2xGPU.py)

---

## TODO
- [x] PPO
- [ ] rollout routing replay
- [x] deterministic training
- [ ] distributed post
- [ ] partial rollout
- [ ] fault tolerance
- [ ] mtp training
