docs/en/get_started/usage.md (4 changes: 2 additions & 2 deletions)

@@ -315,7 +315,7 @@ In some customized Megatron implementations, special operations need to be performed

slime also supports FSDP2 as the training backend, docs [here](https://lmsys.org/blog/2025-12-03-miles-fsdp/).

- > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, without manual specification. Megatron requires manual configuration of parameters to read model architecture information, or automatic inference via `--use-hf-config-for-megatron`. FSDP can read entirely from `config.json`, directly avoiding the weight format conversion step.
+ > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, without manual specification. Megatron requires manual configuration of parameters to read model architecture information. FSDP can read entirely from `config.json`, directly avoiding the weight format conversion step.

To run FSDP as the training backend, pass `--train-backend fsdp`.
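
For reference, a minimal sketch of the config-driven loading the note above describes. The checkpoint path is a placeholder, and this is illustrative of the HuggingFace APIs involved, not slime's internal loading code:

```python
from transformers import AutoConfig, AutoModelForCausalLM

ckpt = "path/to/hf_checkpoint"  # placeholder: any HuggingFace-format checkpoint directory

# Everything FSDP needs about the architecture lives in config.json,
# so no Megatron-style --num-layers / --hidden-size flags are required.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)
print(config.num_hidden_layers, config.hidden_size)

# Weights load directly in HuggingFace format, skipping any conversion step.
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
```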

@@ -325,7 +325,7 @@ Parameters that FSDP uses are shown below in comparison to Megatron, more sup

| Configuration Category | Megatron Parameter | FSDP Parameter | Description |
| --- | --- | --- | --- |
- | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size` etc.) or `--use-hf-config-for-megatron` | `--hf-checkpoint` (Required) | **FSDP**: Directly uses HuggingFace format, no weight conversion needed, architecture inferred via `AutoConfig` |
+ | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size` etc.) | `--hf-checkpoint` (Required) | **FSDP**: Directly uses HuggingFace format, no weight conversion needed, architecture inferred via `AutoConfig` |
| **Tensor Parallel** | `--tensor-model-parallel-size` | Coming Soon | |
| **Pipeline Parallel** | `--pipeline-model-parallel-size` | Coming Soon | |
| **Expert Parallel** | `--expert-model-parallel-size` | Coming Soon | |
docs/zh/get_started/usage.md (4 changes: 2 additions & 2 deletions)

@@ -314,7 +314,7 @@ if __name__ == "__main__":

slime also supports FSDP2 as the training backend; see the [documentation](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md).

- > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, with no manual specification needed. Megatron requires manually configured parameters to read the model architecture information, or automatic inference via `--use-hf-config-for-megatron`; FSDP can read everything from `config.json` automatically, directly avoiding the weight format conversion step.
+ > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, with no manual specification needed. Megatron requires manually configured parameters to read the model architecture information; FSDP can read everything from `config.json` automatically, directly avoiding the weight format conversion step.

Pass `--train-backend fsdp` on the command line to launch FSDP as the training backend.

@@ -324,7 +324,7 @@ A comparison of the parameters supported by the FSDP and Megatron backends is shown in the table below; going forward, FSDP will

| Configuration Category | Megatron Parameter | FSDP Parameter | Description |
| --- | --- | --- | --- |
- | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size`, etc.) or `--use-hf-config-for-megatron` | `--hf-checkpoint` (required) | **FSDP**: uses the HuggingFace format directly, no weight conversion needed, architecture inferred automatically via `AutoConfig` |
+ | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size`, etc.) | `--hf-checkpoint` (required) | **FSDP**: uses the HuggingFace format directly, no weight conversion needed, architecture inferred automatically via `AutoConfig` |
| **Tensor Parallel** | `--tensor-model-parallel-size` | Coming Soon | |
| **Pipeline Parallel** | `--pipeline-model-parallel-size` | Coming Soon | |
| **Expert Parallel** | `--expert-model-parallel-size` | Coming Soon | |
slime/backends/megatron_utils/config_mapping/__init__.py (12 changes: 0 additions & 12 deletions)

This file was deleted.

slime/backends/megatron_utils/config_mapping/registry.py (55 changes: 0 additions & 55 deletions)

This file was deleted.

slime/utils/arguments.py (20 changes: 0 additions & 20 deletions)
@@ -170,11 +170,6 @@ def add_rollout_arguments(parser):
             "It doesn't necessarily need to contain the most up-to-date parameters."
         ),
     )
-    parser.add_argument(
-        "--use-hf-config-for-megatron",
-        action="store_true",
-        help="Whether to use HF config for Megatron core to define the model architecture.",
-    )
     parser.add_argument(
         "--model-name",
         type=str,
@@ -1295,12 +1290,6 @@ def parse_args(add_custom_arguments=None):
     args = megatron_parse_args(extra_args_provider=add_slime_arguments)
     if args.hf_checkpoint:
         hf_config = AutoConfig.from_pretrained(args.hf_checkpoint, trust_remote_code=True)
-        if args.use_hf_config_for_megatron:
-            from slime.backends.megatron_utils.config_mapping import get_mapper
-
-            megatron_config_from_hf = get_mapper(hf_config.model_type)(hf_config)
-            _validate_and_update_megatron_args_from_hf(args, megatron_config_from_hf.transformer_config)
-            _validate_and_update_megatron_args_from_hf(args, megatron_config_from_hf.gpt_model_args)
         hf_validate_args(args, hf_config)

     args.rank = 0

@@ -1614,12 +1603,3 @@ def equal(x, y):

     if len(errors) > 0:
         raise AssertionError("hf_validate_args failed: " + "; ".join(errors))
-
-
-def _validate_and_update_megatron_args_from_hf(args, args_from_hf_config: dict[str, Any]):
-    for key, value in args_from_hf_config.items():
-        if hasattr(args, key) and getattr(args, key) != value:
-            raise ValueError(
-                f"Argument {key} is not consistent. {key} in args is {getattr(args, key)}, but from HF config is {value}."
-            )
-        setattr(args, key, value)
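
With the mapper removed, the architecture flags that `--use-hf-config-for-megatron` used to infer must now be supplied by hand for Megatron runs. Below is a hedged sketch of deriving them from the same `config.json` that FSDP reads; the flag subset and config attribute names are illustrative for a typical dense model, not an exhaustive or slime-specific mapping:

```python
from transformers import AutoConfig

# Read the HF config that FSDP would consume directly.
cfg = AutoConfig.from_pretrained("path/to/hf_checkpoint", trust_remote_code=True)

# Map a few common architecture fields onto their Megatron CLI flags.
# (Illustrative subset; real models may need more, e.g. GQA or MoE settings.)
flags = {
    "--num-layers": cfg.num_hidden_layers,
    "--hidden-size": cfg.hidden_size,
    "--num-attention-heads": cfg.num_attention_heads,
    "--ffn-hidden-size": cfg.intermediate_size,
}
print(" ".join(f"{k} {v}" for k, v in flags.items()))
```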