docs/en/get_started/usage.md (4 changes: 2 additions & 2 deletions)

@@ -315,7 +315,7 @@ In some customized Megatron implementations, special operations need to be performed

slime also supports FSDP2 as the training backend, docs [here](https://lmsys.org/blog/2025-12-03-miles-fsdp/).

- > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, without manual specification. Megatron requires manual configuration of parameters to read model architecture information, or automatic inference via `--use-hf-config-for-megatron`. FSDP can read entirely from `config.json`, directly avoiding the weight format conversion step.
+ > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, without manual specification. Megatron requires manual configuration of parameters to read model architecture information. FSDP can read entirely from `config.json`, directly avoiding the weight format conversion step.

To run FSDP as the training backend, pass `--train-backend fsdp`.
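
For reference, a minimal sketch of the config-driven loading the note above describes. The checkpoint path is a placeholder, and this is illustrative of the HuggingFace APIs involved, not slime's internal loading code:

```python
from transformers import AutoConfig, AutoModelForCausalLM

ckpt = "path/to/hf_checkpoint"  # placeholder: any HuggingFace-format checkpoint directory

# Everything FSDP needs about the architecture lives in config.json,
# so no Megatron-style --num-layers / --hidden-size flags are required.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)
print(config.num_hidden_layers, config.hidden_size)

# Weights load directly in HuggingFace format, skipping any conversion step.
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
```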

@@ -325,7 +325,7 @@ Parameters that FSDP uses are shown below in comparison to Megatron, more sup

| Configuration Category | Megatron Parameter | FSDP Parameter | Description |
| --- | --- | --- | --- |
- | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size` etc.) or `--use-hf-config-for-megatron` | `--hf-checkpoint` (Required) | **FSDP**: Directly uses HuggingFace format, no weight conversion needed, architecture inferred via `AutoConfig` |
+ | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size` etc.) | `--hf-checkpoint` (Required) | **FSDP**: Directly uses HuggingFace format, no weight conversion needed, architecture inferred via `AutoConfig` |
| **Tensor Parallel** | `--tensor-model-parallel-size` | Coming Soon | |
| **Pipeline Parallel** | `--pipeline-model-parallel-size` | Coming Soon | |
| **Expert Parallel** | `--expert-model-parallel-size` | Coming Soon | |
docs/zh/get_started/usage.md (4 changes: 2 additions & 2 deletions)

@@ -314,7 +314,7 @@ if __name__ == "__main__":

slime also supports FSDP2 as the training backend; see the [documentation](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/slime/fsdp/readme.md).

- > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, with no manual specification needed. Megatron requires manually configured parameters to read the model architecture information, or automatic inference via `--use-hf-config-for-megatron`; FSDP can read everything from `config.json` automatically, directly avoiding the weight format conversion step.
+ > FSDP automatically reads all architecture information via `AutoModelForCausalLM.from_pretrained()`, with no manual specification needed. Megatron requires manually configured parameters to read the model architecture information; FSDP can read everything from `config.json` automatically, directly avoiding the weight format conversion step.

Pass `--train-backend fsdp` on the command line to launch FSDP as the training backend.

@@ -324,7 +324,7 @@ A comparison of the parameters supported by the FSDP and Megatron backends is shown in the table below; going forward, FSDP will

| Configuration Category | Megatron Parameter | FSDP Parameter | Description |
| --- | --- | --- | --- |
- | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size`, etc.) or `--use-hf-config-for-megatron` | `--hf-checkpoint` (required) | **FSDP**: uses the HuggingFace format directly, no weight conversion needed, architecture inferred automatically via `AutoConfig` |
+ | **Model Loading** | `--load` (Megatron checkpoint) + architecture args (`--num-layers`, `--hidden-size`, etc.) | `--hf-checkpoint` (required) | **FSDP**: uses the HuggingFace format directly, no weight conversion needed, architecture inferred automatically via `AutoConfig` |
| **Tensor Parallel** | `--tensor-model-parallel-size` | Coming Soon | |
| **Pipeline Parallel** | `--pipeline-model-parallel-size` | Coming Soon | |
| **Expert Parallel** | `--expert-model-parallel-size` | Coming Soon | |
slime/backends/megatron_utils/config_mapping/__init__.py (12 changes: 0 additions & 12 deletions)

This file was deleted.

slime/backends/megatron_utils/config_mapping/registry.py (55 changes: 0 additions & 55 deletions)

This file was deleted.

slime/utils/arguments.py (20 changes: 0 additions & 20 deletions)
@@ -170,11 +170,6 @@ def add_rollout_arguments(parser):
             "It doesn't necessarily need to contain the most up-to-date parameters."
         ),
     )
-    parser.add_argument(
-        "--use-hf-config-for-megatron",
-        action="store_true",
-        help="Whether to use HF config for Megatron core to define the model architecture.",
-    )
     parser.add_argument(
         "--model-name",
         type=str,
@@ -1295,12 +1290,6 @@ def parse_args(add_custom_arguments=None):
     args = megatron_parse_args(extra_args_provider=add_slime_arguments)
     if args.hf_checkpoint:
         hf_config = AutoConfig.from_pretrained(args.hf_checkpoint, trust_remote_code=True)
-        if args.use_hf_config_for_megatron:
-            from slime.backends.megatron_utils.config_mapping import get_mapper
-
-            megatron_config_from_hf = get_mapper(hf_config.model_type)(hf_config)
-            _validate_and_update_megatron_args_from_hf(args, megatron_config_from_hf.transformer_config)
-            _validate_and_update_megatron_args_from_hf(args, megatron_config_from_hf.gpt_model_args)
         hf_validate_args(args, hf_config)

     args.rank = 0

@@ -1614,12 +1603,3 @@ def equal(x, y):

     if len(errors) > 0:
         raise AssertionError("hf_validate_args failed: " + "; ".join(errors))
-
-
-def _validate_and_update_megatron_args_from_hf(args, args_from_hf_config: dict[str, Any]):
-    for key, value in args_from_hf_config.items():
-        if hasattr(args, key) and getattr(args, key) != value:
-            raise ValueError(
-                f"Argument {key} is not consistent. {key} in args is {getattr(args, key)}, but from HF config is {value}."
-            )
-        setattr(args, key, value)
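
With the mapper removed, the architecture flags that `--use-hf-config-for-megatron` used to infer must now be supplied by hand for Megatron runs. Below is a hedged sketch of deriving them from the same `config.json` that FSDP reads; the flag subset and config attribute names are illustrative for a typical dense model, not an exhaustive or slime-specific mapping:

```python
from transformers import AutoConfig

# Read the HF config that FSDP would consume directly.
cfg = AutoConfig.from_pretrained("path/to/hf_checkpoint", trust_remote_code=True)

# Map a few common architecture fields onto their Megatron CLI flags.
# (Illustrative subset; real models may need more, e.g. GQA or MoE settings.)
flags = {
    "--num-layers": cfg.num_hidden_layers,
    "--hidden-size": cfg.hidden_size,
    "--num-attention-heads": cfg.num_attention_heads,
    "--ffn-hidden-size": cfg.intermediate_size,
}
print(" ".join(f"{k} {v}" for k, v in flags.items()))
```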