@yiming0416 yiming0416 commented Oct 10, 2025

#1850 removed the `name` field from `TrainSpec`. The simple_fsdp experiments must be updated to match; otherwise they fail to run.

#1776 added a `use_flex_attn` argument to `apply_non_moe_tp()`, which the simple_fsdp experiments do not yet pass.
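
Both breakages are stale call sites after an interface change. The sketch below reproduces that failure mode with hypothetical stand-ins (`SpecWithoutName`, `apply_tp_stub`, and `try_call` are illustrative names, not torchtitan code). It assumes the spec is a dataclass and that the new argument has no default; the real signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for a spec dataclass after its `name` field was removed.
@dataclass
class SpecWithoutName:
    model_cls: type
    parallelize_fn: Callable


# Hypothetical stand-in for a helper that gained a `use_flex_attn` argument.
# Assumption: the new argument has no default value.
def apply_tp_stub(model: object, loss_parallel: bool, use_flex_attn: bool) -> str:
    return f"loss_parallel={loss_parallel}, use_flex_attn={use_flex_attn}"


def try_call(fn, *args, **kwargs):
    """Return (result, None) on success or (None, message) on TypeError."""
    try:
        return fn(*args, **kwargs), None
    except TypeError as exc:
        return None, str(exc)


# Stale call site: still passes the removed `name` keyword.
_, stale_spec_error = try_call(
    SpecWithoutName, name="llama3", model_cls=dict, parallelize_fn=print
)

# Stale call site: omits the newly added argument.
_, stale_tp_error = try_call(apply_tp_stub, object(), loss_parallel=True)

# Updated call sites: drop `name`, pass `use_flex_attn`.
spec, spec_error = try_call(SpecWithoutName, model_cls=dict, parallelize_fn=print)
tp_result, tp_error = try_call(
    apply_tp_stub, object(), loss_parallel=True, use_flex_attn=False
)

print(stale_spec_error)
print(stale_tp_error)
print(tp_result)
```

In this sketch, both stale calls raise `TypeError` as soon as they are invoked, which is consistent with the experiments failing at startup rather than partway through training.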

NGPU=8 CONFIG_FILE=./torchtitan/models/llama3/train_configs/debug_model.toml ./run_train.sh --model.name simple_fsdp.llama3 --compile.enable
NGPU=8 CONFIG_FILE=./torchtitan/models/deepseek_v3/train_configs/debug_model.toml ./run_train.sh --model.name simple_fsdp.deepseek_v3 --compile.enable

@meta-cla meta-cla bot added the CLA Signed label Oct 10, 2025
@yiming0416 yiming0416 force-pushed the fix_simple_fsdp_train_spec branch from 0d75b04 to 541d08b October 11, 2025 00:12
@yiming0416 yiming0416 changed the title from "Fix TrainSpec in simple_fsdp experiments" to "Minor fixes in simple_fsdp experiments" Oct 11, 2025

@ruisizhang123 ruisizhang123 left a comment

LGTM, thank you for the fix.

@ruisizhang123 ruisizhang123 merged commit dfd0a59 into pytorch:main Oct 11, 2025
5 checks passed
githubsgi pushed a commit to githubsgi/torchtitan that referenced this pull request Oct 13, 2025
@yiming0416 yiming0416 deleted the fix_simple_fsdp_train_spec branch October 13, 2025 22:13