[refactor] dynamically import TrainSpec #1740

tianyu-l · 2025-09-23T04:36:21Z

Dynamically importing TrainSpec handles imports more efficiently and avoids accidental failures.

After this PR, each new model or experiment needs to define a get_train_spec function in its __init__.py file.
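As a rough illustration of the idea (not the actual torchtitan code), a lookup that imports only the requested package and calls its get_train_spec could look like the sketch below. The `demo_models` package, the dictionary it returns, and the contents of `_supported_models` are hypothetical stand-ins so the snippet runs without torchtitan installed.

```python
import importlib
import sys
import types

# Hypothetical stand-in for a model package that defines get_train_spec in
# its __init__.py. It is registered by hand so the sketch is self-contained.
_demo_pkg = types.ModuleType("demo_models")
_demo_pkg.__path__ = []  # an empty __path__ marks the module as a package
_demo_llama = types.ModuleType("demo_models.llama3")
_demo_llama.get_train_spec = lambda: {"name": "llama3"}  # placeholder spec
sys.modules["demo_models"] = _demo_pkg
sys.modules["demo_models.llama3"] = _demo_llama

# Assumed registry of known names; the real contents are not shown in this PR.
_supported_models = frozenset({"llama3"})


def get_train_spec(name: str):
    """Import only the package for `name` and call its get_train_spec()."""
    if name not in _supported_models:
        raise ValueError(f"Model {name!r} is not registered.")
    # The import happens on demand, so unrelated packages are never loaded
    # and a broken import elsewhere cannot fail this lookup.
    module = importlib.import_module(f"demo_models.{name}")
    return module.get_train_spec()
```

Because the import is deferred until a name is requested, a missing optional dependency in one package should only affect users of that package.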

fegin

Left some comments. Stamping to unblock the critical fix.

fegin · 2025-09-23T05:13:16Z

torchtitan/experiments/forge/train_spec.py

+
+    from torchtitan.experiments import _supported_experiments
+    from torchtitan.models import _supported_models
+


We should do a sanity check here.

assert _supported_models.isdisjoint(_supported_experiments)

This lets us avoid duplicated names. You can change _supported_models and _supported_experiments to sets.

torchtitan/protocols/train_spec.py

fegin · 2025-09-23T05:19:14Z

Both test failures are real.

wwwjn

Need to fix model name here:

torchtitan/torchtitan/experiments/simple_fsdp/tests/integration_tests.py

Line 46 in 85d92de

"--model.name llama3_simple_fsdp",

My bad, I forgot to update qwen3.

ruisizhang123 · 2025-09-24T00:08:48Z

Hmm, I have a question about this PR: I'm trying to import either deepseek_simple_fsdp or llama_simple_fsdp from simple_fsdp/__init__.py. I understand you want a simple_fsdp identifier for importing this folder in torchtitan/protocols/train_spec.py.

I wonder if there is a way to specify the TrainSpec of one of the two models (deepseek or llama) in the simple_fsdp folder without significantly refactoring the codebase?

tianyu-l requested review from allenwang28, ebsmothers, fegin, joecummings, pbontrager, wconstab and wwwjn as code owners September 23, 2025 04:36

meta-cla bot added the CLA Signed label Sep 23, 2025

tianyu-l mentioned this pull request Sep 23, 2025

Circular imports #1383

Closed

tianyu-l force-pushed the import branch from 5338212 to 08fa31e Compare September 23, 2025 04:42

fegin approved these changes Sep 23, 2025


tianyu-l force-pushed the import branch from 08fa31e to 6b671b5 Compare September 23, 2025 05:27

wwwjn approved these changes Sep 23, 2025


[refactor] dynamically import TrainSpec

47bb79c

tianyu-l force-pushed the import branch from 6b671b5 to 47bb79c Compare September 23, 2025 05:55

tianyu-l merged commit 5a8256c into main Sep 23, 2025
11 checks passed

tianyu-l deleted the import branch September 23, 2025 06:40

This was referenced Sep 23, 2025

Add einops to requirements.txt #1734

Closed

Add missing etp to ParallelDims constructor. #1742

Closed

tianyu-l added a commit that referenced this pull request Sep 23, 2025

followup fix to #1740

49ffdfe

tianyu-l added a commit that referenced this pull request Sep 23, 2025

followup fix to #1740 (#1747)

8d20f02

My bad, I forgot to update qwen3.

ruisizhang123 mentioned this pull request Sep 24, 2025

add support for simplefsdp+ep #1529

Merged

This was referenced Oct 8, 2025

[VLM] Refactor special tokens by subclassing Tokenizer #1802

Open

refactor TrainSpec to remove the name field #1850

Merged
