Skip to content

Conversation

@tianyu-l
Copy link
Contributor

as titled

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 24, 2025
The folder should be organized as follows
- `model` folder: a self-contained folder of model definition and args
- `args.py`
- Inherit [`BaseModelArgs`](/torchtitan/protocols/model.py) and implement the interfaces.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the url is not pointing to correct place, should be train_spec.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesleytruong is changing it in #1441

Copy link
Contributor

@wwwjn wwwjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note covers all the details I've met during adding dsv3. Nice documentation!

- `model.py`
- NOTE: Please adhere to the guiding principles and write single-device model code.
- NOTE: We prioritize readability over flexibility. The preferred style is to not share modules among different models, except for the most common and complicated ones.
- Inherit [`ModelProtocol`](/torchtitan/protocols/model.py) and implement the interfaces.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, the url points to a wrong place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer to above


## Testing and Benchmarking
- Numerics testing
- One way of doing this E2E is to load the same model checkpoint into the `torchtitan` model and the HF model, and compare the model output given the same input. This assumes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a brief explanation of to_hf and from_hf functions. These function implementations are not super intuitive and reader could benefit from reading how these two functions are used to load checkpoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explanation to it. This is still evolving, will revisit once we have a sound solution.

@tianyu-l tianyu-l merged commit 38a9d30 into main Jul 25, 2025
4 of 7 checks passed
@tianyu-l tianyu-l deleted the model branch July 25, 2025 01:29
idoh pushed a commit to idoh/torchtitan that referenced this pull request Jul 28, 2025
bentherien pushed a commit to bentherien/torchtitan_ that referenced this pull request Aug 5, 2025
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants