publish instructions on adding a new model #1451
The folder should be organized as follows:
- `model` folder: a self-contained folder containing the model definition and args
  - `args.py`
    - Inherit [`BaseModelArgs`](/torchtitan/protocols/model.py) and implement the interfaces.
Here the URL doesn't point to the correct place; it should be `train_spec.py`.
@wesleytruong is changing it in #1441
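For illustration, below is a minimal pure-Python sketch of what an `args.py` could look like. The base class name `BaseModelArgs` comes from the PR. Everything else is hypothetical: the stand-in ABC, its two abstract methods (`update_from_config`, `get_nparams`), and all the fields. The real interface in torchtitan may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class BaseModelArgs(ABC):
    """Hypothetical stand-in for torchtitan's BaseModelArgs (real methods may differ)."""

    @abstractmethod
    def update_from_config(self, max_seq_len: int, vocab_size: int) -> None:
        """Fill in fields that depend on the training job (hypothetical signature)."""

    @abstractmethod
    def get_nparams(self) -> int:
        """Return the model's parameter count (hypothetical helper)."""


@dataclass
class MyModelArgs(BaseModelArgs):
    # Hypothetical fields for a small Llama-style decoder.
    dim: int = 256
    n_layers: int = 2
    n_heads: int = 4
    vocab_size: int = 1000
    max_seq_len: int = 512
    ffn_hidden_dim: int = 1024

    def update_from_config(self, max_seq_len: int, vocab_size: int) -> None:
        # Job-level settings override the defaults defined above.
        self.max_seq_len = max_seq_len
        self.vocab_size = vocab_size

    def get_nparams(self) -> int:
        embed = self.vocab_size * self.dim
        attn = 4 * self.dim * self.dim  # wq, wk, wv, wo, no biases
        ffn = 3 * self.dim * self.ffn_hidden_dim  # SwiGLU: w1, w2, w3
        norms = 2 * self.dim  # attention_norm + ffn_norm weights
        final_norm = self.dim
        output = self.dim * self.vocab_size  # untied output projection
        return embed + self.n_layers * (attn + ffn + norms) + final_norm + output
```

Keeping the args in a plain dataclass keeps the model configuration readable in one place. A call such as `MyModelArgs().update_from_config(max_seq_len=2048, vocab_size=32000)` overrides the job-dependent fields.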
wwwjn left a comment:
This note covers all the details I ran into while adding dsv3. Nice documentation!
  - `model.py`
    - NOTE: Please adhere to the guiding principles and write single-device model code.
    - NOTE: We prioritize readability over flexibility. The preferred style is to not share modules among different models, except for the most common and complicated ones.
    - Inherit [`ModelProtocol`](/torchtitan/protocols/model.py) and implement the interfaces.
Same here: the URL points to the wrong place.
Same answer as above.
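The `model.py` notes ask for single-device model code that implements the `ModelProtocol` interfaces. Below is a toy sketch of that idea. It uses a hypothetical `typing.Protocol` stand-in named `ModelProtocol` with assumed `init_weights`/`forward` methods, plus a pure-Python `BigramModel` used only for illustration. The real torchtitan protocol and its method signatures may differ, and a real model would be a `torch.nn.Module`.

```python
import random
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelProtocol(Protocol):
    """Hypothetical stand-in for torchtitan's ModelProtocol (real interface may differ)."""

    def init_weights(self, seed: int) -> None: ...

    def forward(self, tokens: list[int]) -> list[list[float]]: ...


class BigramModel:
    """Toy single-device model: no sharding or parallelism logic lives here."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        # Weights start at zero until init_weights is called explicitly.
        self.logits_table = [[0.0] * vocab_size for _ in range(vocab_size)]

    def init_weights(self, seed: int) -> None:
        # Deterministic initialization so two instances can be compared.
        rng = random.Random(seed)
        self.logits_table = [
            [rng.gauss(0.0, 0.02) for _ in range(self.vocab_size)]
            for _ in range(self.vocab_size)
        ]

    def forward(self, tokens: list[int]) -> list[list[float]]:
        # Next-token logits for each position are a copy of one table row.
        return [list(self.logits_table[t]) for t in tokens]
```

Separating `__init__` from `init_weights` lets the training code build the module first and initialize it later. Parallelism is then applied outside this file, so the model code stays focused on the model itself.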
## Testing and Benchmarking
- Numerics testing
  - One way of doing this E2E is to load the same model checkpoint into the `torchtitan` model and the HF model, and compare the model output given the same input. This assumes
I'd add a brief explanation of the `to_hf` and `from_hf` functions. Their implementations are not very intuitive, and readers could benefit from seeing how these two functions are used to load checkpoints.
Added some explanation. This is still evolving; I'll revisit once we have a sound solution.
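The sketch below illustrates what `to_hf`/`from_hf`-style functions do in the numerics workflow: they rename state-dict keys between the HF and `torchtitan` conventions, so one checkpoint can be loaded into both models whose outputs are then compared. The key names are assumed Llama-style names chosen for illustration, not the PR's actual mapping. Each model defines its own mapping, and a real adapter may also need to reshape or permute some tensors, which is omitted here. `max_abs_diff` and `outputs_match` are hypothetical helpers for the final comparison step.

```python
import re

# Illustrative (assumed) mapping from HF Llama-style keys to torchtitan-style keys.
# "{}" marks the layer index.
_FROM_HF = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attention.wq.weight",
    "model.layers.{}.self_attn.k_proj.weight": "layers.{}.attention.wk.weight",
    "model.layers.{}.self_attn.v_proj.weight": "layers.{}.attention.wv.weight",
    "model.layers.{}.self_attn.o_proj.weight": "layers.{}.attention.wo.weight",
    "model.layers.{}.mlp.gate_proj.weight": "layers.{}.feed_forward.w1.weight",
    "model.layers.{}.mlp.down_proj.weight": "layers.{}.feed_forward.w2.weight",
    "model.layers.{}.mlp.up_proj.weight": "layers.{}.feed_forward.w3.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}
_TO_HF = {v: k for k, v in _FROM_HF.items()}


def _convert_key(key: str, mapping: dict) -> str:
    # Swap the layer index for "{}" to find the template, then put it back.
    # Unknown keys raise KeyError so missing mappings are caught early.
    match = re.search(r"\.(\d+)\.", key)
    if match is None:
        return mapping[key]
    template = key[: match.start()] + ".{}." + key[match.end():]
    return mapping[template].format(match.group(1))


def from_hf(hf_state_dict: dict) -> dict:
    """HF-named state dict -> torchtitan-named state dict (values passed through)."""
    return {_convert_key(k, _FROM_HF): v for k, v in hf_state_dict.items()}


def to_hf(tt_state_dict: dict) -> dict:
    """torchtitan-named state dict -> HF-named state dict (values passed through)."""
    return {_convert_key(k, _TO_HF): v for k, v in tt_state_dict.items()}


def max_abs_diff(a: list, b: list) -> float:
    # Largest elementwise gap between two flattened outputs, e.g. logits.
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(a: list, b: list, atol: float = 1e-5) -> bool:
    # The tolerance is an assumption; bf16 runs usually need a looser one.
    return max_abs_diff(a, b) <= atol
```

A useful sanity check is that `to_hf(from_hf(sd)) == sd` for every key in a real checkpoint. If that round trip fails, the mapping is incomplete before any output is compared.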
as titled