Add CLM training example #248
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @carzh, thanks for adding the training example! Can you reformat the changed files by running make style?
Hi @carzh, thanks for the example!
I left some small nits in the example. For the PyTorch baseline, are the ORT folks planning to use it for benchmarking purposes? Since this is an ORT training example, we would prefer to just use ORTTrainer (sketched below).
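For context, here is a minimal sketch of what an ORTTrainer-based CLM script could look like. This is an illustration, not the PR's code: the model name, dataset, and hyperparameter values are assumptions, with ORTTrainer used as a drop-in for the transformers Trainer.

```python
# Illustrative sketch only -- model, dataset, and hyperparameters are
# assumptions, not the values used in this PR.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

from optimum.onnxruntime import ORTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

raw_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False makes the collator build causal-LM labels from the inputs.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# ORTTrainer mirrors the transformers Trainer API, so the surrounding
# script stays unchanged while training runs through ONNX Runtime.
trainer = ORTTrainer(
    model=model,
    args=TrainingArguments(output_dir="clm_ort_output", num_train_epochs=1),
    train_dataset=train_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```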
P.S. For the code quality check, can you run pip install ".[quality]" before make style to ensure that you are using the correct versions of black and isort? This should help the scripts pass the quality check, thanks!
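Concretely, the suggested sequence (run from the repository root, assuming the standard optimum dev setup):

```bash
# Install the pinned formatting tools (black, isort) from the quality extra,
# then reformat the changed files with the repo's Makefile target.
pip install ".[quality]"
make style
```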
On examples/onnxruntime/training/language-modeling/clm_requirements.txt:
Hi @JingyaHuang, thank you so much for taking a look at the PR and adding the helpful comments! To summarize some of the changes and responses to comments:
protobuf > 3.21.x will break the training.
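One way to express that constraint in clm_requirements.txt. This is a hedged illustration; the exact pin used in the PR may differ:

```
# Illustrative pin based on the comment above ("protobuf > 3.21.x will
# break the training"); the exact constraint in the PR may differ.
protobuf<3.22
```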
Hi @carzh, thanks for iterating on this, LGTM!
For your information: currently, the script will only work with the latest transformers version, as required by the ORTModel dependencies, so gpt2 mixed-precision training will fail in this case (see the fp16 sketch below). To solve the issue, we are working on:
- transformers PR #18017, which fixes gpt2 fp16 training broken after transformers > 4.16.0 (with the fix, users can do gpt2 fp16 training with the latest transformers)
- an upcoming PR, "Associated import lead to import fails" #265, to decouple the imports of the different APIs (with the fix, users can do gpt2 fp16 training with transformers 4.16.0)
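For reference, the failure mode above concerns mixed-precision training, which in these Trainer-based scripts is toggled through the fp16 training argument. A minimal illustration (the output directory name is an assumption):

```python
from transformers import TrainingArguments

# fp16=True enables mixed-precision training; with gpt2 this is the mode
# that breaks on transformers > 4.16.0 until PR #18017 lands.
training_args = TrainingArguments(output_dir="clm_ort_output", fp16=True)
```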
Hello, just wanted to confirm: is this PR on standby until the gpt2 fp16 issue is resolved, or can it be merged sooner and is just pending other checks? Thanks, and sorry for any inconvenience!
Hi @carzh, actually there are two possible modifications to the clm example:
But I think that it would be better to add these modifications after these two PRs make their way to the main branch. I will merge this PR first!
What does this PR do?
Adds a causal language modeling (CLM) training example for ONNX Runtime, built off of work from the transformers/examples repo.