Add CLM training example #248
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @carzh, thanks for adding the training example! Can you reformat the changed files by running make style?
Hi @carzh, thanks for the example!
I left some small nits in the example. For the PyTorch baseline, are the ORT folks planning to use it for benchmarking purposes? Since this is an ORT training example, we would prefer to just use ORTTrainer (sketched below).
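For context, here is a minimal sketch of what an ORTTrainer-based CLM script could look like. This is an illustration, not the PR's code: the model name, dataset, and hyperparameter values are assumptions, with ORTTrainer used as a drop-in for the transformers Trainer.

```python
# Illustrative sketch only -- model, dataset, and hyperparameters are
# assumptions, not the values used in this PR.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

from optimum.onnxruntime import ORTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

raw_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False makes the collator build causal-LM labels from the inputs.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# ORTTrainer mirrors the transformers Trainer API, so the surrounding
# script stays unchanged while training runs through ONNX Runtime.
trainer = ORTTrainer(
    model=model,
    args=TrainingArguments(output_dir="clm_ort_output", num_train_epochs=1),
    train_dataset=train_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```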
P.S. For the code quality check, can you run pip install ".[quality]" before make style to ensure that you are using the correct versions of black and isort? This should help the scripts pass the quality check, thanks!
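Concretely, the suggested sequence (run from the repository root, assuming the standard optimum dev setup):

```bash
# Install the pinned formatting tools (black, isort) from the quality extra,
# then reformat the changed files with the repo's Makefile target.
pip install ".[quality]"
make style
```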
On examples/onnxruntime/training/language-modeling/clm_requirements.txt:
Hi @JingyaHuang, thank you so much for taking a look at the PR and adding the helpful comments! To summarize some of the changes and responses to comments:
protobuf > 3.21.x will break the training.
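One way to express that constraint in clm_requirements.txt. This is a hedged illustration; the exact pin used in the PR may differ:

```
# Illustrative pin based on the comment above ("protobuf > 3.21.x will
# break the training"); the exact constraint in the PR may differ.
protobuf<3.22
```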
Hi @carzh, thanks for iterating on this, LGTM!
For your information: currently, the script will only work with the latest transformers version, as required by the ORTModel dependencies, so gpt2 mixed-precision training will fail in this case (see the fp16 sketch below). To solve the issue, we are working on:
- transformers PR #18017, which fixes gpt2 fp16 training broken after transformers > 4.16.0 (with the fix, users can do gpt2 fp16 training with the latest transformers)
- an upcoming PR, "Associated import lead to import fails" #265, to decouple the imports of the different APIs (with the fix, users can do gpt2 fp16 training with transformers 4.16.0)
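For reference, the failure mode above concerns mixed-precision training, which in these Trainer-based scripts is toggled through the fp16 training argument. A minimal illustration (the output directory name is an assumption):

```python
from transformers import TrainingArguments

# fp16=True enables mixed-precision training; with gpt2 this is the mode
# that breaks on transformers > 4.16.0 until PR #18017 lands.
training_args = TrainingArguments(output_dir="clm_ort_output", fp16=True)
```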
Hello, just wanted to confirm: is this PR on standby until the gpt2 fp16 issue is resolved, or can it be merged sooner and is just pending other checks? Thanks, and sorry for any inconvenience!
Hi @carzh, actually there are two possible modifications to the clm example:
But I think that it would be better to add these modifications after these two PRs make their way to the main branch. I will merge this PR first!
What does this PR do?
Adds a causal language modeling (CLM) training example for ONNX Runtime, built off of work from the transformers/examples repo.