Add CLM training example (#248)
* copied run_clm script from transformers/examples + added requirements file

* added optimum support + ort flag

* added brief readme + moved files into language-modeling folder oops

* reformatted run_clm.py

* changed requirements text file according to suggestions + reformatted run_clm.py

* removed ORT flag from run_clm.py script and uses ORTTrainer by default

* updated README to match

* Update requirements.txt

protobuf > 3.21.x will break training.

* Update the minimum version of transformers

Co-authored-by: carzh <t-carzhu@microsoft.com>
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
3 people authored Jul 13, 2022
1 parent 6911cc6 commit 7c1a621
Showing 3 changed files with 629 additions and 0 deletions.
45 changes: 45 additions & 0 deletions examples/onnxruntime/training/language-modeling/README.md
@@ -0,0 +1,45 @@
<!---
Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Language Modeling

## Language Modeling Training

The scripts [`run_clm.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/language-modeling/run_clm.py)
and [`run_mlm.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/language-modeling/run_mlm.py)
let you leverage the [ONNX Runtime](https://github.com/microsoft/onnxruntime) accelerator to train language models from the
[Hugging Face hub](https://huggingface.co/models).


__The following example applies the acceleration features powered by ONNX Runtime.__


### ONNX Runtime Training

The following example trains GPT2 on wikitext-2 with mixed precision (fp16).

```bash
python run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--do_train \
--output_dir /tmp/test-clm \
--fp16
```
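Under the hood, the script instantiates `ORTTrainer` from `optimum.onnxruntime` in place of the stock `transformers.Trainer` (there is no longer a separate ORT flag). The snippet below is a minimal illustrative sketch, not an excerpt from `run_clm.py`; it assumes the `ORTTrainer` constructor mirrors `transformers.Trainer` and uses only a small slice of wikitext-2 for brevity.

```python
# Minimal sketch of ORT-accelerated causal-LM fine-tuning (assumption: the
# ORTTrainer constructor mirrors transformers.Trainer). run_clm.py itself adds
# argument parsing, grouping of texts into fixed-length blocks, evaluation, etc.
from datasets import load_dataset
from optimum.onnxruntime import ORTTrainer
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Tokenize a small slice of wikitext-2, dropping empty lines.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
raw = raw.filter(lambda example: len(example["text"].strip()) > 0)
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = ORTTrainer(
    model=model,
    args=TrainingArguments(output_dir="/tmp/test-clm", fp16=True),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```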

__Note__
> *To enable ONNX Runtime training, your device needs to be equipped with a GPU. Install the dependencies either with our prepared*
*[Dockerfiles](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/docker/) or by following the instructions*
*in [`torch_ort`](https://github.com/pytorch/ort/blob/main/torch_ort/docker/README.md).*
---
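
For a non-Docker setup, one possible sequence (an assumption; the linked instructions remain the authoritative reference) is `pip install -r requirements.txt` followed by `python -m torch_ort.configure`, which builds the ONNX Runtime training extensions against the locally installed PyTorch and CUDA.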
10 changes: 10 additions & 0 deletions examples/onnxruntime/training/language-modeling/requirements.txt
@@ -0,0 +1,10 @@
datasets >= 1.8.0
sentencepiece != 0.1.92
scipy
scikit-learn
protobuf <= 3.20.1
torch >= 1.9.0
transformers>=4.16.0
onnx>=1.9.0
onnxruntime-training>=1.9.0
torch-ort