Add run_glue_tpu.py that trains models on TPUs #3702

Merged (24 commits, Apr 10, 2020)

Changes from 16 commits

Commits:
5a44823  Initial commit to get BERT + run_glue.py on TPU (jysohn23, Nov 6, 2019)
837fac2  Add README section for TPU and address comments. (jysohn23, Nov 18, 2019)
e056eff  Initial commit to get GLUE (BERT) on TPU (#2) (jysohn23, Nov 18, 2019)
b421758  Cleanup TPU bits from run_glue.py (#3) (jysohn23, Nov 20, 2019)
6ef1edd  Cleanup TPU bits from run_glue.py (jysohn23, Nov 20, 2019)
3129ad3  No need to call `xm.mark_step()` explicitly (#4) (jysohn23, Nov 21, 2019)
295190f  Resolve R/W conflicts from multiprocessing (#5) (jysohn23, Nov 25, 2019)
bb3fcee  Add XLNet in list of models for `run_glue_tpu.py` (#6) (jysohn23, Dec 3, 2019)
c5c8293  Add RoBERTa to list of models in TPU GLUE (#7) (jysohn23, Dec 9, 2019)
4ba47e5  Add RoBERTa and DistilBert to list of models in TPU GLUE (#8) (jysohn23, Jan 10, 2020)
6d17e91  Use barriers to reduce duplicate work/resources (#9) (jysohn23, Apr 1, 2020)
6e20572  Shard eval dataset and aggregate eval metrics (#10) (jysohn23, Apr 2, 2020)
14a0da3  Only use tb_writer from master (#11) (jysohn23, Apr 2, 2020)
13cea37  Merge remote-tracking branch 'upstream-hf/master' into tpu (jysohn23, Apr 2, 2020)
3feb8e1  Apply huggingface black code formatting (jysohn23, Apr 8, 2020)
54438f7  Style (LysandreJik, Apr 8, 2020)
23829c0  Remove `--do_lower_case` as example uses cased (jysohn23, Apr 8, 2020)
3e45ae3  Add option to specify tensorboard logdir (jysohn23, Apr 9, 2020)
8296b1a  Using configuration for `xla_device` (LysandreJik, Apr 9, 2020)
306851c  Merge pull request #1 from jysohn23/tpu-with-config (jysohn23, Apr 9, 2020)
1eb47c5  Prefix TPU specific comments. (jysohn23, Apr 9, 2020)
10f5b9a  num_cores clarification and namespace eval metrics (jysohn23, Apr 10, 2020)
6e959fd  Cache features file under `args.cache_dir` (jysohn23, Apr 10, 2020)
1e62165  Rename `run_glue_tpu` to `run_tpu_glue` (LysandreJik, Apr 10, 2020)
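
Several of the commit messages above ("Use barriers to reduce duplicate work/resources", "Only use tb_writer from master", "Cache features file under `args.cache_dir`") refer to common `torch_xla` multi-process idioms. The snippet below is only a rough sketch of those idioms, assuming `torch_xla` and `tensorboard` are installed; it is not the PR's actual code, and the helper `cache_features` and its path are made up for illustration.

```python
import os

import torch
import torch_xla.core.xla_model as xm
from torch.utils.tensorboard import SummaryWriter


def cache_features(path="/tmp/cached_features.pt"):
    """Stand-in for an expensive, cache-once step (e.g. converting examples to features)."""
    if not os.path.exists(path):
        torch.save({"example": True}, path)  # the expensive work happens only once
    return torch.load(path)


# Barrier pattern: non-master processes wait until the master has written the
# cache, then every process just loads the cached file instead of redoing the work.
if not xm.is_master_ordinal():
    xm.rendezvous("cache_features")   # non-masters block here
features = cache_features()           # master builds the cache; others read it afterwards
if xm.is_master_ordinal():
    xm.rendezvous("cache_features")   # master releases the waiting processes

# Master-only logging: only one of the spawned processes writes TensorBoard
# events, so logs are not duplicated once per TPU core.
tb_writer = SummaryWriter() if xm.is_master_ordinal() else None
```
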
examples/README.md: 48 changes (46 additions, 2 deletions)
@@ -17,6 +17,7 @@ pip install -r ./examples/requirements.txt
| Section | Description |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| [TensorFlow 2.0 models on GLUE](#TensorFlow-2.0-Bert-models-on-GLUE) | Examples running BERT TensorFlow 2.0 model on the GLUE tasks. |
| [Running on TPUs](#running-on-tpus) | Examples on running fine-tuning tasks on Google TPUs to accelerate workloads. |
| [Language Model training](#language-model-training) | Fine-tuning (or training from scratch) the library models for language modeling on a text dataset. Causal language modeling for GPT/GPT-2, masked language modeling for BERT/RoBERTa. |
| [Language Generation](#language-generation) | Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet. |
| [GLUE](#glue) | Examples running BERT/XLM/XLNet/RoBERTa on the 9 GLUE tasks. Examples feature distributed training as well as half-precision. |
@@ -48,12 +49,55 @@ Quick benchmarks from the script (no other modifications):

Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).

## Running on TPUs

You can accelerate your workloads on Google's TPUs. For information on how to set up your TPU environment, refer to this
[README](https://github.com/pytorch/xla/blob/master/README.md).

The following are some examples of running the `*_tpu.py` fine-tuning scripts on TPUs. All steps for data preparation are
identical to your normal GPU + Huggingface setup.
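
The `*_tpu.py` scripts build on PyTorch/XLA's multiprocessing API. As a rough orientation only (this is not the example script itself), a minimal training loop on TPU looks something like the sketch below; it assumes `torch_xla` is installed and a TPU is attached, and uses a toy model with random data.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    # Each spawned process drives one TPU core.
    device = xm.xla_device()
    model = torch.nn.Linear(128, 2).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=3e-5)

    for _ in range(10):
        inputs = torch.randn(32, 128, device=device)
        labels = torch.randint(0, 2, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)

        optimizer.zero_grad()
        loss.backward()
        # Reduces gradients across cores and steps the optimizer; barrier=True
        # also triggers the XLA step, so no explicit xm.mark_step() is needed here.
        xm.optimizer_step(optimizer, barrier=True)


if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=(), nprocs=8)  # 8 processes, one per core of a v2-8/v3-8
```
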

### GLUE

Before running any of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

To run your GLUE task on the MNLI dataset, you can run something like the following:

```bash
export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
export GLUE_DIR=/path/to/glue
export TASK_NAME=MNLI

python run_glue_tpu.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 3e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME \
--overwrite_output_dir \
--logging_steps 50 \
--save_steps 200 \
--num_cores=8 \
--only_log_master
```

Review comment (Member), on the `--do_lower_case` line: I know it's present in other example codes (and should be changed), but should we keep the `--do_lower_case` option with cased models?

Reply (Collaborator, author): Good point. Removed.
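
With `--num_cores=8` the script runs one process per TPU core, which is why the commit history above includes sharding the eval dataset and aggregating the per-shard metrics. The snippet below is a minimal sketch of that idea, assuming `torch_xla`; it is illustrative only and uses random stand-in data rather than the script's real evaluation code.

```python
import torch
import torch_xla.core.xla_model as xm
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy eval set standing in for the real GLUE features.
eval_dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 2, (1000,)))

# Each process only sees its own shard of the eval data.
sampler = DistributedSampler(
    eval_dataset,
    num_replicas=xm.xrt_world_size(),  # number of TPU processes
    rank=xm.get_ordinal(),             # this process's shard index
    shuffle=False,
)
eval_loader = DataLoader(eval_dataset, sampler=sampler, batch_size=32)

device = xm.xla_device()
correct, total = 0, 0
for inputs, labels in eval_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    preds = torch.randint(0, 2, labels.shape, device=device)  # stand-in for model predictions
    correct += (preds == labels).sum().item()
    total += labels.size(0)

# Combine per-shard counts from all processes before computing the final metric.
correct = xm.mesh_reduce("eval_correct", correct, sum)
total = xm.mesh_reduce("eval_total", total, sum)
if xm.is_master_ordinal():
    print(f"accuracy = {correct / total:.4f}")
```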


## Language model training

Based on the script [`run_language_modeling.py`](https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py).

Fine-tuning (or training from scratch) the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa
are fine-tuned using a masked language modeling (MLM) loss.
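
To make the CLM/MLM distinction concrete, here is an illustrative comparison (not taken from the repository) of how labels are built for the two losses with the `transformers` auto classes; the model names are just examples.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM

text = "TPUs make large-batch training fast."

# Causal LM (GPT-2): labels are the input ids themselves; the model shifts them
# internally so that each position predicts the next token.
clm_tok = AutoTokenizer.from_pretrained("gpt2")
clm_model = AutoModelForCausalLM.from_pretrained("gpt2")
clm_inputs = clm_tok(text, return_tensors="pt")
clm_loss = clm_model(**clm_inputs, labels=clm_inputs["input_ids"]).loss

# Masked LM (BERT): mask a fraction of tokens and score only the masked
# positions (unmasked positions get the ignore index -100). A real
# implementation would also avoid masking special tokens like [CLS]/[SEP].
mlm_tok = AutoTokenizer.from_pretrained("bert-base-cased")
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
mlm_inputs = mlm_tok(text, return_tensors="pt")
labels = mlm_inputs["input_ids"].clone()
mask = torch.rand(labels.shape) < 0.15
mlm_inputs["input_ids"][mask] = mlm_tok.mask_token_id
labels[~mask] = -100
mlm_loss = mlm_model(**mlm_inputs, labels=labels).loss
```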

Before running the following example, you should get a file that contains text on which the language model will be trained or fine-tuned.