NLP benchmark for multiple GPUs

This is a simple script for quickly benchmarking a multi-GPU system on an NLP task.

Specifically, it fine-tunes an English BERT-Large language model on three GLUE tasks:

QQP
MNLI
QNLI

Instructions

Simply run ai_nlp_benchmark.sh. It will automatically download the BERT model and the GLUE datasets.

It will then fine-tune the model on each dataset in sequence and save the resulting models and their evaluation results to the corresponding sub-directories.

The code uses 4 GPUs by default; to change this, edit num_processes in the accelerate_config.yaml file. The default per-device batch size is 32; it can be decreased (in the ai_nlp_benchmark.sh script) if it is too large for the devices under evaluation.
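The two settings above interact: under plain data-parallel training, the number of examples per optimizer step is the per-device batch size multiplied by the number of processes. The sketch below illustrates that arithmetic only. It assumes one process per GPU and no gradient accumulation, neither of which this README states; the function name and the gradient_accumulation_steps parameter are hypothetical and are not part of the benchmark scripts.

```python
def effective_batch_size(per_device_batch_size: int,
                         num_processes: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Examples contributing to one optimizer step under data parallelism.

    Assumption: one process per GPU, each process seeing its own batch.
    """
    return per_device_batch_size * num_processes * gradient_accumulation_steps


if __name__ == "__main__":
    # Defaults described in this README: 4 GPUs, per-device batch size 32.
    print(effective_batch_size(32, 4))
    # Halving the per-device batch size to fit smaller devices.
    print(effective_batch_size(16, 4))
```

If this assumption holds, lowering the per-device batch size or the GPU count also lowers the effective batch size, which may shift the final scores slightly.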

Every task should take about 1-3 hours.

Results

The evaluation results can be found in the corresponding sub-directories (for example, qnli_results/all_results.json).
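To collect the results of all three tasks in one place, something like the following sketch may be convenient. It assumes that every task writes to a sub-directory named like the qnli_results example above (qqp_results and mnli_results are inferred by analogy, not confirmed by this README), and it makes no assumption about which keys the JSON files contain. The load_results helper is hypothetical.

```python
import json
from pathlib import Path


def load_results(base_dir=".", tasks=("qqp", "mnli", "qnli")):
    """Return {task: metrics} for every task whose results file exists.

    Assumption: results live in <task>_results/all_results.json.
    """
    results = {}
    for task in tasks:
        path = Path(base_dir) / f"{task}_results" / "all_results.json"
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                results[task] = json.load(f)
    return results


if __name__ == "__main__":
    # Print whatever metrics each finished task has recorded.
    for task, metrics in load_results().items():
        print(task)
        print(json.dumps(metrics, indent=2, sort_keys=True))
```

Tasks that have not finished yet are skipped, so the script can also be run while the benchmark is still in progress.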

The scores should not be significantly different from the following:

QQP: accuracy 0.91, F1 0.88
MNLI: accuracy 0.87
QNLI: accuracy 0.92
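A comparison against these reference scores can be automated along the following lines. This is only a sketch: the README does not define "significantly different", so the tolerance of 0.01 is an arbitrary assumption, and the metric names "accuracy" and "f1" are hypothetical labels that may not match the keys used in all_results.json.

```python
# Reference scores copied from the list above.
REFERENCE = {
    "qqp": {"accuracy": 0.91, "f1": 0.88},
    "mnli": {"accuracy": 0.87},
    "qnli": {"accuracy": 0.92},
}


def check_scores(task, scores, tolerance=0.01):
    """Return (metric, observed, expected) for each metric outside tolerance.

    A metric missing from `scores` is reported with observed=None.
    The tolerance is an assumed threshold, not one given by the benchmark.
    """
    deviations = []
    for metric, expected in REFERENCE[task].items():
        observed = scores.get(metric)
        if observed is None or abs(observed - expected) > tolerance:
            deviations.append((metric, observed, expected))
    return deviations


if __name__ == "__main__":
    # Example with made-up observed values for illustration only.
    print(check_scores("qqp", {"accuracy": 0.91, "f1": 0.80}))
```

An empty list means every reference metric for the task was within the chosen tolerance.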
