
Problem when speeding up fine-tuning bert-base-uncased on ReCoRD #1099

Closed
ThangPM opened this issue Jun 27, 2020 · 2 comments
Labels
jiant-v1-legacy Relevant to versions <= v1.3.2

Comments

ThangPM commented Jun 27, 2020

Hello,

I am trying to reproduce the results for the ReCoRD task by fine-tuning the bert-base-uncased model, but it takes days on a single GPU (Tesla V100) because the training set is quite large (~1.13M examples).

python main.py --config jiant/config/superglue_bert.conf --overrides random_seed = 42, cuda = 0, run_name = record, pretrain_tasks = "record", target_tasks = "record", do_pretrain = 1, do_target_task_training = 0, do_full_eval = 1, batch_size = 8, val_interval = 10000, val_data_limit = -1

06/26 12:38:54 PM: Update 340556: task record, steps since last val 556 (total steps = 340556): f1: 0.5516, em: 0.5403, avg: 0.5459, record_loss: 0.1962
06/26 12:39:04 PM: Update 340603: task record, steps since last val 603 (total steps = 340603): f1: 0.5577, em: 0.5446, avg: 0.5512, record_loss: 0.1899

It takes about 10 seconds for (340603 - 340556) = 47 steps.

To speed this up, I switched to 8 GPUs (also Tesla V100s) and increased batch_size from 8 to 128, but training now seems even slower than on 1 GPU.

python main.py --config jiant/config/superglue_bert.conf --overrides random_seed = 42, cuda = auto, run_name = record, pretrain_tasks = "record", target_tasks = "record", do_pretrain = 1, do_target_task_training = 0, do_full_eval = 1, batch_size = 128, val_interval = 10000, val_data_limit = -1

06/27 12:50:54 PM: Update 452155: task record, steps since last val 2155 (total steps = 452155): f1: 0.2494, em: 0.2414, avg: 0.2454, record_loss: 0.3956
06/27 12:51:08 PM: Update 452170: task record, steps since last val 2170 (total steps = 452170): f1: 0.2492, em: 0.2413, avg: 0.2453, record_loss: 0.3953

Now it takes around 14 seconds for only 15 steps. Am I doing something wrong, or is this a bug?
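
For reference, here is a rough back-of-the-envelope throughput comparison based on the two log excerpts above. It assumes that batch_size counts the examples consumed per logged step, regardless of how many GPUs are used; I have not verified how jiant actually splits the batch across GPUs, so treat this only as a sanity check.

# Rough throughput from the log excerpts above.
# Assumption (not verified against jiant internals): batch_size is the
# number of examples consumed per logged step on any number of GPUs.

runs = {
    "1 GPU,  batch_size=8":   {"steps": 340603 - 340556, "seconds": 10, "batch_size": 8},
    "8 GPUs, batch_size=128": {"steps": 452170 - 452155, "seconds": 14, "batch_size": 128},
}

for name, run in runs.items():
    steps_per_sec = run["steps"] / run["seconds"]
    examples_per_sec = steps_per_sec * run["batch_size"]
    print(f"{name}: {steps_per_sec:.2f} steps/s, ~{examples_per_sec:.0f} examples/s")

# 1 GPU,  batch_size=8:   4.70 steps/s, ~38 examples/s
# 8 GPUs, batch_size=128: 1.07 steps/s, ~137 examples/s

Under that assumption the 8-GPU run processes more examples per second but far fewer optimizer steps per second, so I am not sure how to interpret the slowdown.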

Any comments would be appreciated.

sleepinyourhat (Contributor) commented:

@phu-pmh, @pruksmhc - Any guess what's up here?


zphang (Collaborator) commented Oct 16, 2020

This is an automatically generated comment.

As we update jiant to v2.x, jiant v1.x has been migrated to https://github.com/nyu-mll/jiant-v1-legacy. As such, we are closing all issues relating to jiant v1.x in this repository.

If this issue is still affecting you in jiant v1.x, please follow up at nyu-mll/jiant-v1-legacy#1099.

If this issue is still affecting you in jiant v2.x, reopen this issue or create a new one.

zphang closed this as completed on Oct 16, 2020