This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

the eval_acc on RTE dataset is only 55% #27

Open
leoozy opened this issue Jul 21, 2022 · 1 comment


@leoozy

leoozy commented Jul 21, 2022

Hello, thank you for your code. I tried to run your code with the following command:
aim=pretraining_experiment-bert-mlm--23000
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 64000 run_pretraining.py \
--model_type bert-mlm --tokenizer_name bert-base-uncased \
--hidden_act gelu \
--hidden_size 1024 \
--num_hidden_layers 24 \
--num_attention_heads 16 \
--intermediate_size 4096 \
--hidden_dropout_prob 0.1 \
--attention_probs_dropout_prob 0.1 \
--encoder_ln_mode pre-ln \
--lr 1e-3 \
--train_batch_size 4096 \
--train_micro_batch_size_per_gpu 128 \
--lr_schedule step \
--curve linear \
--warmup_proportion 0.06 \
--gradient_clipping 0.0 \
--optimizer_type adamw \
--weight_decay 0.01 \
--adam_beta1 0.9 \
--adam_beta2 0.98 \
--adam_eps 1e-6 \
--total_training_time 24.0 \
--early_exit_time_marker 24.0 \
--dataset_path path_to_dataset \
--output_dir path_to_output \
--print_steps 100 \
--num_epochs_between_checkpoints 10000 \
--job_name ${aim} \
--project_name budget-bert-pretraining \
--validation_epochs 3 \
--validation_epochs_begin 1 \
--validation_epochs_end 1 \
--validation_begin_proportion 0.05 \
--validation_end_proportion 0.01 \
--validation_micro_batch 16 \
--deepspeed \
--data_loader_type dist \
--do_validation \
--use_early_stopping \
--early_stop_time 180 \
--early_stop_eval_loss 6 \
--seed 42 \
--fp16 \
--max_steps 23000 \
--finetune_checkpoint_at_end

I did not change your code, but the eval_acc on RTE is only 55%, which is significantly lower than the BERT baseline (~65%). Could you give some advice?

@peteriz
Contributor

peteriz commented Jul 27, 2022

I don't know what backend you ran this experiment on, but one issue that might leave the model under-trained is that your training session didn't reach 23k updates within 24 hours (your command sets a hard 24-hour limit, so training stops after one day regardless of how many steps have completed).
Try running the same command, but without early stopping or the time limit (just train for the full 23k steps).
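
For reference, 23,000 updates in 24 hours means roughly one optimizer step every ~3.8 seconds at a global batch size of 4096, so a slower setup can easily hit the wall-clock limit first. A minimal sketch of the adjustment, assuming all other flags stay exactly as in the command above (the 72-hour budget is only a placeholder chosen so that --max_steps, not the clock, ends the run):

remove
  --use_early_stopping \
  --early_stop_time 180 \
  --early_stop_eval_loss 6 \

and change
  --total_training_time 72.0 \
  --early_exit_time_marker 72.0 \

keeping --max_steps 23000 so the run still stops after 23k updates.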
