
run_qa.py script does not compute eval_loss and gives KeyError: 'eval_loss' with load_best_model_at_end #29801

Closed
ftesser opened this issue Mar 22, 2024 · 3 comments · Fixed by #29867
Labels
bug · Examples · Good Second Issue

Comments

ftesser commented Mar 22, 2024

System Info

  • transformers version: 4.39.0
  • Platform: Linux-5.15.0-100-lowlatency-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

The official QA training script run_qa.py raises the following error when run with --load_best_model_at_end and --metric_for_best_model "loss":

Traceback (most recent call last):
  File "/home/fabio/repos/transformers/examples/pytorch/question-answering/run_qa.py", line 716, in <module>
    main()
  File "/home/fabio/repos/transformers/examples/pytorch/question-answering/run_qa.py", line 657, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/fabio/repos/transformers/src/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/home/fabio/repos/transformers/src/transformers/trainer.py", line 2213, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/home/fabio/repos/transformers/src/transformers/trainer.py", line 2588, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/fabio/repos/transformers/src/transformers/trainer.py", line 2669, in _save_checkpoint
    metric_value = metrics[metric_to_check]
KeyError: 'eval_loss'

full_log.txt
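
For reference, here is a minimal sketch (my paraphrase, not the actual trainer.py source) of the check that fails: run_qa.py's evaluate() returns only the SQuAD metrics, so the eval_-prefixed key the trainer looks up is never present.

# Hypothetical paraphrase of the check in Trainer._save_checkpoint
metrics = {"eval_exact_match": 80.0, "eval_f1": 90.0}  # what run_qa.py's evaluate() returns: no "eval_loss"
metric_to_check = "loss"  # from --metric_for_best_model
if not metric_to_check.startswith("eval_"):
    metric_to_check = f"eval_{metric_to_check}"
metric_value = metrics[metric_to_check]  # KeyError: 'eval_loss'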

@ArthurZucker @sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce the error quickly, you can use a distilled model and limit max_train_samples and max_eval_samples:

python examples/pytorch/question-answering/run_qa.py \
--model_name_or_path deepset/roberta-base-squad2-distilled  \
--dataset_name squad \
--do_train \
--do_eval \
--max_seq_length 384 \
--doc_stride 128 \
--max_train_samples 5 \
--max_eval_samples 2 \
--num_train_epochs 3 \
--load_best_model_at_end \
--metric_for_best_model "loss" \
--evaluation_strategy "epoch" \
--save_strategy "epoch" \
--overwrite_output_dir \
--output_dir ~/tmp/debug_squad/

I have also tested with a regular (non-distilled) model and the full dataset; the error occurs there as well.

Expected behavior

During the evaluation phase, the eval_loss should be computed, and the best model should be selected using the loss metric.
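
As a stopgap (my assumption, not something confirmed in this thread), pointing --metric_for_best_model at a metric the script does compute, e.g. --metric_for_best_model "f1" or "exact_match", avoids the crash, but it does not give you loss-based model selection.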

@amyeroberts added the Examples and bug labels Mar 22, 2024
@ArthurZucker (Collaborator) commented:

That is more of a trainer issue, cc @muellerzr and @SunMarc. Given that it is an example, and we usually don't maintain the examples, I'll set this as a Good Second Issue.

@ArthurZucker added the Good Second Issue label Mar 25, 2024
ftesser (Author) commented Mar 26, 2024

@ArthurZucker @jla524 I saw your pull request #29867, and I agree that adding a message for unsupported metrics is a great idea.

However, I don't understand why loss is not supported in the case of squad2 (#29867 (comment)): the use case is to have the loss available on the validation set so that it can be used to select the best model.

Furthermore, the loss is calculated on the training set. Why isn't it possible to calculate it on the validation set too?
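
For illustration (this is my guess at the cause, not something confirmed in the thread): the model only returns a loss when the label positions are passed to the forward call, and run_qa.py's validation features do not include them.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hypothetical illustration: a QA model computes a loss only when
# start/end positions (the labels) are supplied.
tok = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2-distilled")
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2-distilled")

enc = tok("Who wrote it?", "It was written by Jane.", return_tensors="pt")

out = model(**enc)
print(out.loss)  # None: no labels passed, so no loss -> no eval_loss in metrics

out = model(**enc, start_positions=torch.tensor([7]), end_positions=torch.tensor([7]))
print(out.loss)  # a scalar tensor: the loss exists once labels are provided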

@ArthurZucker (Collaborator) commented:

@ftesser that is a question for evaluate! It's probably that the loss is not really a metric like f1, and it's logged elsewhere.
