changes to _maybe_log_save_evaluate() not reflected in optimum repo #1723

prathikr · 2024-02-26T22:58:38Z

System Info

transformers & optimum installed from source on 2/26/2024

Who can help?

@amyeroberts @JingyaHuang @regisss

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

The error can be reproduced by running the current image-classification fine-tuning example at optimum/examples/onnxruntime/training/image-classification/run_image_classification.py with the following command:

torchrun run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name beans \
--output_dir ./beans_outputs/ \
--remove_unused_columns False \
--label_column_name labels \
--do_train \
--do_eval \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--seed 1337

Expected behavior

Recently, huggingface/transformers#27326 introduced a change to log the gradient norm in the transformers Trainer. This change is not reflected in the optimum repo, which results in the following error:

Traceback (most recent call last):
  File "run_image_classification.py", line 451, in <module>
    main()
  File "run_image_classification.py", line 425, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 392, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 774, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
TypeError: _maybe_log_save_evaluate() missing 1 required positional argument: 'ignore_keys_for_eval'
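
For readers unfamiliar with this failure mode, here is a minimal sketch of why the call breaks. `ToyTrainer` is a hypothetical stand-in, not the real transformers or optimum class; it only assumes that the updated hook takes one extra positional argument (named `grad_norm` here, following the linked PR title) right after `tr_loss`. A caller still passing the old five arguments shifts every value one slot to the left, so Python reports the last parameter as missing.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer whose hook gained a `grad_norm` argument."""

    def _maybe_log_save_evaluate(
        self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval
    ):
        return {"grad_norm": grad_norm, "ignore_keys_for_eval": ignore_keys_for_eval}


def call_with_old_argument_order(trainer):
    """Call the hook with the pre-change argument list and capture the error text."""
    try:
        # Old call site: no grad_norm, so only five positional arguments are passed.
        trainer._maybe_log_save_evaluate(0.5, "model", None, 1, None)
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_with_old_argument_order(ToyTrainer())
print(error_message)
```

The message names `ignore_keys_for_eval` rather than the new argument because the old values silently fill the earlier slots.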

WORKAROUND: adjust trainer.py to pass None where the new grad_norm argument is expected, as that is the default setting.

- self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
+ self._maybe_log_save_evaluate(tr_loss, None, model, trial, epoch, ignore_keys_for_eval)
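
If the same calling code has to run against both the old and the new hook signature, one possible approach is to inspect the signature at call time instead of hard-coding the argument list. This is only a sketch under assumptions: `OldStyleTrainer`, `NewStyleTrainer`, and `call_hook` are hypothetical names made up for illustration, and the extra parameter is assumed to be called `grad_norm` as in the linked PR title. It is not the fix that was merged in #1730.

```python
import inspect


class OldStyleTrainer:
    """Hypothetical trainer with the pre-change hook (no grad_norm)."""

    def _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        return ("old", tr_loss, ignore_keys_for_eval)


class NewStyleTrainer:
    """Hypothetical trainer with the post-change hook (grad_norm after tr_loss)."""

    def _maybe_log_save_evaluate(
        self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval
    ):
        return ("new", tr_loss, grad_norm, ignore_keys_for_eval)


def call_hook(trainer, tr_loss, model, trial, epoch, ignore_keys_for_eval, grad_norm=None):
    """Call the hook with whichever argument list its signature accepts."""
    hook = trainer._maybe_log_save_evaluate
    # For a bound method, inspect.signature() already omits `self`.
    if "grad_norm" in inspect.signature(hook).parameters:
        return hook(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
    return hook(tr_loss, model, trial, epoch, ignore_keys_for_eval)


old_result = call_hook(OldStyleTrainer(), 0.5, "model", None, 1, None)
new_result = call_hook(NewStyleTrainer(), 0.5, "model", None, 1, None)
print(old_result, new_result)
```

Passing `None` for `grad_norm` mirrors the one-line workaround above; pinning matching versions of the two libraries avoids the need for such a shim entirely.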


jingyanwangms · 2024-02-27T18:08:08Z

It's caused by huggingface/transformers@4f09d0f
Fixed in PR #1730

prathikr added the bug label Feb 26, 2024

JingyaHuang self-assigned this Feb 27, 2024

jingyanwangms mentioned this issue Feb 27, 2024

Follow grad_norm changes in transformers #27326 #1730 (merged)
