changes to _maybe_log_save_evaluate() not reflected in optimum repo #1723

Open
2 of 4 tasks
prathikr opened this issue Feb 26, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@prathikr
Contributor

prathikr commented Feb 26, 2024

System Info

transformers & optimum installed from source on 2/26/2024

Who can help?

@amyeroberts @JingyaHuang @regisss

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

The error can be reproduced by running the current image-classification fine-tuning example, optimum/examples/onnxruntime/training/image-classification/run_image_classification.py, with the following command:

torchrun run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name beans \
--output_dir ./beans_outputs/ \
--remove_unused_columns False \
--label_column_name labels \
--do_train \
--do_eval \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--seed 1337

Expected behavior

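Recently, huggingface/transformers#27326 changed the transformers Trainer to log the gradient norm, adding a new grad_norm parameter to _maybe_log_save_evaluate() right after tr_loss. This change is not reflected in the optimum repo: optimum/onnxruntime/trainer.py still calls the method with the old argument list, resulting in the following error: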

Traceback (most recent call last):
  File "run_image_classification.py", line 451, in <module>
    main()
  File "run_image_classification.py", line 425, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 392, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 774, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
TypeError: _maybe_log_save_evaluate() missing 1 required positional argument: 'ignore_keys_for_eval'
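The mismatch can be illustrated outside of optimum with a minimal sketch. BaseTrainer and ORTLikeTrainer below are hypothetical stand-ins, not the real transformers or optimum classes. Because grad_norm was inserted as the second positional parameter, an unchanged call site binds each argument after tr_loss to the parameter one slot earlier than intended, leaving ignore_keys_for_eval unfilled, which produces the same TypeError as above.

```python
# Hypothetical stand-ins: BaseTrainer mimics the parent Trainer after the
# grad_norm change; ORTLikeTrainer mimics a subclass whose training loop
# was written against the old signature.


class BaseTrainer:
    # New signature: grad_norm inserted as the second positional parameter.
    def _maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval):
        return {"loss": tr_loss, "grad_norm": grad_norm, "epoch": epoch}


class ORTLikeTrainer(BaseTrainer):
    def training_step_end(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        # Old call site: still passes five arguments, so each value after
        # tr_loss binds one parameter too early and the last one is missing.
        return self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)


def reproduce():
    """Return the TypeError message raised by the stale call site, or None."""
    trainer = ORTLikeTrainer()
    try:
        trainer.training_step_end(0.5, "model", None, 1.0, None)
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce())
```

The exact prefix of the message depends on the Python version (3.10+ includes the class name), but it always ends with "missing 1 required positional argument: 'ignore_keys_for_eval'".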

WORKAROUND: adjust optimum/onnxruntime/trainer.py to pass None for the new grad_norm argument, as that is the default setting:

- self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
+ self._maybe_log_save_evaluate(tr_loss, None, model, trial, epoch, ignore_keys_for_eval)
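The one-line workaround hard-codes the new argument order, so it would fail in the opposite direction against a transformers install that predates the change. A minimal sketch of a version-tolerant alternative, assuming it is acceptable to inspect the parent method's signature at call time, is shown here. The helper call_maybe_log_save_evaluate and the OldTrainer/NewTrainer classes are hypothetical and not part of optimum or transformers, and this is not necessarily what PR #1730 does.

```python
import inspect


def call_maybe_log_save_evaluate(trainer, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval):
    """Hypothetical helper: forward to trainer._maybe_log_save_evaluate using
    whichever signature the trainer's class actually defines."""
    method = trainer._maybe_log_save_evaluate
    # The signature of a bound method excludes `self`.
    params = inspect.signature(method).parameters
    if "grad_norm" in params:
        # Newer signature, with grad_norm right after tr_loss.
        return method(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
    # Older signature without grad_norm.
    return method(tr_loss, model, trial, epoch, ignore_keys_for_eval)


# Hypothetical stand-ins for the two signatures.
class OldTrainer:
    def _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        return ("old", tr_loss, epoch)


class NewTrainer:
    def _maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval):
        return ("new", tr_loss, grad_norm, epoch)


print(call_maybe_log_save_evaluate(OldTrainer(), 0.5, None, "model", None, 1.0, None))  # → ('old', 0.5, 1.0)
print(call_maybe_log_save_evaluate(NewTrainer(), 0.5, None, "model", None, 1.0, None))  # → ('new', 0.5, None, 1.0)
```

Checking for the parameter name rather than comparing version strings avoids guessing which development build of transformers picked up the change, which matters for source installs like the one in System Info.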
prathikr added the bug (Something isn't working) label Feb 26, 2024
JingyaHuang self-assigned this Feb 27, 2024
@jingyanwangms
Contributor

It's caused by huggingface/transformers@4f09d0f
Fixed in PR #1730
