[finetune_trainer] enhancements and fixes #9042

stas00 · 2020-12-10T21:56:50Z

The main need was to add speed metrics to perform speed performance regressions. But on the way a bunch of other things got worked on. Hopefully you will find the proposed changes useful.

This PR change trainer

adds an optional metric_key_prefix for evaluate and predict functions to return metrics with a prefix key set by the user rather than the default eval_.

This PR change finetune_trainer

utils: sort json keys when dumping to filesystem
renames s/eval/val/ for the validation dataset results
adds speed metrics for all: train/eval/test (samples_per_second/runtime/n_objs)
refactors logging/saving code for each mode
renames internal vars to tell which is metrics and which is output that is more than just metrics
fixes a bug where all_results.json wasn't getting saved in the right place
rounds up loss values to 4 decimals - before it was "eval_loss": 368.2950744628906, - not sure if it's better done upstream in the trainer?

Here is a sample of all_results.json after this change:

{
    "epoch": 1.0,
    "test_bleu": 22.8548,
    "test_gen_len": 35.9,
    "test_loss": 734.8612,
    "test_n_ojbs": 10,
    "test_runtime": 2.5185,
    "test_samples_per_second": 3.971,
    "train_n_ojbs": 200,
    "train_runtime": 24.9101,
    "train_samples_per_second": 8.029
    "val_bleu": 26.581,
    "val_gen_len": 31.4,
    "val_loss": 738.3438,
    "val_n_ojbs": 200,
    "val_runtime": 33.9329,
    "val_samples_per_second": 5.894,
}

@sgugger, @patil-suraj, @patrickvonplaten

sgugger · 2020-12-10T22:16:27Z

Unfortunately the naming has been done a long time ago and even if it's not ideal, we can't break it like this as people rely on the names of the keys in their code. I would advocate for the renaming to be done in the script directly and not inside Trainer.

If there is really a lot of people asking for it, we can think of a strategy to rename those keys progressively with some kind of deprecation warning, but since it's merely cosmetic, I would leave that for scripts using Trainer.

stas00 · 2020-12-10T22:22:10Z

I see what you mean that someone relying on "eval_loss" when doing predict would have their code broken. Yes, we can't do that.
I moved this fix back into the finetune trainer as it was originally.

Could we set a a target for when we could do breaking changes and fix this bug?

I also find it strange that we use --n_val but eval_

And then predict vs test_.

The callbacks are inconsistent too :(

I'd plan a design session where we first collect all the rough edges and inputs on what needs to be polished and then adjust the trainer so that it's not limping for the rest of its life. Can this be done?

sgugger · 2020-12-10T22:30:57Z

Could we set a a target for when we could do breaking changes and fix this bug?

Like I said, unless there is strong demand for it, I think we're just going to leave it as it. It's not the ideal naming choice but we have to deal with it now (kind of like PretrainedConfig vs PreTrainedConfig).

I also find it strange that we use --n_val but eval_

And then predict vs test_.

I don't understand that part. Also predict could be used for test or evaluation, so predict does not mean test.

The callbacks are inconsistent too

Could you elaborate? If it's the evaluate vs predict you mentioned, there is a reason. prediction_step is called both in predict and evaluate whereas the on_evaluate is only called at evaluate.

stas00 · 2020-12-10T22:37:54Z

Could we set a a target for when we could do breaking changes and fix this bug?

Like I said, unless there is strong demand for it, I think we're just going to leave it as it. It's not the ideal naming choice but we have to deal with it now (kind of like PretrainedConfig vs PreTrainedConfig).

I'm not sure how this is similar. I call trainer.predict() and get in return eval_ metrics - this is very confusing.

I also find it strange that we use --n_val but eval_
And then predict vs test_.

I don't understand that part. Also predict could be used for test or evaluation, so predict does not mean test.

I suppose from the perspective of the existing trainer like finetune they are the same. But surely this is much less of an issue than val vs eval.

The callbacks are inconsistent too

Could you elaborate? If it's the evaluate vs predict you mentioned, there is a reason. prediction_step is called both in predict and evaluate whereas the on_evaluate is only called at evaluate.

Ah, I see, thank you for clarifying that - then why is there no on_predict to match on_evaluate? I assumed it was the former.

sgugger · 2020-12-10T23:22:44Z

There is no on_predict event because the training loop never calls Trainer.predict. It does however call Trainer.evaluate. I guess we could add the on_predict event that would be called at the end of a Trainer.predict method.

But surely this is much less of an issue than val vs eval.

Could you please clarifying that part? I'm not sure what you mean by this.

I'm not sure how this is similar. I call trainer.predict() and get in return eval_ metrics - this is very confusing.

If we go down that road, trainer.predict should only return predictions and not even the metrics (which we won't do either as it's a bigger breaking change but it would definitely make sense to me). Predict and evaluate do not mean test vs evaluation, it's really a matter of getting the predictions of the model vs evaluating the metrics on a given dataset (which could be train/eval/test).

I can get behind adding a prefix argument to those method that defaults to None and will be used to prefix the metrics. If one is passed, it's used (so it's easier to get the test_ prefix you want and does not require ugly post-processing) otherwise eval_ is used to avoid any breaking changes. Would that work for you?

stas00 · 2020-12-11T01:12:54Z

But surely this is much less of an issue than val vs eval.
Could you please clarifying that part? I'm not sure what you mean by this.

Of course, we have --n_val (mnemonic validation), but then we return eval_(foo|bar) as the metrics for "validation". But see below.

So now that you have further expanded on eval+predict not being correlated to validation+testing (thank you!), I think I'm not approaching the problem in the right way.

Really, then there is no bug with both predict and evaluate returning metrics with eval_-prefixed keys and the bug is really in the end use in finetune_reader.py. Here is what I'm thinking:

It shouldn't be eval_bleu and test_bleu, it should be val_bleu and test_bleu - because these are both evaluation report on 2 different splits so --n_val dataset should lead to val_bleu metrics, and --n_test to test_bleu (not sure of valid or val - probably val to match --n_val)
Ideally that whole eval_ prefix should be removed altogether, since it just has a potential at being confused with val as in validation, and there are no other metrics in that context - the trainer code forcefully adds eval_ to all metrics - but as we said it's not possible to do w/o a breaking change, and it's not really a problem anyway. these are just evaluation metrics - no problem here.
What the interface could use then is getting a split argument which it could prepend to the metrics keys, so if someone is doing evaluation on the validation dataset the metrics will be returned could start with val_eval_.

So if my logic makes sense practically we can either:

leave trainer alone and recode finetune_reader.py to prefix eval_ with the split name - so it'll be val_eval_bleu and test_eval_bleu
add an optional trainer argument split for evaluate and predict and have the trainer arrange the split name prefixed in the metrics as in the option above.

Probably the 1st one is simpler, since it gives the user full flexibility.

The same should happen to the results file naming - we need to choose whether those are (val|test )_results.jsonor (eval|predict)_results.json - and not the currently confusing pair eval_results.json, but test_results.json.

sgugger · 2020-12-11T12:42:33Z

If you're happy with val_eval_bleu and test_eval_bleu, it's fine by me. I'd rather name split prefix in solution 2 unless I understand badly what you mean by it. It's also fine by me and could be a feature other users find useful (if they don't want eval_xxx as names).

stas00 · 2020-12-11T17:24:20Z

If you're happy with val_eval_bleu and test_eval_bleu, it's fine by me. I'd rather name split prefix in solution 2 unless I understand badly what you mean by it. It's also fine by me and could be a feature other users find useful (if they don't want eval_xxx as names).

OK, 3 follow up questions:

I suggested split since it's typically either train|val|test, but prefix works just as well. Except it's unclear then in the function API - prefix to what? metrics_key_prefix?
So we are discussing to optionally prefix eval_bleu, etc. with something and not replace eval_, yes? So the end result is f"{prefix}_eval_bleu", etc.
If so, should the prefix include the separator _ (test_) or just be (test) and trainer will "_".join([prefix, key])? I suppose the latter

What do you think @patrickvonplaten + @patil-suraj? I think @sgugger's priority is the trainer itself, but what do you think about what would be ideal for seq2seq examples domain?

sgugger · 2020-12-11T17:50:15Z

For 1, yes metric_key_prefix sounds better. For 2, I was thinking of replacing the eval_ actually, which goes with 3, the prefix should not have the _.

stas00 · 2020-12-11T18:26:58Z

@sgugger, please have a look - do we want None as the default and use eval in the code or eval in the function signature - I suppose the latter, right? I'm a bit confused here with the optional, having non-None default and keeping the API unbroken. Help?

stas00 · 2020-12-11T18:32:37Z

And so then the only remaining thing that I'm stuck with - do we call the results of evaluate in finetune trainer val or eval? Since we call it test for predict - so confusing. Are those results run on dataset splits and then should be val and test or results on functionality they check and then they should be eval and predict - but predict doesn't work, since the results are evaluation results.

I think they should be val and test since both sets are evaluation results on 2 different splits.

sgugger

Looking good, thanks a lot! I have a few comments but nothing really important.

examples/seq2seq/finetune_trainer.py

src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

sgugger

Trying to get clearer, let me know if I managed or if I should just call it a day for today ;-)

examples/seq2seq/finetune_trainer.py

stas00 · 2020-12-11T20:46:10Z

As the comments issue is unrelated to this PR - how about I just let you edit those comments as you think would be the best, @sgugger. Anything you choose works for me. Thank you.

stas00 · 2020-12-14T18:53:50Z

It doesn't look like others are going to review this PR. I didn't want to force anybody by asking to review, just tagging. Is it better to ask for a review explicitly?

@sgugger, please let me know if you still want to adjust unrelated to this PR comments or should I merge it and you will deal with it later.

Thank you!

src/transformers/trainer.py

patrickvonplaten

Left some nits

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

stas00 · 2020-12-14T20:47:14Z

Left some nits

Thank you, @patrickvonplaten!

I went on and removed the optional word in the docs section as well to match the function signature. You haven't suggested I do that, so just want to make sure I did the right thing.

src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

stas00 · 2020-12-15T01:39:57Z

I went on and removed the optional word in the docs section as well to match the function signature. You haven't suggested I do that, so just want to make sure I did the right thing.

So that was wrong - thank you for fixing that, @sgugger

So we are removing Optional from the function signature because Optional == Union[..., None] and we have no None here
but we are documenting that the argument is optional to the user

stas00 added 2 commits December 10, 2020 13:49

trainer and finetune_trainer enhancements and fixes

a0a02fd

add fallback default

4592c97

move the fixing of incorrect keys back into finetune trainer

887e5b9

stas00 changed the title ~~[trainer and finetune_trainer] enhancements and fixes~~ [finetune_trainer] enhancements and fixes Dec 10, 2020

stas00 added 3 commits December 11, 2020 10:04

s/eval/val/ to match the split

2f1aca9

trainer can now use a different prefix than eval_ for metrics

072481a

document new arg

d2caee0

sgugger approved these changes Dec 11, 2020

View reviewed changes

stas00 and others added 5 commits December 11, 2020 10:42

Apply suggestions from code review

ed28e07

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

use 'eval' as the default for metric_key_prefix

66f0d42

complete adjust var names + disambiguate

ac6e4e5

fix logger

9c3ad0f

add clarifying comment

fe53fb8

sgugger reviewed Dec 11, 2020

View reviewed changes

examples/seq2seq/finetune_trainer.py Show resolved Hide resolved

examples/seq2seq/finetune_trainer.py Outdated Show resolved Hide resolved

add clarifying comment

06f5ba8

style

625cab6

patrickvonplaten reviewed Dec 14, 2020

View reviewed changes

src/transformers/trainer.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Dec 14, 2020

View reviewed changes

src/transformers/trainer.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Dec 14, 2020

View reviewed changes

src/transformers/trainer.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Dec 14, 2020

View reviewed changes

src/transformers/trainer.py Outdated Show resolved Hide resolved

patrickvonplaten approved these changes Dec 14, 2020

View reviewed changes

stas00 and others added 3 commits December 14, 2020 11:58

Apply suggestions from code review

254f773

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/transformers/trainer.py

d381f7b

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

complete removal of optional for metric_key_prefix

a2d988d

Merge remote-tracking branch 'origin/master' into speed

c24d127

sgugger approved these changes Dec 15, 2020

View reviewed changes

src/transformers/trainer.py Outdated Show resolved Hide resolved

src/transformers/trainer.py Outdated Show resolved Hide resolved

Apply suggestions from code review

8fc50c2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

stas00 merged commit c19d046 into master Dec 15, 2020

stas00 deleted the speed branch December 15, 2020 01:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[finetune_trainer] enhancements and fixes #9042

[finetune_trainer] enhancements and fixes #9042

stas00 commented Dec 10, 2020 •

edited

Loading

sgugger commented Dec 10, 2020

stas00 commented Dec 10, 2020 •

edited

Loading

sgugger commented Dec 10, 2020

stas00 commented Dec 10, 2020

sgugger commented Dec 10, 2020

stas00 commented Dec 11, 2020 •

edited

Loading

sgugger commented Dec 11, 2020

stas00 commented Dec 11, 2020 •

edited

Loading

sgugger commented Dec 11, 2020

stas00 commented Dec 11, 2020

stas00 commented Dec 11, 2020 •

edited

Loading

sgugger left a comment

sgugger left a comment

stas00 commented Dec 11, 2020 •

edited

Loading

stas00 commented Dec 14, 2020 •

edited

Loading

patrickvonplaten left a comment

stas00 commented Dec 14, 2020 •

edited

Loading

stas00 commented Dec 15, 2020 •

edited

Loading

[finetune_trainer] enhancements and fixes #9042

[finetune_trainer] enhancements and fixes #9042

Conversation

stas00 commented Dec 10, 2020 • edited Loading

sgugger commented Dec 10, 2020

stas00 commented Dec 10, 2020 • edited Loading

sgugger commented Dec 10, 2020

stas00 commented Dec 10, 2020

sgugger commented Dec 10, 2020

stas00 commented Dec 11, 2020 • edited Loading

sgugger commented Dec 11, 2020

stas00 commented Dec 11, 2020 • edited Loading

sgugger commented Dec 11, 2020

stas00 commented Dec 11, 2020

stas00 commented Dec 11, 2020 • edited Loading

sgugger left a comment

Choose a reason for hiding this comment

sgugger left a comment

Choose a reason for hiding this comment

stas00 commented Dec 11, 2020 • edited Loading

stas00 commented Dec 14, 2020 • edited Loading

patrickvonplaten left a comment

Choose a reason for hiding this comment

stas00 commented Dec 14, 2020 • edited Loading

stas00 commented Dec 15, 2020 • edited Loading

stas00 commented Dec 10, 2020 •

edited

Loading

stas00 commented Dec 10, 2020 •

edited

Loading

stas00 commented Dec 11, 2020 •

edited

Loading

stas00 commented Dec 11, 2020 •

edited

Loading

stas00 commented Dec 11, 2020 •

edited

Loading

stas00 commented Dec 11, 2020 •

edited

Loading

stas00 commented Dec 14, 2020 •

edited

Loading

stas00 commented Dec 14, 2020 •

edited

Loading

stas00 commented Dec 15, 2020 •

edited

Loading