
[Trainer] add --max_train_samples --max_val_samples --max_test_samples #10437

Closed
stas00 opened this issue Feb 27, 2021 · 23 comments · Fixed by #10551

stas00 (Contributor) commented Feb 27, 2021

As we were planning to add --max_train_samples --max_val_samples --max_test_samples to all examples (#10423), I thought: is there any reason why we don't extend the Trainer to handle that?

It would surely be useful to be able to truncate the dataset at the Trainer level to enable quick testing.

Another plus is that the metrics could then automatically include the actual number of samples run, rather than each example script computing it by hand as is done at the moment.

That way this functionality would be built-in and examples will get it for free.

TODO:

  1. port --max_train_samples --max_val_samples --max_test_samples to Trainer and remove the then unneeded code in run_seq2seq.py
  2. extend metrics to report the number of samples as it's done now in:

metrics["train_samples"] = min(max_train_samples, len(train_dataset))

so that all scripts automatically get this metric reported. Most likely it should be done here:

def speed_metrics(split, start_time, num_samples=None):
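To make this concrete, here is a rough sketch of what the extended helper could look like (the runtime and samples_per_second keys follow what speed_metrics reports today, if memory serves; the {split}_samples key and the exact naming are just the proposal, not final):

import time

def speed_metrics(split, start_time, num_samples=None):
    # measure wall time since start_time and report it under the given split prefix
    runtime = time.time() - start_time
    result = {f"{split}_runtime": round(runtime, 4)}
    if num_samples is not None:
        # the new part: report the actual number of samples run, so every
        # example script gets this metric for free
        result[f"{split}_samples"] = num_samples
        result[f"{split}_samples_per_second"] = round(num_samples / runtime, 3)
    return result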

@sgugger

sgugger (Collaborator) commented Feb 27, 2021

Yes, that would be a nice refactor in Trainer! I believe this can be done when we create the dataloaders, to keep the original datasets untouched.

stas00 (Contributor, Author) commented Feb 28, 2021

@bhadreshpsavani, would you like to try this slightly more complex task?

Step 1 is to take run_seq2seq.py and move the functionality that handles --max_train_samples --max_val_samples --max_test_samples (args and logic) to training_args.py and trainer.py respectively, and to ensure it works the same way. Please see @sgugger's note above on where to move the logic to; a rough sketch of the arguments follows below.
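A hedged sketch of what those argument definitions could look like once moved (field names mirror the proposed flags; whether they end up in TrainingArguments or stay in the scripts' own argument dataclasses is exactly what this step has to settle):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrainingArguments:
    # sketch only, trimmed to just the three new truncation knobs,
    # to be parsed by HfArgumentParser like the existing fields
    max_train_samples: Optional[int] = field(
        default=None,
        metadata={"help": "If set, truncate the training set to this many samples (for quick debugging)."},
    )
    max_val_samples: Optional[int] = field(
        default=None,
        metadata={"help": "If set, truncate the validation set to this many samples."},
    )
    max_test_samples: Optional[int] = field(
        default=None,
        metadata={"help": "If set, truncate the test set to this many samples."},
    )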

Step 2 - there is no step 2: every other script should then just work with these now-Trainer-level command line args.

And then later it'd be great to have the metrics updated with the actual number of samples run, as is done manually right now in run_seq2seq.py - I added the full details to the OP.

bhadreshpsavani (Contributor) commented:

Yes @stas00, I can do that.

stas00 (Contributor, Author) commented Feb 28, 2021

Awesome! Please don't hesitate to ask questions if you run into any uncertainties.

Thank you!

patil-suraj (Contributor) commented:

I agree with the proposed solution here. But we pre-process the datasets in the scripts before passing them to the Trainer. So if I just want to use, say, 100 validation examples, the script would unnecessarily process the whole dataset and the Trainer would then drop the extra examples.

bhadreshpsavani (Contributor) commented Mar 1, 2021

Hi @patil-suraj,
You are right about that.
I was thinking of adding the code below here:

if data_args.max_train_samples is not None:
    train_dataset = train_dataset.select(range(data_args.max_train_samples))

But yes, it will only select samples from the already-processed dataset.
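To avoid the wasted preprocessing, the truncation would have to run before the map/tokenization step - a rough sketch (preprocess_function and the map arguments are placeholders here, not the exact run_seq2seq.py code):

if data_args.max_train_samples is not None:
    # truncate first, so only the requested samples get tokenized
    train_dataset = train_dataset.select(range(data_args.max_train_samples))

train_dataset = train_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=train_dataset.column_names,
)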

sgugger (Collaborator) commented Mar 1, 2021

Ah, in that case we sadly can't really add this in the Trainer, as it would be inefficient. Thinking about it more, having the functionality in the Trainer would also require us to support all kinds of datasets (even iterable datasets), which is going to be very painful, so I think we should just copy the code into all the scripts.

bhadreshpsavani (Contributor) commented Mar 1, 2021

Hi @sgugger,
Shall we add these arguments --max_train_samples --max_val_samples --max_test_samples and the corresponding code to all the scripts, as implemented in run_seq2seq.py?

sgugger (Collaborator) commented Mar 1, 2021

Yes, I think it's the best solution.

bhadreshpsavani (Contributor) commented Mar 2, 2021

Hi @stas00,
I was going through the code of run_seq2seq and trying to make the changes in the other scripts.
I came across result = {}. We never store anything inside this dictionary and we return it at the end as an empty dictionary. Is it necessary?

In other scripts like run_qa.py we use results instead of metrics in the eval and test sections. Should we unify this behaviour across the scripts? I mean, either use metrics as in run_seq2seq or use results as in the other scripts.

I also want to ask: in many scripts we only do train and eval, not test/predict. Should we create a separate issue for that? We might not add --max_test_samples to those scripts since they don't do testing.

sgugger (Collaborator) commented Mar 2, 2021

The test thing should definitely have its separate issue (and --max_test_samples can be added when the test/predict is added for those scripts).

stas00 (Contributor, Author) commented Mar 2, 2021

Good analysis, @bhadreshpsavani!

I was going through the code of run_seq2seq and trying to make the changes in the other scripts.
I came across result = {}. We never store anything inside this dictionary and we return it at the end as an empty dictionary. Is it necessary?

That was an invalid port of the original. As you can see, the original aggregated all the metrics and returned them.

In other scripts like run_qa.py we use results instead of metrics in the eval and test sections. Should we unify this behaviour across the scripts? I mean, either use metrics as in run_seq2seq or use results as in the other scripts.

I think all these return metrics from main were mainly used for testing. But since we save all the metrics to disk, this isn't necessarily needed, as the data is already accessible - changing this may impact some tests, which would need to be adjusted to use the metric dumps.

Alternatively, we could have the trainer also store all the metrics not only on disk but internally, so that the last command of main in all scripts could be:

return trainer.metrics()
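(A hypothetical sketch of what that could mean - the _all_metrics attribute and the metrics() accessor are made-up names, not the existing Trainer API:)

class Trainer:
    def __init__(self, *args, **kwargs):
        # accumulate every metrics dict that gets saved, in addition to the on-disk dumps
        self._all_metrics = {}

    def save_metrics(self, split, metrics, combined=True):
        self._all_metrics.update(metrics)
        # ... existing logic that writes <split>_results.json / all_results.json ...

    def metrics(self):
        return self._all_metrics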

@sgugger, what do you think - should we just not return anything from main in all example scripts, or always return all metrics and then tweak the trainer to store all the metrics internally and have a method to return them?

I also want to ask: in many scripts we only do train and eval, not test/predict. Should we create a separate issue for that? We might not add --max_test_samples to those scripts since they don't do testing.

Another good catch. Probably for now just skip --max_test_samples in those scripts, but meanwhile I raised your question here: #10482

stas00 (Contributor, Author) commented Mar 2, 2021

The test thing should definitely have its separate issue (and --max_test_samples can be added when the test/predict is added for those scripts).

Filed: #10482

It might be easier to sort out test/predict first then, as it'd make the copy-n-paste of all 3 cl arg flags easier. But either way works.

sgugger (Collaborator) commented Mar 2, 2021

The metrics returned are mainly for the tests AFAIK, so we can remove that behavior if the tests are all adapted to load the file where the metrics are stored.

stas00 (Contributor, Author) commented Mar 2, 2021

OK, let's sync everything then and remove the inconsistent return metrics. Just please be aware that the example tests in examples/test_examples.py will need to be adapted; e.g. currently they rely on the return value from main:

with patch.object(sys, "argv", testargs):
    result = run_glue.main()
    self.assertGreaterEqual(result["eval_accuracy"], 0.75)

So instead we should write a wrapper that loads the metrics from the filesystem and test against that.
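Something along these lines, as a sketch only - get_results is a hypothetical helper name, the output dir is whatever tmp dir the test passes to the script via --output_dir, and I'm assuming the combined metrics get dumped to something like all_results.json there:

import json
import os

def get_results(output_dir):
    # hypothetical helper: read back the metrics the Trainer dumped to disk
    with open(os.path.join(output_dir, "all_results.json")) as f:
        return json.load(f)

with patch.object(sys, "argv", testargs):
    run_glue.main()
result = get_results(tmp_dir)
self.assertGreaterEqual(result["eval_accuracy"], 0.75)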

stas00 (Contributor, Author) commented Mar 2, 2021

Just to make sure my mention of a wrapper wasn't ambiguous:

For models and examples we try to be as explicit as possible to help users understand what is going on in the code - so we avoid refactoring and accept duplicated code where it is needed, unless we can make something a functional method in Trainer so that all the noise can be abstracted away, especially for code that's really just formatting.

For tests it's software engineering as usual: refactoring is important, as it helps minimize hard-to-maintain code and avoid errors. So there let's not duplicate any code, like reading the json file from the filesystem.

bhadreshpsavani (Contributor) commented Mar 4, 2021

Hi @stas00,
I could not figure out the code or implementation for a wrapper that loads all the metrics for testing in test_examples.py. We store the metrics in the file system based on the output_dir argument, which is accessible to the trainer object, but I don't know how to access the trainer object for an individual script in test_examples.py.

In the trainer we can write code for loading the metrics, but how to access the trainer from test_examples.py is what I couldn't figure out.

Another thing: if we use all_metrics = {} to store all the metrics of train, test, and validation, we can save it once as all_results.json like the legacy code, right?

Sorry for asking so many questions. Once these things are clear, the implementation and testing will not take much time.

stas00 (Contributor, Author) commented Mar 4, 2021

Hi @stas00,
I could not figure out the code or implementation for a wrapper that loads all the metrics for testing in test_examples.py. We store the metrics in the file system based on the output_dir argument, which is accessible to the trainer object, but I don't know how to access the trainer object for an individual script in test_examples.py.

You have the output_dir as tmp_dir here:

tmp_dir = self.get_auto_remove_tmp_dir()

so you just load the f"{tmp_dir}/all_results.json" file right after this line:

result = run_glue.main()

That's it - you have the metrics to test on the following line ;)

Another thing: if we use all_metrics = {} to store all the metrics of train, test, and validation, we can save it once as all_results.json like the legacy code, right?

I already changed the code to save all_results.json auto-magically in trainer.save_metrics - make sure you rebase your branch:

if combined:
    path = os.path.join(self.args.output_dir, "all_results.json")
    if os.path.exists(path):
        with open(path, "r") as f:
            all_metrics = json.load(f)
    else:
        all_metrics = {}
    all_metrics.update(metrics)
    with open(path, "w") as f:
        json.dump(all_metrics, f, indent=4, sort_keys=True)
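So in the example scripts the calls stay as simple as this (a sketch; the variable names are illustrative):

trainer.log_metrics("train", train_metrics)
trainer.save_metrics("train", train_metrics)  # writes train_results.json and folds the metrics into all_results.json
trainer.save_metrics("eval", eval_metrics)    # merges the eval metrics into the same all_results.json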

Sorry for asking so many questions. Once these things are clear, the implementation and testing will not take much time.

On the contrary, please don't hesitate to ask any questions. It takes quite some time to find one's way in this complex massive code base.

bhadreshpsavani (Contributor) commented:

Hi @stas00,

The two lines below in run_ner.py don't seem very meaningful, since metrics does not contain any details from the test/prediction step:

trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)

stas00 (Contributor, Author) commented Mar 4, 2021

Why do you suggest it's not meaningful?

metrics gets set here:

predictions, labels, metrics = trainer.predict(test_dataset)
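i.e. the full flow in the script is roughly (a sketch; the max_test_samples accounting is the part this issue adds, and data_args is the script's usual argument container):

predictions, labels, metrics = trainer.predict(test_dataset)

max_test_samples = (
    data_args.max_test_samples if data_args.max_test_samples is not None else len(test_dataset)
)
metrics["test_samples"] = min(max_test_samples, len(test_dataset))

trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)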

bhadreshpsavani (Contributor) commented Mar 5, 2021

Oh, I didn't notice that!
I thought it was taking the earlier metrics.
Thanks!

bhadreshpsavani (Contributor) commented Mar 5, 2021

Hello @stas00 and @sgugger,

I have made the changes adding the three arguments to the PyTorch-based scripts, and they work as expected. I also modified test_examples.py accordingly.

For the TensorFlow-based scripts, I am facing issues while running them even in Colab, directly from the master branch without any changes. I created an issue for the same.

We have four run_tf_*.py files:

  1. run_tf_multiple_choice.py (Reported Issue)
  2. run_tf_squad.py (Reported Issue)
  3. run_tf_glue.py (Got AttributeError: 'TFTrainer' object has no attribute 'log_metrics')
  4. run_tf_text_classification.py

Based on the error in the third file, trainer.py is only for PyTorch-based models and trainer_tf.py is for TF-based models. In that case we need to write save_metrics() and log_metrics() for trainer_tf.py, right? In the last pull request I could not test the changes for the TF scripts, but I will fix that mistake in this PR.

Do we need to add tests for these TensorFlow files? Currently we only have the PyTorch-based scripts in test_examples.py.

sgugger (Collaborator) commented Mar 5, 2021

Please don't touch the TF examples as they have not been cleaned up and will change in the near future. And yes, none of the TF examples are currently tested.
