Support logging_ratio, save_ratio, and eval_ratio (like for warmup_ratio) #23171

Closed
konstantinjdobler opened this issue May 5, 2023 · 6 comments · Fixed by #23235

@konstantinjdobler
Contributor

konstantinjdobler commented May 5, 2023

Feature request

I would love it if TrainingArguments and the Hugging Face Trainer supported logging_ratio, save_ratio, and eval_ratio arguments (complementing logging_steps, save_steps, and eval_steps). If a *_ratio argument is set to e.g. 0.1, logging/saving/eval would be done every 0.1 * total_training_steps. This is already the case for warmup_ratio and warmup_steps.

Motivation

When dealing with many different tasks and datasets, it can be frustrating to have to calculate different appropriate logging_steps etc. for each individual dataset. This proposal would enable a unified, simple and concise way to solve this problem.

Your contribution

I realize this might not be trivial to fully integrate, but hopefully, we can take warmup_steps and warmup_ratio as a reference. Depending on how deep the required changes are, I can also submit a PR (with some pointers on what to look out for).

@sgugger
Collaborator

sgugger commented May 5, 2023

We already have 96 training arguments though, and that would make three more for all users to learn :-/

@konstantinjdobler
Contributor Author

konstantinjdobler commented May 5, 2023

Another option would be to allow float inputs for logging_steps, save_steps, and eval_steps and interpret all inputs < 1.0 as a ratio (and throw an error if an input > 1.0 is not an integer).

But yes, there is a tradeoff with adding too much complexity.
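
A minimal sketch of that interpretation (to_absolute_steps and its behavior are illustrative assumptions, not actual Trainer code):

def to_absolute_steps(value, max_steps):
    """Interpret values in (0, 1) as a fraction of max_steps; otherwise require an integer."""
    if 0 < value < 1:
        # fractional values are a ratio of the total number of training steps
        return max(int(value * max_steps), 1)
    if value != int(value):
        raise ValueError(f"A step interval larger than 1 must be an integer, got {value}.")
    return int(value)

print(to_absolute_steps(0.1, 12_000))  # -> 1200 (log/save/eval every 1200 steps)
print(to_absolute_steps(500, 12_000))  # -> 500 (plain integer interval, unchanged)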

@sgugger
Collaborator

sgugger commented May 5, 2023

I would prefer that solution actually, even if the naming is not perfect.

@konstantinjdobler
Contributor Author

Going over the code a bit, it seems like we would have to wait until after this if statement in _inner_training_loop to set the correct values based on the max_steps that gets calculated there, plus add some additional guards in the __post_init__ of TrainingArguments.

if has_length(train_dataloader):
    len_dataloader = len(train_dataloader)
    num_update_steps_per_epoch = len_dataloader // args.gradient_accumulation_steps
    num_update_steps_per_epoch = max(num_update_steps_per_epoch, 1)
    num_examples = self.num_examples(train_dataloader)
    if args.max_steps > 0:
        max_steps = args.max_steps
        num_train_epochs = args.max_steps // num_update_steps_per_epoch + int(
            args.max_steps % num_update_steps_per_epoch > 0
        )
        # May be slightly incorrect if the last batch in the training dataloader has a smaller size but it's
        # the best we can do.
        num_train_samples = args.max_steps * total_train_batch_size
    else:
        max_steps = math.ceil(args.num_train_epochs * num_update_steps_per_epoch)
        num_train_epochs = math.ceil(args.num_train_epochs)
        num_train_samples = self.num_examples(train_dataloader) * args.num_train_epochs
elif args.max_steps > 0:  # Rely on max_steps when dataloader does not have a working size
    max_steps = args.max_steps
    # Setting a very large number of epochs so we go as many times as necessary over the iterator.
    num_train_epochs = sys.maxsize
    num_update_steps_per_epoch = max_steps
    num_examples = total_train_batch_size * args.max_steps
    num_train_samples = args.max_steps * total_train_batch_size
else:
    raise ValueError(
        "args.max_steps must be set to a positive value if dataloader does not have a length, was"
        f" {args.max_steps}"
    )
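
A minimal sketch of the follow-up step, assuming the fractional intervals are resolved right after max_steps is known; overwriting args in place is purely illustrative and not necessarily the final design:

import math

# Sketch only: once max_steps is computed above, turn fractional step
# intervals (ratios in (0, 1)) into absolute step counts.
for attr in ("logging_steps", "eval_steps", "save_steps"):
    value = getattr(args, attr)
    if value is not None and 0 < value < 1:
        setattr(args, attr, max(math.ceil(value * max_steps), 1))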

At first glance, it looks like the setting of when things get logged/saved/evaluated in DefaultFlowCallback should work out-of-the-box with this change.
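
For context, the step-based check in DefaultFlowCallback is roughly a modulo test against the configured interval (simplified paraphrase from memory, not the exact source), so it keeps working once the ratio has been converted to an absolute step count:

# Simplified paraphrase of the interval check in DefaultFlowCallback.on_step_end;
# names follow the transformers callback API but the exact code may differ.
if args.logging_strategy == IntervalStrategy.STEPS and state.global_step % args.logging_steps == 0:
    control.should_log = True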
I'm willing to contribute the changes once I find some time. Does the general plan sound reasonable?

@sgugger
Collaborator

sgugger commented May 6, 2023

Yes it does. I'll be looking forward to your PR!

@lava18

lava18 commented Feb 5, 2024

Can someone clarify whether this ratio means a percentage of the training steps per epoch or of the total training steps? If the latter, how do we know in advance the total number of epochs (or total number of training steps) that the model is going to train for?
