diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3e27528c14..d718736307 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -283,3 +283,56 @@ The deprecation and removal schedule is based on each feature's usage and impact - **Widely-Used Components**: For a feature with high usage, we aim for a more gradual transition period of approximately **5 months**, generally scheduling deprecation around **5 minor releases** after the initial warning. These examples represent the two ends of a continuum. The specific timeline for each feature will be determined individually, balancing innovation with user stability needs. + +### Working with warnings + +Warnings play a critical role in guiding users toward resolving potential issues, but they should be used thoughtfully to avoid unnecessary noise. Unlike logging, which provides informational context or operational details, warnings signal conditions that require attention and action. Overusing warnings can dilute their importance, leading users to ignore them entirely. + +#### Definitions + +- **Correct**: An operation is correct if it is valid, follows the intended approach, and aligns with the current best practices or guidelines within the codebase. This is the recommended or intended way to perform the operation. +- **Supported**: An operation is supported if it is technically valid and works within the current codebase, but it may not be the most efficient, optimal, or recommended way to perform the task. This includes deprecated features or legacy approaches that still work but may be phased out in the future. + +#### Choosing the right message + +- **Correct → No warning**: + If the operation is fully valid and expected, no message should be issued. The system is working as intended, so no warning is necessary. + +- **Correct but deserves attention → No warning, possibly a log message**: + When an operation is correct but uncommon or requires special attention, providing an informational message can be helpful. 
This keeps users informed without implying any issue. If available, use the logger to output this message. Example: + + ```python + logger.info("This is an informational message about a rare but correct operation.") + ``` + +- **Correct but very likely a mistake → Warning with option to disable**: + In rare cases, you may want to issue a warning for a correct operation that’s very likely a mistake. In such cases, you must provide an option to suppress the warning. This can be done with a flag in the function. Example: + + ```python + def my_function(foo, bar, _warn=True): + if foo == bar: + if _warn: + warnings.warn("foo and bar are the same, this is likely a mistake. Ignore this warning by setting `_warn=False`.") + # Do something + ``` + +- **Supported but not correct → Warning**: + If the operation is technically supported but is deprecated, suboptimal, or could cause future issues (e.g., conflicting arguments), a warning should be raised. This message should be actionable, meaning it must explain how to resolve the issue. Example: + + ```python + def my_function(foo, bar): + if foo and bar: + warnings.warn("Both `foo` and `bar` were provided, but only one is allowed. Ignoring `foo`. Please pass only one of these arguments.") + # Do something + ``` + +- **Not supported → Exception**: + If the operation is invalid or unsupported, raise an exception. This indicates that the operation cannot be performed and requires immediate attention. Example: + + ```python + def my_function(foo, bar): + if foo and bar: + raise ValueError("Both `foo` and `bar` were provided, but only one is allowed. Please pass only one of these arguments.") + ``` + +By following this classification, you ensure that warnings, information, and exceptions are used appropriately, providing clear guidance to the user without cluttering the system with unnecessary messages. 
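Reviewer note on the CONTRIBUTING.md hunk above: the "correct but very likely a mistake" case can be checked with a short sketch. It reuses the `my_function` / `_warn` names from the guideline's own example and passes an explicit `UserWarning` category, as the other hunks in this diff do. `count_warnings` is a hypothetical helper introduced only for illustration and is not part of the codebase.

```python
import warnings


def my_function(foo, bar, _warn=True):
    # "Correct but very likely a mistake": warn, but let the caller opt out.
    if foo == bar and _warn:
        warnings.warn(
            "foo and bar are the same, this is likely a mistake. "
            "Ignore this warning by setting `_warn=False`.",
            UserWarning,
        )
    return foo, bar


def count_warnings(**kwargs):
    # Hypothetical helper: call my_function and count the warnings it emits.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        my_function(**kwargs)
    return len(caught)
```

Calling `count_warnings(foo=1, bar=1)` should record one warning, while adding `_warn=False` or passing distinct values should record none. This is the behavior the guideline asks for when a warning must be suppressible.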
diff --git a/docs/source/cpo_trainer.mdx b/docs/source/cpo_trainer.mdx index 587252adf9..3f9fb88cfc 100644 --- a/docs/source/cpo_trainer.mdx +++ b/docs/source/cpo_trainer.mdx @@ -75,7 +75,7 @@ While training and evaluating we record the following reward metrics: ### Simple Preference Optimization (SimPO) -The [SimPO](https://huggingface.co/papers/2405.14734) method is also implemented in the [`CPOTrainer`]. SimPO is an alternative loss that adds a reward margin, allows for length normalization, and does not use BC regularization. To use this loss, we can use SimPO easily by turning on `loss_type="simpo"` and `cpo_alpha=0` in the [`CPOConfig`]. +The [SimPO](https://huggingface.co/papers/2405.14734) method is also implemented in the [`CPOTrainer`]. SimPO is an alternative loss that adds a reward margin, allows for length normalization, and does not use BC regularization. To use this loss, we can use SimPO easily by turning on `loss_type="simpo"` and `cpo_alpha=0.0` in the [`CPOConfig`]. ### CPO-SimPO diff --git a/examples/scripts/reward_modeling.py b/examples/scripts/reward_modeling.py index 073016bc77..ce99964da9 100644 --- a/examples/scripts/reward_modeling.py +++ b/examples/scripts/reward_modeling.py @@ -99,7 +99,8 @@ if model_config.use_peft and model_config.lora_task_type != "SEQ_CLS": warnings.warn( "You are using a `task_type` that is different than `SEQ_CLS` for PEFT. This will lead to silent bugs" - " Make sure to pass --lora_task_type SEQ_CLS when using this script with PEFT." + " Make sure to pass --lora_task_type SEQ_CLS when using this script with PEFT.", + UserWarning, ) ############## diff --git a/trl/core.py b/trl/core.py index bfb23ccd3b..d4e77f5fdc 100644 --- a/trl/core.py +++ b/trl/core.py @@ -296,7 +296,8 @@ def randn_tensor( warnings.warn( f"The passed generator was created on 'cpu' even though a tensor on {device} was expected." f" Tensors will be created on 'cpu' and then moved to {device}. 
Note that one can probably" - f" slighly speed up this function by passing a generator that was created on the {device} device." + f" slightly speed up this function by passing a generator that was created on the {device} device.", + UserWarning, ) elif gen_device_type != device.type and gen_device_type == "cuda": raise ValueError(f"Cannot generate a {device} tensor from a generator of type {gen_device_type}.") diff --git a/trl/environment/base_environment.py b/trl/environment/base_environment.py index e9c5658d2d..fa7e21f91b 100644 --- a/trl/environment/base_environment.py +++ b/trl/environment/base_environment.py @@ -13,7 +13,6 @@ # limitations under the License. import re -import warnings from typing import Optional import torch @@ -145,8 +144,10 @@ def show_text(self, show_legend=False): Print the text history. """ if not is_rich_available(): - warnings.warn("install rich to display text") - return + raise ImportError( + "The `rich` library is required to display text with formatting. " + "Install it using `pip install rich`." + ) text = Text(self.text) text.stylize(self.prompt_color, self.text_spans[0][0], self.text_spans[1][0]) @@ -167,8 +168,10 @@ def show_tokens(self, tokenizer, show_legend=False): Print the history tokens. """ if not is_rich_available(): - warnings.warn("install rich to display tokens") - return + raise ImportError( + "The `rich` library is required to display tokens with formatting. " + "Install it using `pip install rich`." + ) text = Text() prompt_end = self.token_spans[0][1] @@ -192,8 +195,10 @@ def show_colour_legend(self): Print the colour legend. """ if not is_rich_available(): - warnings.warn("install rich to display colour legend") - return + raise ImportError( + "The `rich` library is required to display colour legends with formatting. " + "Install it using `pip install rich`." 
+ ) text = Text("\n\n(Colour Legend: ") text.append("Prompt", style=self.prompt_color) text.append("|") diff --git a/trl/models/modeling_sd_base.py b/trl/models/modeling_sd_base.py index fbd4fe5b2d..0b729a0009 100644 --- a/trl/models/modeling_sd_base.py +++ b/trl/models/modeling_sd_base.py @@ -808,8 +808,9 @@ def __init__(self, pretrained_model_name: str, *, pretrained_model_revision: str except OSError: if use_lora: warnings.warn( - "If you are aware that the pretrained model has no lora weights to it, ignore this message. " - "Otherwise please check the if `pytorch_lora_weights.safetensors` exists in the model folder." + "Trying to load LoRA weights but no LoRA weights found. Set `use_lora=False` or check that " + "`pytorch_lora_weights.safetensors` exists in the model folder.", + UserWarning, ) self.sd_pipeline.scheduler = DDIMScheduler.from_config(self.sd_pipeline.scheduler.config) diff --git a/trl/trainer/alignprop_config.py b/trl/trainer/alignprop_config.py index 5817c45fe9..0efdcc74ce 100644 --- a/trl/trainer/alignprop_config.py +++ b/trl/trainer/alignprop_config.py @@ -14,11 +14,10 @@ import os import sys -import warnings from dataclasses import dataclass, field from typing import Any, Literal, Optional -from transformers import is_bitsandbytes_available, is_torchvision_available +from transformers import is_bitsandbytes_available from ..core import flatten_dict @@ -139,14 +138,6 @@ def to_dict(self): return flatten_dict(output_dict) def __post_init__(self): - if self.log_with not in ["wandb", "tensorboard"]: - warnings.warn( - "Accelerator tracking only supports image logging if `log_with` is set to 'wandb' or 'tensorboard'." - ) - - if self.log_with == "wandb" and not is_torchvision_available(): - warnings.warn("Wandb image logging requires torchvision to be installed") - if self.train_use_8bit_adam and not is_bitsandbytes_available(): raise ImportError( "You need to install bitsandbytes to use 8bit Adam. 
" diff --git a/trl/trainer/bco_trainer.py b/trl/trainer/bco_trainer.py index 6b7a8c4a8d..89c2357a19 100644 --- a/trl/trainer/bco_trainer.py +++ b/trl/trainer/bco_trainer.py @@ -394,17 +394,9 @@ def __init__( ref_model_init_kwargs["torch_dtype"] = torch_dtype if isinstance(model, str): - warnings.warn( - "You passed a model_id to the BCOTrainer. This will automatically create an " - "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you." - ) model = AutoModelForCausalLM.from_pretrained(model, **model_init_kwargs) if isinstance(ref_model, str): - warnings.warn( - "You passed a ref model_id to the BCOTrainer. This will automatically create an " - "`AutoModelForCausalLM`" - ) ref_model = AutoModelForCausalLM.from_pretrained(ref_model, **ref_model_init_kwargs) # Initialize this variable to False. This helps tracking the case when `peft_module_casting_to_bf16` @@ -573,8 +565,11 @@ def make_inputs_require_grad(module, input, output): self.aux_loss_coef = getattr(model.config, "router_aux_loss_coef", 0.0) if self.aux_loss_enabled and self.aux_loss_coef == 0.0: warnings.warn( - "You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0," - " meaning the auxiliary loss will not be used." + "You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to " + "`0.0`, meaning the auxiliary loss will not be used. 
Either set `router_aux_loss_coef` to a value " + "greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary " + "loss.", + UserWarning, ) # Underlying Distribution Matching argument @@ -714,7 +709,6 @@ def make_inputs_require_grad(module, input, output): self.running = RunningMoments(accelerator=self.accelerator) if self.embedding_func is None: - warnings.warn("You did not pass `embedding_func` underlying distribution matching feature is deactivated.") return chosen_embeddings = self._get_sample_prompt_embeddings(desirable, sample_size=self.args.prompt_sample_size) @@ -884,16 +878,12 @@ def _load_optimizer_and_scheduler(self, checkpoint): return # when loading optimizer and scheduler from checkpoint, also load the running delta object. running_file = os.path.join(checkpoint, RUNNING_NAME) - if not os.path.isfile(running_file): - warnings.warn(f"Missing file {running_file}. Will use a new running delta value for BCO loss calculation") - else: + if os.path.isfile(running_file): self.running = RunningMoments.load_from_json(self.accelerator, running_file) if self.match_underlying_distribution: clf_file = os.path.join(checkpoint, CLF_NAME) - if not os.path.isfile(running_file): - warnings.warn(f"Missing file {clf_file}. Will use a new UDM classifier for BCO loss calculation") - else: + if os.path.isfile(running_file): self.clf.set_params(**torch.load(clf_file, weights_only=True, map_location="cpu")) @contextmanager @@ -1278,11 +1268,6 @@ def compute_loss( return_outputs=False, num_items_in_batch=None, ) -> Union[torch.Tensor, tuple[torch.Tensor, dict[str, torch.Tensor]]]: - if not self.use_dpo_data_collator: - warnings.warn( - "compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. 
Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) compute_loss_context_manager = amp.autocast("cuda") if self._peft_has_been_casted_to_bf16 else nullcontext() with compute_loss_context_manager: @@ -1359,11 +1344,6 @@ def prediction_step( prediction_loss_only: bool, ignore_keys: Optional[list[str]] = None, ): - if not self.use_dpo_data_collator: - warnings.warn( - "prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) if ignore_keys is None: if hasattr(model, "config"): ignore_keys = getattr(model.config, "keys_to_ignore_at_inference", []) diff --git a/trl/trainer/cpo_trainer.py b/trl/trainer/cpo_trainer.py index a1153dec5d..9a068b21ac 100644 --- a/trl/trainer/cpo_trainer.py +++ b/trl/trainer/cpo_trainer.py @@ -144,10 +144,6 @@ def __init__( model_init_kwargs["torch_dtype"] = torch_dtype if isinstance(model, str): - warnings.warn( - "You passed a model_id to the CPOTrainer. This will automatically create an " - "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you." - ) model = AutoModelForCausalLM.from_pretrained(model, **model_init_kwargs) # Initialize this variable to False. This helps tracking the case when `peft_module_casting_to_bf16` @@ -290,7 +286,9 @@ def make_inputs_require_grad(module, input, output): if args.loss_type in ["hinge", "ipo"] and args.label_smoothing > 0: warnings.warn( - "You are using a loss type that does not support label smoothing. Ignoring label_smoothing parameter." + f"You are using the {args.loss_type} loss type that does not support label smoothing. The " + "`label_smoothing` parameter will be ignored. 
Set `label_smoothing` to `0.0` to remove this warning.", + UserWarning, ) if args.loss_type == "kto_pair": raise ValueError("Support for kto_pair has been removed in CPOTrainer. Please use KTOTrainer.") @@ -303,19 +301,15 @@ def make_inputs_require_grad(module, input, output): self.aux_loss_coef = getattr(model.config, "router_aux_loss_coef", 0.0) if self.aux_loss_enabled and self.aux_loss_coef == 0.0: warnings.warn( - "You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0," - " meaning the auxiliary loss will not be used." + "You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to " + "`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value " + "greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary " + "loss.", + UserWarning, ) if args.loss_type == "simpo": self.simpo_gamma = args.simpo_gamma - if self.cpo_alpha > 0: - warnings.warn( - "You are using CPO-SimPO method because you set a non-zero cpo_alpha. " - "This will result in the CPO-SimPO method " - "(https://github.com/fe1ixxu/CPO_SIMPO/tree/main). " - "If you want to use a pure SimPO method, please set cpo_alpha to 0." - ) self._stored_metrics = defaultdict(lambda: defaultdict(list)) @@ -845,12 +839,6 @@ def compute_loss( return_outputs=False, num_items_in_batch=None, ) -> Union[torch.Tensor, tuple[torch.Tensor, dict[str, torch.Tensor]]]: - if not self.use_dpo_data_collator: - warnings.warn( - "compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. 
Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) - compute_loss_context_manager = amp.autocast("cuda") if self._peft_has_been_casted_to_bf16 else nullcontext() with compute_loss_context_manager: @@ -891,11 +879,6 @@ def prediction_step( prediction_loss_only: bool, ignore_keys: Optional[list[str]] = None, ): - if not self.use_dpo_data_collator: - warnings.warn( - "prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) if ignore_keys is None: if hasattr(model, "config"): ignore_keys = getattr(model.config, "keys_to_ignore_at_inference", []) diff --git a/trl/trainer/ddpo_config.py b/trl/trainer/ddpo_config.py index 4ff42312a6..442689be8f 100644 --- a/trl/trainer/ddpo_config.py +++ b/trl/trainer/ddpo_config.py @@ -14,11 +14,10 @@ import os import sys -import warnings from dataclasses import dataclass, field from typing import Literal, Optional -from transformers import is_bitsandbytes_available, is_torchvision_available +from transformers import is_bitsandbytes_available from ..core import flatten_dict @@ -167,14 +166,6 @@ def to_dict(self): return flatten_dict(output_dict) def __post_init__(self): - if self.log_with not in ["wandb", "tensorboard"]: - warnings.warn( - "Accelerator tracking only supports image logging if `log_with` is set to 'wandb' or 'tensorboard'." - ) - - if self.log_with == "wandb" and not is_torchvision_available(): - warnings.warn("Wandb image logging requires torchvision to be installed") - if self.train_use_8bit_adam and not is_bitsandbytes_available(): raise ImportError( "You need to install bitsandbytes to use 8bit Adam. 
" diff --git a/trl/trainer/dpo_config.py b/trl/trainer/dpo_config.py index dec8e93bc8..ea4a176aa1 100644 --- a/trl/trainer/dpo_config.py +++ b/trl/trainer/dpo_config.py @@ -192,7 +192,8 @@ class DPOConfig(TrainingArguments): def __post_init__(self): if self.max_target_length is not None: warnings.warn( - "The `max_target_length` argument is deprecated in favor of `max_completion_length` and will be removed in a future version.", + "The `max_target_length` argument is deprecated in favor of `max_completion_length` and will be " + "removed in v0.14.", FutureWarning, ) if self.max_completion_length is None: diff --git a/trl/trainer/dpo_trainer.py b/trl/trainer/dpo_trainer.py index 4e9cfb2d66..c1f2776511 100644 --- a/trl/trainer/dpo_trainer.py +++ b/trl/trainer/dpo_trainer.py @@ -211,6 +211,9 @@ def __init__( preprocess_logits_for_metrics: Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None, peft_config: Optional[dict] = None, ): + if model is None: + raise ValueError("No model provided. Please provide a model to train.") + if not isinstance(model, str) and ref_model is model: raise ValueError( "`model` and `ref_model` cannot be the same object. If you want `ref_model` to be the " @@ -256,17 +259,9 @@ def __init__( ref_model_init_kwargs["torch_dtype"] = torch_dtype if isinstance(model, str): - warnings.warn( - "You passed a model_id to the DPOTrainer. This will automatically create an " - "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you." - ) model = AutoModelForCausalLM.from_pretrained(model, **model_init_kwargs) if isinstance(ref_model, str): - warnings.warn( - "You passed a ref model_id to the DPOTrainer. This will automatically create an " - "`AutoModelForCausalLM`" - ) ref_model = AutoModelForCausalLM.from_pretrained(ref_model, **ref_model_init_kwargs) # Initialize this variable to False. 
This helps tracking the case when `peft_module_casting_to_bf16` @@ -340,23 +335,8 @@ def make_inputs_require_grad(module, input, output): " Please install `wandb` to resolve." ) - if model is not None: - self.is_encoder_decoder = model.config.is_encoder_decoder - elif args.is_encoder_decoder is None: - raise ValueError( - "When no model is provided, you need to pass the parameter is_encoder_decoder to the DPOTrainer/DPOConfig." - ) - else: - self.is_encoder_decoder = args.is_encoder_decoder - - if model is not None: - self.is_vision_model = model.config.model_type in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES.keys() - else: - warnings.warn( - "No model provided, cannot determine if it is a vision model. Setting is_vision_model to False." - ) - self.is_vision_model = False - + self.is_encoder_decoder = model.config.is_encoder_decoder + self.is_vision_model = model.config.model_type in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES.keys() self.is_peft_model = is_peft_available() and isinstance(model, PeftModel) self.model_adapter_name = args.model_adapter_name self.ref_adapter_name = args.ref_adapter_name @@ -414,7 +394,9 @@ def make_inputs_require_grad(module, input, output): and args.label_smoothing > 0 ): warnings.warn( - "You are using a loss type that does not support label smoothing. Ignoring label_smoothing parameter." + f"You are using the {args.loss_type} loss type that does not support label smoothing. The " + "`label_smoothing` parameter will be ignored. Set `label_smoothing` to `0.0` to remove this warning.", + UserWarning, ) if args.loss_type == "kto_pair": raise ValueError("Support for kto_pair has been removed in DPOTrainer. 
Please use KTOTrainer.") @@ -427,8 +409,11 @@ def make_inputs_require_grad(module, input, output): self.aux_loss_coef = getattr(model.config, "router_aux_loss_coef", 0.0) if self.aux_loss_enabled and self.aux_loss_coef == 0.0: warnings.warn( - "You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0," - " meaning the auxiliary loss will not be used." + "You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to " + "`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value " + "greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary " + "loss.", + UserWarning, ) self._stored_metrics = defaultdict(lambda: defaultdict(list)) diff --git a/trl/trainer/gkd_trainer.py b/trl/trainer/gkd_trainer.py index fade180a4e..bc4f35ef33 100644 --- a/trl/trainer/gkd_trainer.py +++ b/trl/trainer/gkd_trainer.py @@ -14,7 +14,6 @@ import os import random import textwrap -import warnings from copy import deepcopy from typing import Any, Callable, Optional, Union @@ -115,10 +114,6 @@ def __init__( ) if isinstance(teacher_model, str): - warnings.warn( - "You passed a teacher model_id to the GKDTrainer. This will automatically create an " - "`AutoModelForCausalLM`" - ) if args.use_liger: teacher_model = AutoLigerKernelForCausalLM.from_pretrained(teacher_model, **teacher_model_init_kwargs) else: diff --git a/trl/trainer/iterative_sft_trainer.py b/trl/trainer/iterative_sft_trainer.py index 76891287c7..d2b02ab33b 100644 --- a/trl/trainer/iterative_sft_trainer.py +++ b/trl/trainer/iterative_sft_trainer.py @@ -123,15 +123,10 @@ def __init__( if data_collator is None: if self.is_encoder_decoder: - warnings.warn( - "No data collator is provided. Using 'DataCollatorForSeq2Seq' with" - "'labels_pad_token_id' set to '-100' and 'pad_to_multiple_of' set to 8." 
- ) self.data_collator = DataCollatorForSeq2Seq( processing_class, label_pad_token_id=-100, pad_to_multiple_of=8 ) else: - warnings.warn("No data collator is provided. Using 'DataCollatorForLanguageModeling'") self.data_collator = DataCollatorForLanguageModeling(self.processing_class, mlm=False) else: self.data_collator = data_collator @@ -293,7 +288,9 @@ def step( raise ValueError("Step should include `input_ids` or `texts` as keyword arguments.") elif input_ids is not None and texts is not None: warnings.warn( - "Both 'input_ids' and 'texts' are provided. 'input_ids' will be overwritten using inputs provided by the 'texts' keyword argument." + "Both `input_ids` and `texts` arguments are provided. `input_ids` will be ignored. " + "Please provide only one of the two.", + UserWarning, ) if labels is None and texts_labels is None and self.is_encoder_decoder: @@ -318,7 +315,6 @@ def step( )["input_ids"] if labels is None: - warnings.warn("No labels are provided. Setting labels to input_ids") labels = input_ids model_inputs = self.prepare_model_inputs(input_ids, attention_mask, labels) diff --git a/trl/trainer/kto_trainer.py b/trl/trainer/kto_trainer.py index 2ef78b05f9..5513cf8d08 100644 --- a/trl/trainer/kto_trainer.py +++ b/trl/trainer/kto_trainer.py @@ -386,17 +386,9 @@ def __init__( ref_model_init_kwargs["torch_dtype"] = torch_dtype if isinstance(model, str): - warnings.warn( - "You passed a model_id to the KTOTrainer. This will automatically create an " - "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you." - ) model = AutoModelForCausalLM.from_pretrained(model, **model_init_kwargs) if isinstance(ref_model, str): - warnings.warn( - "You passed a ref model_id to the KTOTrainer. This will automatically create an " - "`AutoModelForCausalLM`" - ) ref_model = AutoModelForCausalLM.from_pretrained(ref_model, **ref_model_init_kwargs) # Initialize this variable to False. 
This helps tracking the case when `peft_module_casting_to_bf16` @@ -574,8 +566,11 @@ def make_inputs_require_grad(module, input, output): self.aux_loss_coef = getattr(model.config, "router_aux_loss_coef", 0.0) if self.aux_loss_enabled and self.aux_loss_coef == 0.0: warnings.warn( - "You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0," - " meaning the auxiliary loss will not be used." + "You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to " + "`0.0`, meaning the auxiliary loss will not be used. Either set `router_aux_loss_coef` to a value " + "greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary " + "loss.", + UserWarning, ) # The trainer estimates the number of FLOPs (floating-point operations) using the number of elements in the @@ -1283,11 +1278,6 @@ def compute_loss( return_outputs=False, num_items_in_batch=None, ) -> Union[torch.Tensor, tuple[torch.Tensor, dict[str, torch.Tensor]]]: - if not self.use_dpo_data_collator: - warnings.warn( - "compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) compute_loss_context_manager = amp.autocast("cuda") if self._peft_has_been_casted_to_bf16 else nullcontext() with compute_loss_context_manager: @@ -1365,11 +1355,6 @@ def prediction_step( prediction_loss_only: bool, ignore_keys: Optional[list[str]] = None, ): - if not self.use_dpo_data_collator: - warnings.warn( - "prediction_step is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. 
Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) if ignore_keys is None: if hasattr(model, "config"): ignore_keys = getattr(model.config, "keys_to_ignore_at_inference", []) diff --git a/trl/trainer/online_dpo_trainer.py b/trl/trainer/online_dpo_trainer.py index 4dbd6a050c..7830d3fe64 100644 --- a/trl/trainer/online_dpo_trainer.py +++ b/trl/trainer/online_dpo_trainer.py @@ -161,7 +161,8 @@ def __init__( if reward_model is not None and judge is not None: warnings.warn( "Both `reward_model` and `judge` are provided. Please choose provide only one of them. " - "Ignoring `judge` and using `reward_model`." + "Ignoring `judge` and using `reward_model`.", + UserWarning, ) judge = None elif reward_model is None and judge is None: diff --git a/trl/trainer/orpo_trainer.py b/trl/trainer/orpo_trainer.py index 3551a7960f..e90fec8dfe 100644 --- a/trl/trainer/orpo_trainer.py +++ b/trl/trainer/orpo_trainer.py @@ -155,10 +155,6 @@ def __init__( model_init_kwargs["torch_dtype"] = torch_dtype if isinstance(model, str): - warnings.warn( - "You passed a model_id to the ORPOTrainer. This will automatically create an " - "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you." - ) model = AutoModelForCausalLM.from_pretrained(model, **model_init_kwargs) # Initialize this variable to False. This helps tracking the case when `peft_module_casting_to_bf16` @@ -303,8 +299,11 @@ def make_inputs_require_grad(module, input, output): self.aux_loss_coef = getattr(model.config, "router_aux_loss_coef", 0.0) if self.aux_loss_enabled and self.aux_loss_coef == 0.0: warnings.warn( - "You set `output_router_logits` to True in the model config, but `router_aux_loss_coef` is set to 0.0," - " meaning the auxiliary loss will not be used." + "You set `output_router_logits` to `True` in the model config, but `router_aux_loss_coef` is set to " + "`0.0`, meaning the auxiliary loss will not be used. 
Either set `router_aux_loss_coef` to a value " + "greater than `0.0`, or set `output_router_logits` to `False` if you don't want to use the auxiliary " + "loss.", + UserWarning, ) self._stored_metrics = defaultdict(lambda: defaultdict(list)) @@ -860,12 +859,6 @@ def compute_loss( return_outputs=False, num_items_in_batch=None, ) -> Union[torch.Tensor, tuple[torch.Tensor, dict[str, torch.Tensor]]]: - if not self.use_dpo_data_collator: - warnings.warn( - "compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than " - "DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator" - ) - compute_loss_context_manager = amp.autocast("cuda") if self._peft_has_been_casted_to_bf16 else nullcontext() with compute_loss_context_manager: diff --git a/trl/trainer/reward_trainer.py b/trl/trainer/reward_trainer.py index c76c2461ec..4000697336 100644 --- a/trl/trainer/reward_trainer.py +++ b/trl/trainer/reward_trainer.py @@ -32,7 +32,6 @@ PreTrainedTokenizerBase, ProcessorMixin, Trainer, - TrainingArguments, is_wandb_available, ) from transformers.trainer_callback import TrainerCallback @@ -137,26 +136,10 @@ def __init__( peft_config (`dict`, defaults to `None`): The PEFT configuration to use for training. If you pass a PEFT configuration, the model will be wrapped in a PEFT model. """ - if type(args) is TrainingArguments: - warnings.warn( - "Using `transformers.TrainingArguments` for `args` is deprecated and will be removed in a future version. Please use `RewardConfig` instead.", - FutureWarning, + if max_length is not None and args.max_length is not None: + raise ValueError( + "You cannot specify both `max_length` and `args.max_length`. Please use the `RewardConfig` to set `max_length` once." 
) - if max_length is not None: - warnings.warn( - "The `max_length` argument is deprecated and will be removed in a future version. Please use the `RewardConfig` to set `max_length` instead.", - FutureWarning, - ) - else: - if max_length is not None and args.max_length is not None: - raise ValueError( - "You cannot specify both `max_length` and `args.max_length`. Please use the `RewardConfig` to set `max_length` once." - ) - if max_length is not None and args.max_length is None: - warnings.warn( - "The `max_length` argument is deprecated and will be removed in a future version. Please use the `RewardConfig` to set `max_length` instead.", - FutureWarning, - ) if not is_peft_available() and peft_config is not None: raise ValueError( "PEFT is not installed and you passed a `peft_config` in the trainer's kwargs, please install it to use the PEFT models" @@ -173,7 +156,8 @@ def __init__( if not _supports_gc_kwargs and args.gradient_checkpointing_kwargs is not None: warnings.warn( "You passed `gradient_checkpointing_kwargs` in the trainer's kwargs, but your peft version does not support it. " - "please update to the latest version of peft to use `gradient_checkpointing_kwargs`." 
+                            "please update to the latest version of peft to use `gradient_checkpointing_kwargs`.",
+                            UserWarning,
                         )
                     elif _supports_gc_kwargs and args.gradient_checkpointing_kwargs is not None:
                         prepare_model_kwargs["gradient_checkpointing_kwargs"] = args.gradient_checkpointing_kwargs
@@ -191,7 +175,7 @@ def __init__(
                     "A processing_class must be specified when using the default RewardDataCollatorWithPadding"
                 )
             if max_length is None:
-                max_length = 512 if type(args) is TrainingArguments or args.max_length is None else args.max_length
+                max_length = 512 if args.max_length is None else args.max_length
 
             data_collator = RewardDataCollatorWithPadding(processing_class)
 
@@ -281,12 +265,6 @@ def compute_loss(
         return_outputs=False,
         num_items_in_batch=None,
     ) -> Union[torch.Tensor, tuple[torch.Tensor, dict[str, torch.Tensor]]]:
-        if not self.use_reward_data_collator:
-            warnings.warn(
-                "The current compute_loss is implemented for RewardDataCollatorWithPadding,"
-                " if you are using a custom data collator make sure you know what you are doing or"
-                " implement your own compute_loss method."
-            )
         rewards_chosen = model(
             input_ids=inputs["input_ids_chosen"],
             attention_mask=inputs["attention_mask_chosen"],
diff --git a/trl/trainer/sft_trainer.py b/trl/trainer/sft_trainer.py
index dfc85b4dde..34a95f36ec 100644
--- a/trl/trainer/sft_trainer.py
+++ b/trl/trainer/sft_trainer.py
@@ -128,9 +128,7 @@ def __init__(
         formatting_func: Optional[Callable] = None,
     ):
         if args is None:
-            output_dir = "tmp_trainer"
-            warnings.warn(f"No `SFTConfig` passed, using `output_dir={output_dir}`.")
-            args = SFTConfig(output_dir=output_dir)
+            args = SFTConfig(output_dir="tmp_trainer")
         elif args is not None and args.__class__.__name__ == "TrainingArguments":
             args_as_dict = args.to_dict()
             # Manually copy token values as TrainingArguments.to_dict() redacts them
@@ -155,10 +153,6 @@ def __init__(
                 model_init_kwargs["torch_dtype"] = torch_dtype
 
         if isinstance(model, str):
-            warnings.warn(
-                "You passed a model_id to the SFTTrainer. This will automatically create an "
-                "`AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you."
-            )
             if args.use_liger:
                 model = AutoLigerKernelForCausalLM.from_pretrained(model, **model_init_kwargs)
             else:
@@ -245,10 +239,6 @@ def make_inputs_require_grad(module, input, output):
             # to overcome some issues with broken tokenizers
             args.max_seq_length = min(processing_class.model_max_length, 1024)
 
-            warnings.warn(
-                f"You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to {args.max_seq_length}"
-            )
-
         self.dataset_num_proc = args.dataset_num_proc
         self.dataset_batch_size = args.dataset_batch_size
 
@@ -306,8 +296,10 @@ def make_inputs_require_grad(module, input, output):
 
         if processing_class.padding_side is not None and processing_class.padding_side != "right":
             warnings.warn(
-                "You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to "
-                "overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code."
+                "You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might "
+                "lead to some unexpected behaviour due to overflow issues when training a model in half-precision. "
+                "You might consider adding `processing_class.padding_side = 'right'` to your code.",
+                UserWarning,
             )
 
         super().__init__(
@@ -330,9 +322,6 @@ def make_inputs_require_grad(module, input, output):
 
         if self.train_dataset is not None:
             if self.args.max_steps > 0 and args.packing:
-                warnings.warn(
-                    "You passed `packing=True` to the SFTTrainer/SFTConfig, and you are training your model with `max_steps` strategy. The dataset will be iterated until the `max_steps` are reached."
-                )
                 self.train_dataset.infinite = True
             elif self.args.max_steps == -1 and args.packing:
                 self.train_dataset.infinite = False
@@ -366,7 +355,10 @@ def _prepare_dataset(
         if column_names and "input_ids" in column_names:
             if formatting_func is not None:
                 warnings.warn(
-                    "You passed a dataset that is already processed (contains an `input_ids` field) together with a valid formatting function. Therefore `formatting_func` will be ignored."
+                    "You passed a dataset that is already processed (contains an `input_ids` field) together with a "
+                    "valid formatting function. Therefore `formatting_func` will be ignored. Either remove the "
+                    "`formatting_func` or pass a dataset that is not already processed.",
+                    UserWarning,
                 )
 
             def formatting_func(x):
@@ -444,8 +436,11 @@ def tokenize(element):
 
         if not remove_unused_columns and len(extra_columns) > 0:
             warnings.warn(
-                "You passed `remove_unused_columns=False` on a non-packed dataset. This might create some issues with the default collator and yield to errors. If you want to "
-                f"inspect dataset other columns (in this case {extra_columns}), you can subclass `DataCollatorForLanguageModeling` in case you used the default collator and create your own data collator in order to inspect the unused dataset columns."
+                "You passed `remove_unused_columns=False` on a non-packed dataset. This might create some issues with "
+                "the default collator and yield to errors. If you want to inspect dataset other columns (in this "
+                f"case {extra_columns}), you can subclass `DataCollatorForLanguageModeling` in case you used the "
+                "default collator and create your own data collator in order to inspect the unused dataset columns.",
+                UserWarning,
             )
 
         map_kwargs = {
diff --git a/trl/trainer/utils.py b/trl/trainer/utils.py
index 420e3abc78..d1cc3a0e9d 100644
--- a/trl/trainer/utils.py
+++ b/trl/trainer/utils.py
@@ -136,7 +136,8 @@ def __init__(
                 "The pad_token_id and eos_token_id values of this tokenizer are identical. "
                 "If you are planning for multi-turn training, "
                 "it can result in the model continuously generating questions and answers without eos token. "
-                "To avoid this, set the pad_token_id to a different value."
+                "To avoid this, set the pad_token_id to a different value.",
+                UserWarning,
             )
 
         self.ignore_index = ignore_index
@@ -159,10 +160,10 @@ def torch_call(self, examples: list[Union[list[int], Any, dict[str, Any]]]) -> d
 
                 if response_token_ids_start_idx is None:
                     warnings.warn(
-                        f"Could not find response key `{self.response_template}` in the "
-                        f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
-                        f"This instance will be ignored in loss calculation. "
-                        f"Note, if this happens often, consider increasing the `max_seq_length`."
+                        f"Could not find response key `{self.response_template}` in the following instance: "
+                        f"{self.tokenizer.decode(batch['input_ids'][i])}. This instance will be ignored in loss "
+                        "calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
+                        UserWarning,
                     )
                     batch["labels"][i, :] = self.ignore_index
                 else:
@@ -186,10 +187,10 @@ def torch_call(self, examples: list[Union[list[int], Any, dict[str, Any]]]) -> d
 
                 if len(response_token_ids_idxs) == 0:
                     warnings.warn(
-                        f"Could not find response key `{self.response_template}` in the "
-                        f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
-                        f"This instance will be ignored in loss calculation. "
-                        f"Note, if this happens often, consider increasing the `max_seq_length`."
+                        f"Could not find response key `{self.response_template}` in the following instance: "
+                        f"{self.tokenizer.decode(batch['input_ids'][i])}. This instance will be ignored in loss "
+                        "calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
+                        UserWarning,
                     )
                     batch["labels"][i, :] = self.ignore_index
 
@@ -201,10 +202,10 @@ def torch_call(self, examples: list[Union[list[int], Any, dict[str, Any]]]) -> d
 
                 if len(human_token_ids_idxs) == 0:
                     warnings.warn(
-                        f"Could not find instruction key `{self.instruction_template}` in the "
-                        f'following instance: {self.tokenizer.decode(batch["input_ids"][i])} '
-                        f"This instance will be ignored in loss calculation. "
-                        f"Note, if this happens often, consider increasing the `max_seq_length`."
+                        f"Could not find instruction key `{self.instruction_template}` in the following instance: "
+                        f"{self.tokenizer.decode(batch['input_ids'][i])}. This instance will be ignored in loss "
+                        "calculation. Note, if this happens often, consider increasing the `max_seq_length`.",
+                        UserWarning,
                     )
                     batch["labels"][i, :] = self.ignore_index
 
@@ -592,13 +593,6 @@ def __init__(
         add_special_tokens=True,
     ):
         self.tokenizer = tokenizer
-
-        if tokenizer.eos_token_id is None:
-            warnings.warn(
-                "The passed tokenizer does not have an EOS token. We will use the passed eos_token_id instead which corresponds"
-                f" to {eos_token_id}. If this is not the correct EOS token, make sure to pass the correct eos_token_id."
-            )
-
         self.concat_token_id = tokenizer.eos_token_id if tokenizer.eos_token_id else eos_token_id
         self.dataset = dataset
         self.seq_length = seq_length
@@ -612,7 +606,8 @@ def __init__(
         if dataset_text_field is not None and formatting_func is not None:
             warnings.warn(
                 "Only one of `dataset_text_field` and `formatting_func` should be provided. "
-                "Ignoring `dataset_text_field` and using `formatting_func`."
+                "Ignoring `dataset_text_field` and using `formatting_func`.",
+                UserWarning,
             )
 
         if formatting_func is not None:
@@ -622,12 +617,6 @@ def __init__(
         else:  # neither is provided
             raise ValueError("Either `dataset_text_field` or `formatting_func` should be provided.")
 
-        if formatting_func is not None:
-            if formatting_func.__code__.co_argcount > 1:
-                warnings.warn(
-                    "The passed formatting_func has more than one argument. Usually that function should have a single argument `example`"
-                    " which corresponds to the dictionary returned by each element of the dataset. Make sure you know what you are doing."
-                )
         self.pretokenized = False
         column_names = (
             dataset.column_names if isinstance(dataset, (datasets.Dataset, datasets.IterableDataset)) else None
@@ -654,7 +643,6 @@ def __iter__(self):
                 except StopIteration:
                     if self.infinite:
                         iterator = iter(self.dataset)
-                        warnings.warn("The dataset reached end and the iterator is reset to the start.")
                     else:
                         more_examples = False
                         break
@@ -771,9 +759,12 @@ def compute_accuracy(eval_pred) -> dict[str, float]:
     predictions, labels = eval_pred
     # Here, predictions is rewards_chosen and rewards_rejected.
     # We want to see how much of the time rewards_chosen > rewards_rejected.
-    if np.array(predictions[:, 0] == predictions[:, 1], dtype=float).sum() > 0:
+    equal_predictions_count = np.array(predictions[:, 0] == predictions[:, 1], dtype=float).sum()
+    if equal_predictions_count > 0:
         warnings.warn(
-            f"There are {np.array(predictions[:, 0] == predictions[:, 1]).sum()} out of {len(predictions[:, 0])} instances where the predictions for both options are equal. As a consequence the accuracy can be misleading."
+            f"There are {equal_predictions_count} out of {len(predictions[:, 0])} instances where the predictions for "
+            "both options are equal. As a consequence the accuracy can be misleading.",
+            UserWarning,
         )
     predictions = np.argmax(predictions, axis=1)