Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decorators for deprecation and named arguments validation #30799

Merged
merged 56 commits into from
Jun 10, 2024

Conversation

qubvel
Copy link
Member

@qubvel qubvel commented May 14, 2024

What does this PR do?

Introduces 2 decorators for a better approach to deprecate and validate named arguments.

Deprecation decorator

Function or method decorator to notify users about deprecated keyword arguments, replacing them with a new name if specified.

This decorator allows you to:

  • Notify users when a keyword argument is deprecated.
  • Automatically replace deprecated keyword arguments with new ones.
  • Raise an error if deprecated arguments are used, depending on the specified conditions.

By default, the decorator notifies the user about the deprecated argument while the transformers.__version__ < specified version in the decorator. To keep notifications with any version warn_if_greater_or_equal_version=True can be set.

Example usage with renaming argument:

from transformers.utils.deprecation import deprecate_kwarg

@deprecate_kwarg("reduce_labels", new_name="do_reduce_labels", version="6.0.0")
def my_function(do_reduce_labels):
    print(do_reduce_labels)

my_function(reduce_labels=True)  # Will show a deprecation warning and use do_reduce_labels=True
FutureWarning: `reduce_labels` is deprecated and will be removed in version 6.0.0 for `my_function`. Use `do_reduce_labels` instead.

Example when both deprecated and new arguments were specified for function call:

from transformers.utils.deprecation import deprecate_kwarg

@deprecate_kwarg("reduce_labels", new_name="do_reduce_labels", version="6.0.0")
def my_function(do_reduce_labels):
    print(do_reduce_labels)

my_function(do_reduce_labels=False, reduce_labels=True)  # not deprecated kwarg has higher priority
# False
FutureWarning: Both `reduce_labels` and `do_reduce_labels` are set for `my_function`. Using `do_reduce_labels=False` and ignoring deprecated `reduce_labels=True`.
  my_function(do_reduce_labels=False, reduce_labels=True)  # not deprecated kwarg has higher priority

Example usage without renaming argument:

from transformers.utils.deprecation import deprecate_kwarg

@deprecate_kwarg("max_size", version="6.0.0")
def my_function(max_size):
    print(max_size)

my_function(max_size=1333)  # Will show a deprecation warning
FutureWarning: `max_size` is deprecated and will be removed in version 6.0.0 for `my_function`.
  my_function(max_size=1333)  # Will show a deprecation warning

Validation decorator

Decorator to filter out named arguments that are not in the function signature.
This decorator ensures that only the keyword arguments that match the function's signature, or are specified in the
extra list, are passed to the function. Any additional keyword arguments are filtered out and a warning is issued.

Example usage:

from transformers.utils.generic import filter_out_non_signature_kwargs

@filter_out_non_signature_kwargs(extra=["allowed_extra_arg"])
def my_function(arg1, arg2, **kwargs):
    print("Function args:", arg1, arg2, kwargs)

my_function(arg1=1, arg2=2, allowed_extra_arg=3, invalid_arg=4)
UserWarning: The following named arguments are not valid for `my_function` and were ignored: 'invalid_arg'
  my_function(arg1=1, arg2=2, allowed_extra_arg=3, invalid_arg=4)
Function args: 1 2 {'allowed_extra_arg': 3}

Additional testing

Introducing such decorators might be tricky and lead to some unexpected issues since historically all **kwargs arguments were captured and saved as image_processor's attributes. Despite testing the decorators themselves and tests that passed for image processors, additional testing was conducted for configurations on the hub.

For each image processor in ["maskformer", "mask2former", "segformer", "beit", "oneformer"], configs from the hub with more than 10 downloads were pulled, and the result of the preprocess method was compared with the main branch to ensure they are identical.

Code just for reference:
https://gist.github.com/qubvel/85cf17f516063d63219ef773a365e83d

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qubvel qubvel changed the title Clean up do_reduce_labels for image processors Clean up reduce_labels for image processors May 14, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel qubvel requested a review from amyeroberts May 14, 2024 14:09
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this - great to see more alignment between the image processors.

Only comment is to not raise an exception if reduce_labels is passed in

Comment on lines 112 to 111
warnings.warn(
"The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use"
" `do_reduce_labels` instead.",
FutureWarning,
)
do_reduce_labels = kwargs.pop("reduce_labels")
raise ValueError("The `reduce_labels` parameter has been deprecated. Use `do_reduce_labels` instead.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly we can't do this directly for two reasons:

  • A deprecation version wasn't specified in the message (my bad)
  • This is going to break a lot of things for users

We absolutely want to push users to try and use the correct argument, and we should update this value on all of the official checkpoints. However, because this is from the config, which users might not have control over or know how to change; and there might be hundreds of configs with the old value on and off the hub, we're at risk of breaking code and losing users.

What I'd suggest here is:

  • We update the warning message to specify a specific version (typically two versions after the release version this commit will be included in).
  • After this version, we remove the warning. We silently handle this to maintain backwards compatibility i.e. popping the value, and make sure all official references to the flag are removed; we don't guarantee any future maintenance for this value.

Copy link
Member Author

@qubvel qubvel May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's breaking changes...

The idea was to support reading old configs overriding from_dict method with

@classmethod
def from_dict(cls, image_processor_dict: Dict[str, Any], **kwargs):
    ...
    if "reduce_labels" in image_processor_dict:
            image_processor_dict["do_reduce_labels"] = image_processor_dict.pop("reduce_labels")

Here we silently replace reduce_labels with do_reduce_labels.

By raising an exception in __init__, we prevent users from creating new image_processor with deprecated parameters. That might break some fine-tuning scripts, which the user should have control of.

Does it make sense, or am I missing some use cases?
Are such kinds of breaking changes possible only over major versions like v4->v5?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we silently replace reduce_labels with do_reduce_labels.

This is great - we definitely want to do this! It means any old config, if saved out, will be fully compatible.

Are such kinds of breaking changes possible only over major versions like v4->v5?

Kind of, it's more to do with the type and scale of breaking change, and the controls users have over it.

We do deprecate methods, flags, etc. within the codebase semi-regularly. And we do tolerate a certain amount of breaking changes. For example, in musicgen recently, training was enabled. This technically created a breaking change, as one of the models previously returned a loss. The update meant that anyone using the model would see the loss suddenly change. However, as the previous loss was incorrect and the model use relatively low this was deemed OK.

There's two main reasons why we wouldn't want to raise an exception here at the moment:

  • Ability to change the code behaviour. Users don't always have the option of updating the config: they might not have permissions to modify on the hub.
  • Awareness of configs. Many users are not familiar with the configs. It's easier to deprecate calls to e.g. object.method(foo, bar) as we emit a warning, and the user just removes that line of code. Changing the config is less obvious.

The reason I think this is because when we first tried to deprecate flags for the image processors, we got quite a few messages from users either complaining or confused about why they were happening. I'm yet to see a single config updated because of them!

It is something we could do in v5, although we may not want to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the clarification, I will summarize PR just to make sure we are on the same page.

There are 3 most popular ways to create an image processor with deprecated parameters (sorted by popularity, high->low, IMHO)

Method Is BC in current PR
image_processor = ImageProcessor.from_pretrained(
  "repo/model_with_deprecated_params"
)
✅ Yes
Because the user does not
have control of it (by overriding from_dict())
image_processor = ImageProcessor.from_pretrained(
  "repo/model_with_deprecated_params",
  reduce_labels=True,
)
🟥 No, raise error
Because it is explicit parameter setting
image_processor = ImageProcessor(
  ...
  reduce_labels=True,
)
🟥 No, raise error
Because it is explicit parameter setting

I hope this approach:

  1. Pushes the user to update their codebase, which they have control over, while still allowing using previously created models.
  2. Will allow us to gradually move backward compatibility for deprecated params only in one method from_dict (ideally), making code easier to support.
  3. Probably, we can also get rid of **kwargs, have a strict set of arguments, and replace checks with a decorator for renamed/removed arguments:
@depecated("reduce_labels", replace_with="do_reduce_labels", from="4.42.0", to="4.45.0")
def preprocess(...):

The decorator will rename the argument and raise a warning in between, and then we can remove it with automated checks, or leave it for a while to raise an explicit error.
At the moment, because of **kwargs, anyone might easily make a typo in an argument name and it will be silently passed, leading to unexpected behavior that is hard to debug.
As a simple example:

from transformers import DetrImageProcessor

image_processor = DetrImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",
    do_nomalize=False, <----- here set as False
)
print(image_processor)

# DetrImageProcessor {
# ...
#   "do_convert_annotations": true,
#   "do_normalize": true, <----- but we get True
#   "do_pad": true, 
#   "do_rescale": true,
#   "do_resize": true,
# ...
# }

We can test this strategy with some particular models, not super popular, just to see how it works.

I might overestimate the problem or underestimate risks, user behavior, or edge cases. You have much more experience maintaining the library, so let me know if any of these thoughts are valuable. And thank you once again for your clarifications with examples, it is much easier to follow 🤗

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% aligned with you comment above, thanks for taking the time to write this up so clearly.

I really like the idea of a decorator. Let's ask @molbap his opinion on this, as he's been handling a lot of these considerations across processors and image processor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOO for a few days but checked and it's an interesting take. I like the decorator idea too. I wholeheartedly agree with

At the moment, because of **kwargs, anyone might easily make a typo in an argument name and it will be silently passed, leading to unexpected behavior that is hard to debug.

That is almost always true. That's why getting rid of **kwargs is impossible, as we need to be able to handle arbitrary inputs without breaking runtime in situations where people have non-canonical configs.

Hence the point of having a list of valid arguments: that way, we can at least raise a warning and list the arguments passed by a user that are not doing anything. In your case, if do_normalize is part of the valid image processor arguments but do_nomalize (typo) is not, then we can raise a clear warning saying that do_nomalize isn't used anywhere.
We can even push this a bit and do a quick levenshtein check conditioning a perhaps you made a typo? message. That might be too much.

For the decorator option to mark args as deprecated, LGTM too! Less loc, and forces us to write a version number. I'd say the "from" is not necessarily useful as we want to encourage using the latest version, imo.
Thanks for looking into this @qubvel! Kind of related is #30511 that I want to merge ASAP, didn't have time to get back to it with but it's closely connected to this problem.

Copy link
Member Author

@qubvel qubvel May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this @molbap

Probably we can use a function signature in decorator to filter not-used kwargs instead of passing them.
What do you think about the following?

import functools
import inspect
import warnings
import numpy as np


def filter_not_used_arguments(func):
    """Filter out named arguments that are not in the function signature."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):

        sig = inspect.signature(func)
        function_named_args = sig.parameters.keys()
        valid_kwargs = {k: v for k, v in kwargs.items() if k in function_named_args}
        invalid_kwargs = {k: v for k, v in kwargs.items() if k not in function_named_args}
        
        if invalid_kwargs:
            warnings.warn(f"Unused named arguments: {', '.join(invalid_kwargs.keys())}")
        
        return func(*args, **valid_kwargs)
    
    return wrapper


class ImageProcessor:

    @filter_not_used_arguments
    def preprocess(self, image, do_normalize=False, do_reduce_labels=False):
        pass


image_processor = ImageProcessor()
image = np.ones((100, 100, 3))

# passing invalid  `do_nomalize` instead of `do_normalize`
image_processor.preprocess(image, do_nomalize=True, do_reduce_labels=True)  
# UserWarning: Unused named arguments: do_nomalize
  1. This will simplify validation, we will not need to store the list of available arguments
  2. Make validation method-specific instead of class-specific
  3. While this is backward compatible and we do not raise an error here, IDE will help users and force users to update parameters if no **kwargs is specified for the function
Screenshot 2024-05-20 at 10 36 32

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like a great solution :) looks very good to me

@qubvel qubvel marked this pull request as draft May 16, 2024 13:02
@qubvel qubvel force-pushed the clean-up-do-reduce-labels branch from 3f256a1 to b0e26f5 Compare May 24, 2024 00:09
@@ -406,38 +412,18 @@ def __init__(
image_std: Union[float, List[float]] = None,
ignore_index: Optional[int] = None,
do_reduce_labels: bool = False,
num_labels: Optional[int] = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, in Mask2Former, and OneFormer: num_labels is not used somewhere across code of image_processor, however it is widely used in tests. I added it explicitly for backward compatibility, in case some pipelines rely on that

Copy link
Member Author

@qubvel qubvel May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could be passed in **kwargs and excluded from filter

@filter_out_non_signature_kwargs(extra=["max_size", "num_labels"])

@qubvel
Copy link
Member Author

qubvel commented May 28, 2024

@molbap @amyeroberts I made a new iteration here. Added two decorators

  1. @deprecate_kwarg("reduce_labels", new_name="do_reduce_labels", version="4.41.0")
  • rename argument if new_name is provided
  • raise warning if current version < version specified, then just do it silently
  1. filter_out_non_signature_kwargs(extra=["max_size"])
  • filter out all named args that are not in the function signature and not specified in extra. As mentioned by @molbap it is not always possible to get rid of all **kwargs (for example, it might include complex logic of parameters interaction).

Please, have a look, feedback is appreciated 🤗

@qubvel qubvel mentioned this pull request May 28, 2024
3 tasks
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really beautiful work - this could be very useful across the library and will help kill so much repeated code ❤️

Only major comment is about the new files src/transformers/utils/kwargs_validation.py and src/transformers/utils/deprecation.py, and whether we want to add these whole new modules into utils.

Personally, I think src/transformers/utils/deprecation.py makes sense. For filter_out_non_signature_kwargs, I'd place it under (the admittedly badly named) utils/generic.py for now

Would be great to get a second opinion on this from @molbap!

src/transformers/utils/deprecation.py Show resolved Hide resolved
src/transformers/utils/kwargs_validation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
src/transformers/utils/kwargs_validation.py Outdated Show resolved Hide resolved
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the kwargs validation method, so much cleaner! Left a few comments :)

src/transformers/utils/kwargs_validation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
src/transformers/utils/deprecation.py Outdated Show resolved Hide resolved
"return_tensors",
"data_format",
"input_data_format",
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love seeing this go - just need to ask, since function signatures are currently relying on kwargs, are you using existing config files to verify what passed args should look like as defaults go?
By that I mean, current signature might not be always trustworthy as "canonical parameters" go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I got it, could you please provide an example or more details?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sure! What I mean is that the validation decorator is built by

        sig = inspect.signature(func)
        function_named_args = set(sig.parameters.keys())
        valid_kwargs_to_pass = function_named_args.union(extra_params_to_pass)

Hence for SegFormer, where

    @deprecate_kwarg("reduce_labels", new_name="do_reduce_labels", version="4.41.0")
    @filter_out_non_signature_kwargs()
    def __init__(
        self,
        do_resize: bool = True,
        size: Dict[str, int] = None,
        resample: PILImageResampling = PILImageResampling.BILINEAR,
        do_rescale: bool = True,
        rescale_factor: Union[int, float] = 1 / 255,
        do_normalize: bool = True,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        do_reduce_labels: bool = False,
    ) -> None:
...

Here we're using the init signature, and it seems to match, but in vit_hybrid (for instance) you could miss kwargs like input_data_format that are not part of the init signature, but are present in preprocess

    def __init__(
        self,
        do_resize: bool = True,
        size: Dict[str, int] = None,
        resample: PILImageResampling = PILImageResampling.BICUBIC,
        do_center_crop: bool = True,
        crop_size: Dict[str, int] = None,
        do_rescale: bool = True,
        rescale_factor: Union[int, float] = 1 / 255,
        do_normalize: bool = True,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        do_convert_rgb: bool = True,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        size = size if size is not None else {"shortest_edge": 224}
        size = get_size_dict(size, default_to_square=False)
        crop_size = crop_size if crop_size is not None else {"height": 224, "width": 224}
        crop_size = get_size_dict(crop_size, default_to_square=True, param_name="crop_size")

        self.do_resize = do_resize
        self.size = size
        self.resample = resample
        self.do_center_crop = do_center_crop
        self.crop_size = crop_size
        self.do_rescale = do_rescale
        self.rescale_factor = rescale_factor
        self.do_normalize = do_normalize
        self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
        self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
        self.do_convert_rgb = do_convert_rgb
        self._valid_processor_keys = [
            "images",
            "do_resize",
            "size",
            "resample",
            "do_center_crop",
            "crop_size",
            "do_rescale",
            "rescale_factor",
            "do_normalize",
            "image_mean",
            "image_std",
            "do_convert_rgb",
            "return_tensors",
            "data_format",
            "input_data_format",
        ]

Hope that clarifies the previous message!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, @molbap!

The decorator should be applied per method, specifically for both __init__ and preprocess.

Regarding vit_hybrid, I noticed that input_data_format is present in the preprocess method signature, so it will be handled correctly by the decorator.

However, there could be a potential issue if the preprocess method uses self.input_data_format that was not defined in the __init__ signature and was instead captured by passing **kwargs to super(). I'm not certain if we have instances where we reference an attribute not defined in __init__, but these cases should be managed accurately.

Probably, it is worth writing a short script that will pull the top 100 most popular image_processor configs for each model and compare "main" vs "current branch" methods where decorators were applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script ideas is really a nice one - I did something similar but much more manual last time I rewrote image processors, ➕ on this!

@qubvel qubvel changed the title Clean up reduce_labels for image processors Decorators for deprecation and named arguments validation Jun 6, 2024
@qubvel qubvel force-pushed the clean-up-do-reduce-labels branch from e552513 to 6a0180e Compare June 6, 2024 16:09
@qubvel qubvel marked this pull request as ready for review June 6, 2024 16:18
from typing import Any, Dict, List, Optional, Tuple, Union

import numpy as np

from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_processing_utils import INIT_SERVICE_KWARGS, BaseImageProcessor, BatchFeature, get_size_dict
Copy link
Member Author

@qubvel qubvel Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored from config image processors propogate image_processor_type and processor_class named arguments to __init__ method. Despite they are not necessary and image_processor can be initialized without these arguments, some test logic relies on them. I tried to remove them, but it looks like too many models should be touched, probably, it's better to try as a follow-up.
For now, I came up with this suboptimal solution, just to specify them as extra names for @filter_out_non_signature_kwargs(extra=INIT_SERVICE_KWARGS)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this a pretty clean and pragmatic solution

@qubvel
Copy link
Member Author

qubvel commented Jun 6, 2024

I modified PR's name and description to make it clear.
@amyeroberts @molbap could you please have a look once again, it should be done now.

@qubvel qubvel requested review from amyeroberts and molbap June 6, 2024 16:28
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, very clean - not sure about the image_processor_type and processor_class, maybe @amyeroberts will have a suggestion! Added a few comments but other than that lgtm 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is a slow test broken for beit in the modeling, but it is unrelated to this PR.

extra = extra or []
extra_params_to_pass = set(extra)

def decorator(func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, add Callable hinting

Copy link
Member Author

@qubvel qubvel Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add type hints in such cases? It's just an internal function inside the decorator and not going to be used somewhere else.

Comment on lines 112 to 130
if old_name in kwargs and new_name in kwargs:
minimum_action = Action.RAISE if raise_if_both_names else Action.NOTIFY
message = f"Both `{old_name}` and `{new_name}` are set. Using `{new_name}={kwargs[new_name]}` and ignoring deprecated `{old_name}={kwargs[old_name]}`."
kwargs.pop(old_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this specific use-case, I may be missing something: if I do

proc = BeitImageProcessor(reduce_labels=True, weird_arg=True)
# UserWarning: The following named arguments are not valid for `BeitImageProcessor.__init__` and were ignored: 'weird_arg'

I'm getting a warning here, but if I call the two versions of the argument (which can be strings it seems, but that's unrelated)

proc = BeitImageProcessor(reduce_labels="any value really", do_reduce_labels="another value")
print( proc(images=np_img)['pixel_values'][0].shape )
# (3, 224, 224)
# no warning here

but it seems it should be the same log level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is a nice catch! Will fix and add a test!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 9156618

Comment on lines +475 to +483
@classmethod
def from_dict(cls, image_processor_dict: Dict[str, Any], **kwargs):
"""
Overrides the `from_dict` method from the base class to save support of deprecated `reduce_labels` in old configs
"""
image_processor_dict = image_processor_dict.copy()
if "reduce_labels" in image_processor_dict:
image_processor_dict["do_reduce_labels"] = image_processor_dict.pop("reduce_labels")
return super().from_dict(image_processor_dict, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, is there a way to abstract this to a generic method too?

Copy link
Member Author

@qubvel qubvel Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove class specific from_dict method since now the decorator handling this, but for that, we have to update super().from_dict, which is currently separate image_processor_dict and **kwargs assignment for image processor initialization.

  • image_processor_dict is passed to __init__ of image processor
  • **kwargs are assigned to image_processor instance with image_processor.key = value

I addressed it for current classes in 2b21c88, however, there would be a small breaking change:

Previously:

  • **kwargs have higher priority over image_processor_dict.
  • 🔴 deprecated kwarg in **kwargs has higher priority than not deprecated one in image_processor_dict

Updated:

  • **kwargs have higher priority over image_processor_dict.
  • 🔴 not deprecated argument has higher priority no matter where it was specified either in image_processor_dict or in **kwargs

For example:

image_processor_dict = {"do_reduce_labels": True}

# Before
image_processor = MaskFormerImageProcessor.from_dict(image_processor_dict, reduce_labels=False)
image_processor.do_reduce_labels == False  # reduce_labels deprecated but still has higher priority

# Now
image_processor = MaskFormerImageProcessor.from_dict(image_processor_dict, reduce_labels=False)

# reduce_labels deprecated -> do_reduce_labels from `image_processor_dict` has higher priority 
# + WARNING will be issued
image_processor.do_reduce_labels == True  

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! thanks for the clear explanation. Then, I think it's ok to have a class-specific method :)

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful piece of work 🔥 It's great to see so much repeated code cleaned up - this should make managing the deprecation for all of the processing classes a lot easier!

@@ -198,3 +200,74 @@ def test_expand_dims_flax(self):
x = np.random.randn(3, 4)
t = jnp.array(x)
self.assertTrue(np.allclose(expand_dims(x, axis=1), np.asarray(expand_dims(t, axis=1))))


class ValidationDecoratorTester(unittest.TestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

from typing import Any, Dict, List, Optional, Tuple, Union

import numpy as np

from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_processing_utils import INIT_SERVICE_KWARGS, BaseImageProcessor, BatchFeature, get_size_dict
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this a pretty clean and pragmatic solution

@qubvel qubvel force-pushed the clean-up-do-reduce-labels branch from e1cd44f to 50f2c10 Compare June 10, 2024 11:10
@qubvel
Copy link
Member Author

qubvel commented Jun 10, 2024

Rebase + revert to class specific from_dict (discussed here #30799 (comment))

@qubvel qubvel merged commit 517df56 into huggingface:main Jun 10, 2024
23 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 14, 2024
…e#30799)

* Fix do_reduce_labels for maskformer image processor

* Deprecate reduce_labels in favor to do_reduce_labels

* Deprecate reduce_labels in favor to do_reduce_labels (segformer)

* Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

* Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

* Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

* Fix typo

* Update mask2former test

* fixup

* Update segmentation examples

* Update docs

* Fixup

* Imports fixup

* Add deprecation decorator draft

* Add deprecation decorator

* Fixup

* Add deprecate_kwarg decorator

* Validate kwargs decorator

* Kwargs validation (beit)

* fixup

* Kwargs validation (mask2former)

* Kwargs validation (maskformer)

* Kwargs validation (oneformer)

* Kwargs validation (segformer)

* Better message

* Fix oneformer processor save-load test

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Better handle classmethod warning

* Fix typo, remove warn

* Add header

* Docs and `additional_message`

* Move to filter decorator ot generic

* Proper deprecation for semantic segm scripts

* Add to __init__ and update import

* Basic tests for filter decorator

* Fix doc

* Override `to_dict()` to pop depracated `_max_size`

* Pop unused parameters

* Fix trailing whitespace

* Add test for deprecation

* Add deprecation warning control parameter

* Update generic test

* Fixup deprecation tests

* Introduce init service kwargs

* Revert popping unused params

* Revert oneformer test

* Allow "metadata" to pass

* Better docs

* Fix test

* Add notion in docstring

* Fix notification for both names

* Add func name to warning message

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix do_reduce_labels for maskformer image processor

* Deprecate reduce_labels in favor to do_reduce_labels

* Deprecate reduce_labels in favor to do_reduce_labels (segformer)

* Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

* Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

* Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

* Fix typo

* Update mask2former test

* fixup

* Update segmentation examples

* Update docs

* Fixup

* Imports fixup

* Add deprecation decorator draft

* Add deprecation decorator

* Fixup

* Add deprecate_kwarg decorator

* Validate kwargs decorator

* Kwargs validation (beit)

* fixup

* Kwargs validation (mask2former)

* Kwargs validation (maskformer)

* Kwargs validation (oneformer)

* Kwargs validation (segformer)

* Better message

* Fix oneformer processor save-load test

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Better handle classmethod warning

* Fix typo, remove warn

* Add header

* Docs and `additional_message`

* Move to filter decorator ot generic

* Proper deprecation for semantic segm scripts

* Add to __init__ and update import

* Basic tests for filter decorator

* Fix doc

* Override `to_dict()` to pop depracated `_max_size`

* Pop unused parameters

* Fix trailing whitespace

* Add test for deprecation

* Add deprecation warning control parameter

* Update generic test

* Fixup deprecation tests

* Introduce init service kwargs

* Revert popping unused params

* Revert oneformer test

* Allow "metadata" to pass

* Better docs

* Fix test

* Add notion in docstring

* Fix notification for both names

* Add func name to warning message

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix do_reduce_labels for maskformer image processor

* Deprecate reduce_labels in favor to do_reduce_labels

* Deprecate reduce_labels in favor to do_reduce_labels (segformer)

* Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

* Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

* Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

* Fix typo

* Update mask2former test

* fixup

* Update segmentation examples

* Update docs

* Fixup

* Imports fixup

* Add deprecation decorator draft

* Add deprecation decorator

* Fixup

* Add deprecate_kwarg decorator

* Validate kwargs decorator

* Kwargs validation (beit)

* fixup

* Kwargs validation (mask2former)

* Kwargs validation (maskformer)

* Kwargs validation (oneformer)

* Kwargs validation (segformer)

* Better message

* Fix oneformer processor save-load test

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Better handle classmethod warning

* Fix typo, remove warn

* Add header

* Docs and `additional_message`

* Move to filter decorator ot generic

* Proper deprecation for semantic segm scripts

* Add to __init__ and update import

* Basic tests for filter decorator

* Fix doc

* Override `to_dict()` to pop depracated `_max_size`

* Pop unused parameters

* Fix trailing whitespace

* Add test for deprecation

* Add deprecation warning control parameter

* Update generic test

* Fixup deprecation tests

* Introduce init service kwargs

* Revert popping unused params

* Revert oneformer test

* Allow "metadata" to pass

* Better docs

* Fix test

* Add notion in docstring

* Fix notification for both names

* Add func name to warning message

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix do_reduce_labels for maskformer image processor

* Deprecate reduce_labels in favor to do_reduce_labels

* Deprecate reduce_labels in favor to do_reduce_labels (segformer)

* Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

* Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

* Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

* Fix typo

* Update mask2former test

* fixup

* Update segmentation examples

* Update docs

* Fixup

* Imports fixup

* Add deprecation decorator draft

* Add deprecation decorator

* Fixup

* Add deprecate_kwarg decorator

* Validate kwargs decorator

* Kwargs validation (beit)

* fixup

* Kwargs validation (mask2former)

* Kwargs validation (maskformer)

* Kwargs validation (oneformer)

* Kwargs validation (segformer)

* Better message

* Fix oneformer processor save-load test

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Better handle classmethod warning

* Fix typo, remove warn

* Add header

* Docs and `additional_message`

* Move to filter decorator ot generic

* Proper deprecation for semantic segm scripts

* Add to __init__ and update import

* Basic tests for filter decorator

* Fix doc

* Override `to_dict()` to pop depracated `_max_size`

* Pop unused parameters

* Fix trailing whitespace

* Add test for deprecation

* Add deprecation warning control parameter

* Update generic test

* Fixup deprecation tests

* Introduce init service kwargs

* Revert popping unused params

* Revert oneformer test

* Allow "metadata" to pass

* Better docs

* Fix test

* Add notion in docstring

* Fix notification for both names

* Add func name to warning message

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
itazap pushed a commit that referenced this pull request Jun 20, 2024
* Fix do_reduce_labels for maskformer image processor

* Deprecate reduce_labels in favor to do_reduce_labels

* Deprecate reduce_labels in favor to do_reduce_labels (segformer)

* Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

* Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

* Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

* Fix typo

* Update mask2former test

* fixup

* Update segmentation examples

* Update docs

* Fixup

* Imports fixup

* Add deprecation decorator draft

* Add deprecation decorator

* Fixup

* Add deprecate_kwarg decorator

* Validate kwargs decorator

* Kwargs validation (beit)

* fixup

* Kwargs validation (mask2former)

* Kwargs validation (maskformer)

* Kwargs validation (oneformer)

* Kwargs validation (segformer)

* Better message

* Fix oneformer processor save-load test

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/utils/deprecation.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Better handle classmethod warning

* Fix typo, remove warn

* Add header

* Docs and `additional_message`

* Move to filter decorator ot generic

* Proper deprecation for semantic segm scripts

* Add to __init__ and update import

* Basic tests for filter decorator

* Fix doc

* Override `to_dict()` to pop depracated `_max_size`

* Pop unused parameters

* Fix trailing whitespace

* Add test for deprecation

* Add deprecation warning control parameter

* Update generic test

* Fixup deprecation tests

* Introduce init service kwargs

* Revert popping unused params

* Revert oneformer test

* Allow "metadata" to pass

* Better docs

* Fix test

* Add notion in docstring

* Fix notification for both names

* Add func name to warning message

* Fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants