
[FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor #10594

Conversation

patrickvonplaten (Contributor) commented Mar 8, 2021

What does this PR do?

This PR refactors the class PreTrainedFeatureExtractor. The following changes move functionality that is shared between sequence and image feature extractors into a separate file. This should unblock the PRs of DETR, ViT, and CLIP:

  • PreTrainedFeatureExtractor is renamed to PreTrainedSequenceFeatureExtractor because it implicitly assumed that it would treat only sequential inputs (i.e., a sequence of float values or a sequence of float vectors). PreTrainedFeatureExtractor was too general.
  • All functionality that is shared between image and speech feature extractors (which IMO all relates to "saving" utilities) is moved to a FeatureExtractorSavingUtilsMixin.
  • BatchFeature is moved from feature_extraction_sequence_utils.py to feature_extraction_common_utils.py so that it can also be used by the PreTrainedImageFeatureExtractor class.
  • The tests are refactored accordingly.
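A minimal sketch of what this split could look like. The mixin and file names follow the PR description, but everything else here (the config filename, the toy attributes, the ToySequenceFeatureExtractor class) is an invented illustration, not the actual transformers implementation:

```python
import json
import os


class FeatureExtractorSavingUtilsMixin:
    # Toy sketch of the shared "saving" utilities described above; the real
    # mixin in transformers additionally handles hub downloads, kwargs, etc.
    def to_dict(self):
        return dict(self.__dict__)

    def save_pretrained(self, save_directory):
        os.makedirs(save_directory, exist_ok=True)
        config_file = os.path.join(save_directory, "preprocessor_config.json")
        with open(config_file, "w") as f:
            json.dump(self.to_dict(), f, indent=2)
        return config_file

    @classmethod
    def from_pretrained(cls, save_directory):
        config_file = os.path.join(save_directory, "preprocessor_config.json")
        with open(config_file) as f:
            return cls(**json.load(f))


class ToySequenceFeatureExtractor(FeatureExtractorSavingUtilsMixin):
    # Hypothetical stand-in for PreTrainedSequenceFeatureExtractor;
    # the attributes are invented for the example
    def __init__(self, sampling_rate=16000, padding_value=0.0):
        self.sampling_rate = sampling_rate
        self.padding_value = padding_value
```

Because the saving logic lives only in the mixin, an image feature extractor class would get identical save_pretrained/from_pretrained behavior just by inheriting from it.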

The following assumptions were made before applying the changes:

  • In the mid-term future there will be only three modalities in HF: text, sequential features (value sequences, vector sequences), and image features (2D non-sequential arrays).
  • Models such as ViT, DETR & CLIP will call their "preprocessor" ViTFeatureExtractor, etc. IMO, feature extractor is a fitting name for image recognition as well (see: https://en.wikipedia.org/wiki/Feature_extraction). It is therefore assumed that for image-text or image-only models there will be a PreTrainedImageFeatureExtractor and a ViTFeatureExtractor (and maybe a ViTTokenizer & ViTProcessor as well, but those are not necessary). For vision-text models that require both a tokenizer and a feature extractor, such as CLIP, it is assumed that the classes CLIPFeatureExtractor and CLIPTokenizer are wrapped into a CLIPProcessor class, similar to Wav2Vec2Processor. I think this is the most important assumption taken here, so we should make sure we are on the same page @LysandreJik @sgugger @patil-suraj @NielsRogge.
  • Image-text or image-only models won't require a BatchImageFeature or BatchImage, but can just use BatchFeature. From looking at the code in @NielsRogge's PR (Add Vision Transformer + ViTFeatureExtractor #10513), this seems to be the case.
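The Wav2Vec2Processor-style wrapping assumed above can be sketched as follows. ToyProcessor and the two toy callables are hypothetical illustrations of the pattern (at this point CLIPProcessor did not exist yet); the real processors delegate full tokenizer/feature-extractor signatures rather than these toy ones:

```python
def toy_feature_extractor(images):
    # hypothetical stand-in: map each image to a fixed-size feature vector
    return {"pixel_values": [[0.0, 0.0, 0.0] for _ in images]}


def toy_tokenizer(text):
    # hypothetical stand-in: "tokenize" by whitespace, id = token length
    return {"input_ids": [len(token) for token in text.split()]}


class ToyProcessor:
    # Sketch of the Processor pattern: wrap a feature extractor and a
    # tokenizer and dispatch each input to the right component
    # (cf. Wav2Vec2Processor in transformers)
    def __init__(self, feature_extractor, tokenizer):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer

    def __call__(self, images=None, text=None):
        outputs = {}
        if images is not None:
            outputs.update(self.feature_extractor(images))
        if text is not None:
            outputs.update(self.tokenizer(text))
        return outputs
```

The point of the design is that a vision-text model needs only one user-facing preprocessing object, while each wrapped component stays independently usable and saveable.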

Backwards compatibility:

The class PreTrainedFeatureExtractor was accessible via:

from transformers import PreTrainedFeatureExtractor

but is now replaced by PreTrainedSequenceFeatureExtractor. However, since PreTrainedFeatureExtractor was so far only available on master, this change is OK IMO.

class BatchFeature(UserDict):
    r"""
-       Holds the output of the :meth:`~transformers.PreTrainedFeatureExtractor.pad` and feature extractor specific
+       Holds the output of the :meth:`~transformers.PreTrainedSequenceFeatureExtractor.pad` and feature extractor specific
patrickvonplaten (Contributor, Author) commented:

@NielsRogge, here you can just overwrite the docstring with

of the :meth: .... or the :meth:`~transformers.PreTrainedImageFeatureExtractor.pad`
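One possible way to do such an overwrite without retyping the whole docstring. Note this is purely illustrative: ImageBatchFeature is an invented class (the PR description above explicitly assumes image models will reuse BatchFeature), and the docstring text is abbreviated:

```python
from collections import UserDict


class BatchFeature(UserDict):
    r"""Holds the output of the :meth:`~transformers.PreTrainedSequenceFeatureExtractor.pad`
    and feature extractor specific ``__call__`` methods."""


class ImageBatchFeature(BatchFeature):
    pass


# Class docstrings are not inherited in Python, so copy the parent docstring
# and point it at the image extractor instead (invented names, illustration only)
ImageBatchFeature.__doc__ = BatchFeature.__doc__.replace(
    "PreTrainedSequenceFeatureExtractor", "PreTrainedImageFeatureExtractor"
)
```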


Examples::

-   # We can't instantiate directly the base class `PreTrainedFeatureExtractor` so let's show the examples on a
+   # We can't instantiate directly the base class `PreTrainedSequenceFeatureExtractor` so let's show the examples on a
patrickvonplaten (Contributor, Author) commented:

Examples for ImageFeatureExtractor saving/loading can be appended to the Examples here

@LysandreJik (Member) left a comment:

Looks good! I like the API. Something I would change to streamline it with our tokenizers is to have the pad and _pad methods defined in the superclass, but raise a NotImplementedError if they're not implemented. Similarly to tokenize() in the tokenization utils base.
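The suggestion above could look roughly like this. This is a toy sketch, not the actual transformers code; the class name, method signatures, and the example subclass are all simplified inventions:

```python
class SequenceFeatureExtractorBase:
    # Sketch of the reviewer's suggestion: define pad/_pad on the superclass
    # and raise NotImplementedError, like tokenize() in the tokenization
    # utils base
    def pad(self, features, padding=True):
        # shared argument validation would live here; the modality-specific
        # work is delegated to _pad
        return self._pad(features, padding=padding)

    def _pad(self, features, padding=True):
        raise NotImplementedError(
            f"_pad must be implemented by {self.__class__.__name__}"
        )


class ZeroPaddingFeatureExtractor(SequenceFeatureExtractorBase):
    # hypothetical subclass: appends a single padding value
    def _pad(self, features, padding=True):
        return features + [0.0] if padding else features
```

Calling pad on the base class then fails loudly with NotImplementedError, while any subclass only has to implement _pad to get the shared pad entry point.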

Also, here you imported PaddingStrategy which is great so as to not duplicate existing objects, how would we manage this for vision models? Since the padding/cropping strategies would probably be different (i.e., largest instead of longest, as @NielsRogge was mentioning)
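One possible shape for a vision-side answer to that question, as a pure-Python sketch under the assumption that images arrive as nested lists of rows. ImagePaddingStrategy and pad_images are invented names, not transformers API:

```python
from enum import Enum


class ImagePaddingStrategy(Enum):
    # hypothetical counterpart to the tokenizers' PaddingStrategy
    LARGEST = "largest"
    DO_NOT_PAD = "do_not_pad"


def pad_images(images, strategy=ImagePaddingStrategy.LARGEST, padding_value=0.0):
    # Pad every 2D image (list of rows) in the batch to the height/width of
    # the largest image in the batch, i.e. "largest" instead of "longest"
    if strategy is ImagePaddingStrategy.DO_NOT_PAD:
        return images
    max_h = max(len(image) for image in images)
    max_w = max(len(row) for image in images for row in image)
    padded = []
    for image in images:
        rows = [row + [padding_value] * (max_w - len(row)) for row in image]
        rows += [[padding_value] * max_w for _ in range(max_h - len(rows))]
        padded.append(rows)
    return padded
```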

(4 review threads on src/transformers/feature_extraction_common_utils.py: outdated, resolved)
@sgugger (Collaborator) left a comment:

Looks good to me, thanks for working on this!

As I said offline, I don't like the very long names for the new classes and the module names so we should strive to find something easier.

(9 review threads on src/transformers/feature_extraction_common_utils.py and 1 on src/transformers/models/wav2vec2/processing_wav2vec2.py: outdated, resolved)
@patrickvonplaten patrickvonplaten merged commit 9a06b6b into huggingface:master Mar 9, 2021
@dribnet dribnet mentioned this pull request Mar 11, 2021
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
…ggingface#10594)

* save first version

* finish refactor

* finish refactor

* correct naming

* correct naming

* shorter names

* Update src/transformers/feature_extraction_common_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* change name

* finish

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>