[FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor #10594
Conversation
class BatchFeature(UserDict):
    r"""
-        Holds the output of the :meth:`~transformers.PreTrainedFeatureExtractor.pad` and feature extractor specific
+        Holds the output of the :meth:`~transformers.PreTrainedSequenceFeatureExtractor.pad` and feature extractor specific
@NielsRogge, here you can just overwrite the docstring to reference the :meth: .... or the :meth:`~transformers.PreTrainedImageFeatureExtractor.pad`.
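A minimal sketch of the docstring overwrite being suggested here, broadening the shared `BatchFeature` docstring so it covers the image-side ``pad`` as well (the exact wording is an assumption, not a line from this diff):

```python
from collections import UserDict


class BatchFeature(UserDict):
    r"""
    Holds the output of the :meth:`~transformers.PreTrainedSequenceFeatureExtractor.pad` or
    :meth:`~transformers.PreTrainedImageFeatureExtractor.pad` and feature extractor specific
    ``__call__`` methods.
    """
```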
        Examples::

-            # We can't instantiate directly the base class `PreTrainedFeatureExtractor` so let's show the examples on a
+            # We can't instantiate directly the base class `PreTrainedSequenceFeatureExtractor` so let's show the examples on a
Examples for ImageFeatureExtractor saving/loading can be appended to the Examples here
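A sketch of the kind of image saving/loading example that could be appended there, using the ViT feature extractor anticipated elsewhere in this PR (the class name and checkpoint are assumptions, not code from this diff):

```python
from transformers import ViTFeatureExtractor

# Load a feature extractor from the Hub, save it locally, and reload it from disk.
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
feature_extractor.save_pretrained("./my_feature_extractor")
feature_extractor = ViTFeatureExtractor.from_pretrained("./my_feature_extractor")
```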
Looks good! I like the API. Something I would change to streamline it with our tokenizers is to have the `pad` and `_pad` methods defined in the superclass, but raise a `NotImplementedError` if they're not implemented, similarly to `tokenize()` in the tokenization utils base.

Also, here you imported `PaddingStrategy`, which is great so as to not duplicate existing objects. How would we manage this for vision models? Since the padding/cropping strategies would probably be different (i.e., `largest` instead of `longest`, as @NielsRogge was mentioning).
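A minimal sketch of that pattern, assuming a shared superclass; the class name and signatures below are hypothetical, not the PR's actual API:

```python
class FeatureExtractorBaseSketch:
    """Hypothetical shared superclass for sequence and image feature extractors."""

    def pad(self, processed_features, padding=True, **kwargs):
        # Shared entry point defined once on the superclass; delegates to the
        # subclass-specific _pad, mirroring how tokenize() is handled for tokenizers.
        return self._pad(processed_features, padding=padding, **kwargs)

    def _pad(self, processed_features, **kwargs):
        # Sequence extractors would pad to the longest sequence; image extractors
        # might pad/crop to the largest image instead.
        raise NotImplementedError("Subclasses must implement _pad.")
```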
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for working on this!
As I said offline, I don't like the very long names for the new classes and the module names, so we should strive to find something easier.
[FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor (huggingface#10594)
* save first version
* finish refactor
* finish refactor
* correct naming
* correct naming
* shorter names
* Update src/transformers/feature_extraction_common_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* change name
* finish
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
What does this PR do?

This PR refactors the class `PreTrainedFeatureExtractor`. The following changes are done to move functionality that is shared between sequence and image feature extractors into a separate file. This should unblock the PRs of DETR, VIT, and CLIP.

- `PreTrainedFeatureExtractor` is renamed to `PreTrainedSequenceFeatureExtractor` because it implicitly assumed that it will treat only sequential inputs (a.k.a. a sequence of float values or a sequence of float vectors); the name `PreTrainedFeatureExtractor` was too general.
- The saving/loading functionality that is shared between sequence and image feature extractors is moved into `FeatureExtractorSavingUtilsMixin`.
- `BatchFeature` is moved from `feature_extraction_sequence_utils.py` to `feature_extraction_common_utils.py` to be used by the `PreTrainedImageFeatureExtractor` class as well (see the sketch after this list).
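Read together, the bullets above imply a split roughly like the following sketch; everything beyond the class and file names stated in the bullets (inheritance, docstrings) is an assumption:

```python
# feature_extraction_common_utils.py -- pieces shared by sequence and image extractors
from collections import UserDict


class FeatureExtractorSavingUtilsMixin:
    """Shared saving/loading helpers (sketch only, no real implementation here)."""


class BatchFeature(UserDict):
    """Dict-like container returned by feature extractors, moved here from the sequence file."""


# feature_extraction_sequence_utils.py -- sequence-specific pieces
class PreTrainedSequenceFeatureExtractor(FeatureExtractorSavingUtilsMixin):
    """Handles sequential inputs only, e.g. padding sequences of float values or float vectors."""
```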
The following things were assumed before applying the changes.
- Feature extractors of image models will follow the same naming, e.g. `VITFeatureExtractor`, .... IMO, feature extractor is also a fitting name for image recognition (see: https://en.wikipedia.org/wiki/Feature_extraction), so it is assumed that for image-text or image-only models there will be a `PreTrainedImageFeatureExtractor`, a `VITFeatureExtractor`, (and maybe a `VITTokenizer` & `VITProcessor` as well, but not necessarily). For vision-text models that do require both a tokenizer and a feature extractor, such as CLIP, it is assumed that the classes `CLIPFeatureExtractor` and `CLIPTokenizer` are wrapped into a `CLIPProcessor` class similar to `Wav2Vec2Processor` (see the sketch after this list). I think this is the most important assumption taken here, so we should make sure we are on the same page here @LysandreJik @sgugger @patil-suraj @NielsRogge.
- Image feature extractors don't need a separate `BatchImageFeature` or `BatchImage`, but can just use `BatchFeature`. From looking at the code in @NielsRogge's PR (Add Vision Transformer + ViTFeatureExtractor #10513) this seems to be the case.
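A minimal sketch of the processor-wrapping assumption, modeled on `Wav2Vec2Processor`; the method bodies and signatures below are illustrative guesses, not code from this PR:

```python
class CLIPProcessorSketch:
    """Wraps an image feature extractor and a tokenizer behind one save/load interface."""

    def __init__(self, feature_extractor, tokenizer):
        self.feature_extractor = feature_extractor  # handles images
        self.tokenizer = tokenizer                  # handles text

    def save_pretrained(self, save_directory):
        # Both components write their files into the same directory.
        self.feature_extractor.save_pretrained(save_directory)
        self.tokenizer.save_pretrained(save_directory)

    @classmethod
    def from_pretrained(cls, path, feature_extractor_cls, tokenizer_cls):
        # Reload both components from one directory; the component classes are passed
        # in explicitly to keep this sketch independent of the final class names.
        return cls(
            feature_extractor=feature_extractor_cls.from_pretrained(path),
            tokenizer=tokenizer_cls.from_pretrained(path),
        )
```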
Backwards compatibility:
The class `PreTrainedFeatureExtractor` was accessible via a direct import from `transformers`, but is now replaced by `PreTrainedSequenceFeatureExtractor`. However, since `PreTrainedFeatureExtractor` was so far only available on master, this change is OK IMO.
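A sketch of how downstream code could absorb the rename; whether either name is exposed at the top level of `transformers` is an assumption based on the description, not something shown in this PR:

```python
# Prefer the new name introduced by this PR, fall back to the old master-only name.
try:
    from transformers import PreTrainedSequenceFeatureExtractor as FeatureExtractorBase
except ImportError:
    from transformers import PreTrainedFeatureExtractor as FeatureExtractorBase
```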