-
Notifications
You must be signed in to change notification settings - Fork 27.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update feature extractor methods to enable type cast before normalize #18499
Update feature extractor methods to enable type cast before normalize #18499
Conversation
This is necessary to allow for casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when flags like are False. If some transformations aren't applied, then the output type can't be unexpected e.g. a list of PIL images instead of numpy arrays.
…th different configs
The documentation is not available anymore as the PR was closed or merged. |
We write a generic function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by that. This means, the rescaling done in normalize and to_numpy_array are now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. When testing, this was in the order of 1e-8, and so deemed OK
@@ -58,13 +58,13 @@ def test_conversion_image_to_array(self): | |||
array3 = feature_extractor.to_numpy_array(image, rescale=False) | |||
self.assertTrue(array3.dtype, np.uint8) | |||
self.assertEqual(array3.shape, (3, 16, 32)) | |||
self.assertTrue(np.array_equal(array1, array3.astype(np.float32) / 255.0)) | |||
self.assertTrue(np.array_equal(array1, array3.astype(np.float32) * (1 / 255.0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed to reflect the rescaling logic. The max difference between the arrays before this change was ~5e-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! If the changes per model are small enough, it would probably be best to change them all in the same PR, rather than doing individual ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Looks good to me too
@sgugger Yep, I completely agree. The changes all together aren't that small, but almost exactly the same across models. Once this is merged in, I'll open a PR for the VideoMAE refactor (https://github.com/amyeroberts/transformers/pull/9/files) as this covers all the changes. Once approved, I'll merge in the other models to the branch, as for re-review of the total PR and then merge all together. |
""" | ||
Rescale a numpy image by scale amount | ||
""" | ||
self._ensure_format_supported(image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks whether the image is a PIL image, NumPy array or PyTorch tensor, but this method only expects NumPy arrays. So I'd update to raise a ValueError
in case the image
isn't an instance of np.ndarray.
The image segmentation pipeline tests - tests/pipelines/test_pipelines_image_segmentation.py - were failing after the merging of huggingface#1849 (49e44b2). This was due to the difference in rescaling. Previously the images were rescaled by `image = image / 255`. In the new commit, a `rescale` method was added, and images rescaled using `image = image * scale`. This was known to cause small differences in the processed images (see [PR comment](huggingface#18499 (comment))). Testing locally, changing the `rescale` method to divide by a scale factor (255) resulted in the tests passing. It was therefore decided the test values could be updated, as there was no logic difference between the commits.
* Updated test values The image segmentation pipeline tests - tests/pipelines/test_pipelines_image_segmentation.py - were failing after the merging of #1849 (49e44b2). This was due to the difference in rescaling. Previously the images were rescaled by `image = image / 255`. In the new commit, a `rescale` method was added, and images rescaled using `image = image * scale`. This was known to cause small differences in the processed images (see [PR comment](#18499 (comment))). Testing locally, changing the `rescale` method to divide by a scale factor (255) resulted in the tests passing. It was therefore decided the test values could be updated, as there was no logic difference between the commits. * Use double quotes, like previous example * Fix up
…huggingface#18499) * Update methods to optionally rescale This is necessary to allow for casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when flags like are False. If some transformations aren't applied, then the output type can't be unexpected e.g. a list of PIL images instead of numpy arrays. * Cast images to numpy arrays in call to enable consistent behaviour with different configs * Remove accidental clip changes * Update tests to reflect the scaling logic We write a generic function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by that. This means, the rescaling done in normalize and to_numpy_array are now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. When testing, this was in the order of 1e-8, and so deemed OK
* Updated test values The image segmentation pipeline tests - tests/pipelines/test_pipelines_image_segmentation.py - were failing after the merging of huggingface#1849 (49e44b2). This was due to the difference in rescaling. Previously the images were rescaled by `image = image / 255`. In the new commit, a `rescale` method was added, and images rescaled using `image = image * scale`. This was known to cause small differences in the processed images (see [PR comment](huggingface#18499 (comment))). Testing locally, changing the `rescale` method to divide by a scale factor (255) resulted in the tests passing. It was therefore decided the test values could be updated, as there was no logic difference between the commits. * Use double quotes, like previous example * Fix up
What does this PR do?
At the moment, the return type of our feature extractors isn't always as expected or sometimes fails if a
do_xxx
config flag is set toFalse
. This PR introduces the necessary changes to theImageFeatureExtractionMixin
methods such that we can modify the feature extractor calls to fix this. This is an alternative solution to settingreturn_tensors="np"
as default.Each vision model using
ImageFeatureExtractionMixin
has a separate PR adding their necessary modifications and tests.Details
At the moment, if
do_normalize=False
,do_resize=True
andreturn_tensors=None
then the output tensors will be a list ofPIL.Image.Image
objects if even if the inputs are numpy arrays. Ifdo_normalize=False
andreturn_tensors
is specified ("pt"
,"np"
,"tf"
,"jax"
) an exception is raised.The main reasons for this are:
BatchFeature
can't convertPIL.Image.Image
to the requested tensors.PIL.Image.Image
->np.ndarray
happens within thenormalize
method and the output ofresize
isPIL.Image.Image
.In order to have the type of the returned
pixel_values
reflectreturn_tensors
we need to:PIL.Image.Image
objects to numpy arrays before passing toBatchFeature
normalize
method. If the input tonormalize
is aPIL.Image.Image
it is converted to a numpy array usingto_numpy_array
which rescales to between [0, 1]. Ifdo_resize=False
then this rescaling won't happen if the inputs are numpy arrays.The optional flags enable us to preserve the same default behaviour for the
resize
andnormalize
methods whilst modifying the internal logic of the feature extractor call.Checks
The model PRs are all cherry picked (file diffs) of
type-cast-before-normalize
The following was run to check the outputs:
Output:
Fixes
#17714
#15055
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.