Fuyu processing update #27133
Conversation
@pcuenca Here's the draft PR for updating the image processor. In relation to your PR with the box coordinate transformations, you'll notice that I've removed the
cc @molbap
@amyeroberts Nice! I'll update accordingly.
    patches = patches.reshape(batch_size, -1, channels * patch_height * patch_width)
    return patches
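For context, the `reshape` above flattens each image patch into a single vector. Below is a minimal NumPy sketch of the idea; the shapes and the patch-extraction steps are illustrative assumptions, not the library's implementation.

import numpy as np

batch_size, channels, height, width = 2, 3, 64, 64
patch_height, patch_width = 16, 16

images = np.random.rand(batch_size, channels, height, width)

# Split the image into a grid of non-overlapping patches ...
patches = images.reshape(
    batch_size, channels, height // patch_height, patch_height, width // patch_width, patch_width
)
patches = patches.transpose(0, 2, 4, 1, 3, 5)  # (batch, n_h, n_w, channels, patch_h, patch_w)

# ... then flatten each patch into one vector, as in the quoted line above.
patches = patches.reshape(batch_size, -1, channels * patch_height * patch_width)
print(patches.shape)  # (2, 16, 768): 16 patches of 768 values each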
def preprocess_with_tokenizer_info(
This was renamed to `preprocess_with_tokenizer_info` to reflect the current naming patterns with other image processors: `preprocess` for creating the model inputs, `post_process_xxx` for processing the model outputs for a specific downstream task.
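Roughly, that convention looks like the skeleton below; the class and method names are hypothetical and only illustrate the split between input and output processing.

class SomeImageProcessor:
    def preprocess(self, images, **kwargs):
        """Turn raw images into model inputs (resize, rescale, normalize, patchify, ...)."""
        ...

    def post_process_object_detection(self, outputs, threshold=0.5, target_sizes=None):
        """Turn raw model outputs into results for a downstream task (e.g. boxes in image coordinates)."""
        ...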
Ok, good to know! Thanks for the explanation.
# Copied from transformers.models.detr.image_processing_detr.max_across_indices
def max_across_indices(values: Iterable[Any]) -> List[Any]:
    """
    Return the maximum value across all indices of an iterable of values.
    """
    return [max(values_i) for values_i in zip(*values)]


# Copied from transformers.models.detr.image_processing_detr.get_max_height_width
def get_max_height_width(
    images: List[np.ndarray], input_data_format: Optional[Union[str, ChannelDimension]] = None
) -> List[int]:
    """
    Get the maximum height and width across all images in a batch.
    """
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(images[0])

    if input_data_format == ChannelDimension.FIRST:
        _, max_height, max_width = max_across_indices([img.shape for img in images])
    elif input_data_format == ChannelDimension.LAST:
        max_height, max_width, _ = max_across_indices([img.shape for img in images])
    else:
        raise ValueError(f"Invalid channel dimension format: {input_data_format}")
    return (max_height, max_width)


# Copied from transformers.models.detr.image_processing_detr.make_pixel_mask
def make_pixel_mask(
    image: np.ndarray, output_size: Tuple[int, int], input_data_format: Optional[Union[str, ChannelDimension]] = None
) -> np.ndarray:
    """
    Make a pixel mask for the image, where 1 indicates a valid pixel and 0 indicates padding.

    Args:
        image (`np.ndarray`):
            Image to make the pixel mask for.
        output_size (`Tuple[int, int]`):
            Output size of the mask.
    """
    input_height, input_width = get_image_size(image, channel_dim=input_data_format)
    mask = np.zeros(output_size, dtype=np.int64)
    mask[:input_height, :input_width] = 1
    return mask
These were removed as they didn't appear to be used anywhere in the processing logic.
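For reference, here is a rough sketch of how these DETR-style helpers are typically combined when padding a batch to a common size. The image shapes and the top-left padding below are illustrative assumptions, not code from this PR, and the snippet relies on the same imports (NumPy, `ChannelDimension`) as the quoted code above.

import numpy as np

# Two images in height x width x channels format, of different sizes.
images = [np.ones((300, 64, 3)), np.ones((480, 640, 3))]

max_height, max_width = get_max_height_width(images, input_data_format=ChannelDimension.LAST)

padded_images, pixel_masks = [], []
for image in images:
    height, width, _ = image.shape
    # Pad each image to the batch-wide maximum size (top-left aligned).
    canvas = np.zeros((max_height, max_width, 3), dtype=image.dtype)
    canvas[:height, :width] = image
    padded_images.append(canvas)
    # 1 marks real pixels, 0 marks padding.
    pixel_masks.append(
        make_pixel_mask(image, output_size=(max_height, max_width), input_data_format=ChannelDimension.LAST)
    )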
class FuyuBatchEncoding(BatchEncoding):
This was replaced with `BatchFeature`, as the processor output contains `image_patches`, which are of float type.
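A small sketch of the distinction: `BatchFeature` is the container used for feature-extractor/processor outputs and holds floating-point arrays alongside integer token ids. The key names and shapes below are illustrative assumptions, and the tensor conversion assumes PyTorch is installed.

import numpy as np
from transformers import BatchFeature

features = BatchFeature(
    data={
        "input_ids": [[1, 2, 3, 4]],                  # integer token ids
        "image_patches": np.random.rand(1, 16, 768),  # float-valued flattened patches
    },
    tensor_type="pt",  # convert everything to PyTorch tensors
)
print(features["image_patches"].dtype)  # a floating-point dtype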
Some minor nits.
    target_width: int = 1920,
    do_resize: bool = True,
    size: Optional[Dict[str, int]] = None,
    resample: PILImageResampling = PILImageResampling.BILINEAR,  # FIXME check default value
Suggested change:
-    resample: PILImageResampling = PILImageResampling.BILINEAR,  # FIXME check default value
+    resample: PILImageResampling = PILImageResampling.BILINEAR,
This is how it was done in the original code: https://huggingface.co/adept-hf-collab/adept-mm/blob/736c6b570b2a9c0367a3266746fd1f53cfff0a2b/mm-inference-for-hf/multimodal/data/image_utils.py#L208. `BILINEAR` seems correct, as our resizing is always done on PIL images and antialias is True in that case.
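For illustration, this is what a bilinear resize on a PIL image looks like; the image and target sizes are arbitrary assumptions, and Pillow applies the resampling filter (antialiasing) when downscaling with this mode.

from PIL import Image

image = Image.new("RGB", (1280, 1080))

# Resize with bilinear resampling, matching the default discussed above.
resized = image.resize((960, 540), resample=Image.BILINEAR)
print(resized.size)  # (960, 540)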
if is_vision_available():
    from .image_processing_fuyu import FuyuImageProcessor
Suggested change:
-if is_vision_available():
-    from .image_processing_fuyu import FuyuImageProcessor
+from .image_processing_fuyu import FuyuImageProcessor
Otherwise I think `import FuyuProcessor` would fail if torchvision is not installed.
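For context, a sketch of the guarded-import pattern under discussion; the absolute import path is shown for self-containment, whereas the library itself uses a relative import.

from transformers.utils import is_vision_available

if is_vision_available():
    # Only pull in the image processor when the vision extras (PIL, etc.) are installed,
    # so that merely importing the processor module does not raise an ImportError.
    from transformers.models.fuyu.image_processing_fuyu import FuyuImageProcessor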
This helped uncover a bug! The image processor was being reset, overwriting the user's input here. If we get rid of that, then we don't need this import at all
Ohhhh, right!
# Batch of two images - different sizes
images = [self.bus_image_pil, self.bus_image_pil.resize((64, 300))]
processor_outputs = self.processor(text=[self.text_prompt, self.text_prompt], images=images)
# FIXME - test outputs
To be completed, this succeeds now.
I've added a test which checks the processing of an individual resized image, and then checks the padding for two differently sized images in a batch.
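A hedged sketch of what such a test could look like; the checkpoint name, image path, output keys, and the final assertion are assumptions for illustration, not the actual test added in this PR.

from PIL import Image
from transformers import FuyuProcessor

processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")  # assumed checkpoint

bus_image = Image.open("bus.png")  # placeholder image path
prompt = "Answer the following question: what colour is the bus?"

# Single, resized image.
single_outputs = processor(text=prompt, images=bus_image.resize((64, 300)))

# Batch of two differently sized images: after padding, the per-sample
# image patches should line up to a common shape (hypothetical check).
batch_outputs = processor(
    text=[prompt, prompt],
    images=[bus_image, bus_image.resize((64, 300))],
)
assert batch_outputs["image_patches"][0].shape == batch_outputs["image_patches"][1].shape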
LGTM, I'll add some tests related to the model in my PR! Ok to merge into #27007 when amyeroberts#113 is merged, and I'll add a model tester there.
Fuyu processing: handle coordinates
Merged commit 584f792 into huggingface:fuyu_follow_up_image_processing
Squashed commit message:

* Fix Fuyu image scaling bug. It could produce negative padding and hence inference errors for certain image sizes.
* initial rework commit
* add batching capabilities, refactor image processing
* add functional batching for a list of images and texts
* make args explicit
* Fuyu processing update (#27133)
* Add file headers
* Add file headers
* First pass - preprocess method with standard args
* First pass image processor rework
* Small tweaks
* More args and docstrings
* Tidying iterating over batch
* Tidying up
* Modify to have quick tests (for now)
* Fix up
* BatchFeature
* Passing tests
* Add tests for processor
* Sense check when patchifying
* Add some tests
* FuyuBatchFeature
* Post-process box coordinates
* Update to `size` in processor
* Remove unused and duplicate constants
* Store unpadded dims after resize
* Fix up
* Return FuyuBatchFeature
* Get unpadded sizes after resize
* Update exception
* Fix return
* Convert input `<box>` coordinates to model format.
* Post-process point coords, support multiple boxes/points in a single sequence
* Replace constants
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Preprocess List[List[image]]
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Update to Amy's latest state.
* post-processing returns a list of tensors
* Fix error when target_sizes is None (Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>)
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Review comments
* Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* Fix up
* Fix up
* Fix conflicts in fuyu_follow_up_image_processing (#27228): fixing conflicts and updating on main
* Revert "Fix conflicts in fuyu_follow_up_image_processing" (#27232). This reverts commit acce10b.

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
What does this PR do?
This PR builds upon #27007, ticking off some elements in the TODO list and bringing the processor and image processor more in line with expected patterns in the library.