Fuyu: improve image processing #27007
The previous scaling logic could produce negative padding, and hence inference errors, for certain image sizes.
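For reference, a minimal sketch of a clamped scale-and-pad computation that avoids this class of bug (the function name and the 1080×1920 target canvas are illustrative assumptions, not the processor's actual API):

```python
import math


def compute_padding(height, width, target_height=1080, target_width=1920):
    """Scale an image to fit a target canvas, then pad the remainder.

    A naive implementation can yield negative padding when the scaled
    size rounds above the target; clamping both the resized dimensions
    and the padding to the valid range avoids that.
    """
    scale = min(target_height / height, target_width / width)
    new_h = min(math.ceil(height * scale), target_height)
    new_w = min(math.ceil(width * scale), target_width)
    pad_bottom = max(target_height - new_h, 0)  # clamp: never negative
    pad_right = max(target_width - new_w, 0)
    return (new_h, new_w), (pad_bottom, pad_right)
```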
Discovered one bug in FuyuBatchEncoding
Squashed commits:

- Add file headers
- Add file headers
- First pass - preprocess method with standard args
- First pass image processor rework
- Small tweaks
- More args and docstrings
- Tidying iterating over batch
- Tidying up
- Modify to have quick tests (for now)
- Fix up
- BatchFeature
- Passing tests
- Add tests for processor
- Sense check when patchifying
- Add some tests
- FuyuBatchFeature
- Post-process box coordinates
- Update to `size` in processor
- Remove unused and duplicate constants
- Store unpadded dims after resize
- Fix up
- Return FuyuBatchFeature
- Get unpadded sizes after resize
- Update exception
- Fix return
- Convert input `<box>` coordinates to model format.
- Post-process point coords, support multiple boxes/points in a single sequence
- Replace constants
- Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
- Preprocess List[List[image]]
- Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
- Update to Amy's latest state.
- post-processing returns a list of tensors
- Fix error when target_sizes is None (Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>)
- Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>, four commits)
- Review comments
- Update src/transformers/models/fuyu/image_processing_fuyu.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
- Fix up
- Fix up

Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>, Pedro Cuenca <pedro@huggingface.co>, Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
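The "sense check when patchifying" step above concerns splitting the padded image into fixed-size patches. As an illustration, the operation amounts to something like the following (the 30×30 patch size matches Fuyu's default, but this helper is a sketch, not the processor's actual method):

```python
import torch


def patchify(image, patch_h=30, patch_w=30):
    """Split a (C, H, W) image tensor into flattened patches of shape
    (num_patches, C * patch_h * patch_w).

    Assumes H and W are exact multiples of the patch size; the image
    processor pads the image to guarantee this before patchifying.
    """
    c, h, w = image.shape
    assert h % patch_h == 0 and w % patch_w == 0, "pad the image first"
    # (C, H, W) -> (C, H//ph, W//pw, ph, pw)
    patches = image.unfold(1, patch_h, patch_h).unfold(2, patch_w, patch_w)
    # -> (H//ph, W//pw, C, ph, pw) -> (num_patches, C * ph * pw)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_h * patch_w)
    return patches
```

A useful sense check is that the number of patches times the patch area equals the padded image area.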
fixing conflicts and updating on main
This version of the processor now correctly supports batching and dtype casting, and left-padded batch generation yields the same results as single-input generation:

```python
import io

import requests
from PIL import Image

from transformers import AutoTokenizer, FuyuForCausalLM, FuyuImageProcessor, FuyuProcessor

pretrained_path = "adept/fuyu-8b"
tokenizer = AutoTokenizer.from_pretrained(pretrained_path, pad_token_id=0)
image_processor = FuyuImageProcessor()
processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)

text_prompt = "Answer the following DocVQA question based on the image. \n Which is the metro in California that has a good job Outlook?"
jobs_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/jobs.png"
jobs_image_pil = Image.open(io.BytesIO(requests.get(jobs_image_url).content))

second_text_prompt = "Answer the following DocVQA question based on the image. \n What if the maximum male life expectancy?"
chart_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/chart.png"
chart_image_pil = Image.open(io.BytesIO(requests.get(chart_image_url).content))

third_text_prompt = "Answer the following DocVQA question based on the image. \n What sport is that?"
skate_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/skateboard.png"
skate_image_pil = Image.open(io.BytesIO(requests.get(skate_image_url).content))

fourth_text_prompt = "Answer the following DocVQA question based on the image. \n What was the fair amount of paid vacation days in the United Kingdom?"
vacations_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/vacation_days_hr.png"
vacations_image_pil = Image.open(io.BytesIO(requests.get(vacations_image_url).content)).convert("RGB")

texts = [text_prompt, second_text_prompt, third_text_prompt, fourth_text_prompt]
images = [jobs_image_pil, chart_image_pil, skate_image_pil, vacations_image_pil]

model_inputs = processor(text=texts, images=images).to("cuda")
model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map="auto")

generation = processor.tokenizer.batch_decode(
    model.generate(**model_inputs, max_new_tokens=10)[:, -10:], skip_special_tokens=True
)

single_generations = ["Los Angeles", "80.7", "skateboarding", "28"]
for single_generation, batched_generation in zip(single_generations, generation):
    answer = batched_generation.split("\x04 ", 1)[1] if "\x04" in batched_generation else ""
    assert single_generation == answer
```
LGTM!
A lot of the changes to the image processing code were written by me, so I might be a bit blind to any issues in the diff. @pcuenca gave a detailed review, however, so I think we're good :)
Squashed commits:

- Fix Fuyu image scaling bug: it could produce negative padding and hence inference errors for certain image sizes
- initial rework commit
- add batching capabilities, refactor image processing
- add functional batching for a list of images and texts
- make args explicit
- Fuyu processing update (#27133)
- Fix conflicts in fuyu_follow_up_image_processing (#27228): fixing conflicts and updating on main
- Revert "Fix conflicts in fuyu_follow_up_image_processing (#27228)" (#27232); reverts commit acce10b

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>, amyeroberts <22614925+amyeroberts@users.noreply.github.com>, Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
I think the current version of the image processing and tokenization does not support the usage sample code from the original release, right?
When I try to run the above code, an error occurs because `inputs['image_patches']` is now a list and cannot be moved to a device. I suggest either supporting this type of processing, or directly updating the sample code on the Hugging Face release page in the link.
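Until the processor output handles this itself, a workaround along these lines should work for moving such inputs to a device (a sketch assuming PyTorch tensors; `to_device` is a hypothetical helper, not a transformers API):

```python
import torch


def to_device(model_inputs, device):
    """Move a processor output dict to a device, handling values that
    are lists of tensors (as `image_patches` can be) as well as plain
    tensors; non-tensor values are passed through unchanged."""
    moved = {}
    for key, value in model_inputs.items():
        if isinstance(value, torch.Tensor):
            moved[key] = value.to(device)
        elif isinstance(value, (list, tuple)):
            moved[key] = [v.to(device) if isinstance(v, torch.Tensor) else v for v in value]
        else:
            moved[key] = value
    return moved
```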
Hi, I've updated the code snippet on the model card; it works as expected for me. Note that you need to install Transformers from the main branch: `pip install -q git+https://github.com/huggingface/transformers.git`
Have there been any updates to this? I'm still running into the same issue. Thanks!
Hello @cyrilzakka 👋 I've read the thread, and according to Niels the sample code in the model card should work. Can you please give more details about the issue you are facing? Thank you! :)
Hey @pcuenca! Sorry for the trouble, but I'm having issues running Fuyu inference on a multi-GPU setup (4x25GB GPUs):
Same here.
The code I used:
Traceback here:
Sorry about the bare-bones traceback!
@oops343 would it be possible to open a new issue for this?
@NielsRogge Sure, please check.
What does this PR do?
This PR aims to align the `FuyuImageProcessor` class with the other vision/language models in transformers. The Fuyu model expects a tensor of token ids, a tensor of patch embeddings, and an indexing tensor indicating where rows of patch embeddings should be inserted among the token embeddings. Currently, the image processor does not separate out the steps needed to produce this output within the Processor, and it limits inference to batches of size 1. The PR also aims to improve the readability and code quality of the processor, possibly enabling pipelining later on.
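For intuition, the merge step these three outputs feed into can be sketched as follows (a simplified illustration; the function name and shapes are assumptions, not the exact modeling code):

```python
import torch


def gather_continuous_embeddings(word_embeddings, patch_embeddings, patch_indices):
    """Merge patch embeddings into the token embedding sequence.

    word_embeddings:  (batch, seq_len, hidden) token embeddings.
    patch_embeddings: (batch, num_patches, hidden) projected image patches.
    patch_indices:    (batch, seq_len) with the patch row to insert at each
                      position, or a negative value where the token embedding
                      should be kept as-is.
    """
    output = word_embeddings.clone()
    for b in range(word_embeddings.shape[0]):
        mask = patch_indices[b] >= 0
        output[b, mask] = patch_embeddings[b][patch_indices[b][mask]]
    return output
```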
Pending tasks:

- Support `BatchFeature` with arbitrary batch size
- Expose `do_rescale`, `do_normalize`, `do_pad` arguments in the `ImageProcessor` constructor
- Refactor `process_images_for_model_input`, currently hard to read

Before submitting

- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
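The `do_rescale`/`do_normalize`/`do_pad` flags follow the convention used across transformers image processors: each preprocessing step can be toggled independently from the constructor or per call. A minimal sketch of the last two numeric steps (the constants here are illustrative placeholders, not Fuyu's actual defaults):

```python
import numpy as np


def rescale_and_normalize(image, do_rescale=True, do_normalize=True,
                          rescale_factor=1 / 255, image_mean=0.5, image_std=0.5):
    """Final steps of a typical `preprocess`: map uint8 pixel values
    into [0, 1], then standardize with a mean and standard deviation.
    Each step is skipped when its flag is False."""
    image = image.astype(np.float32)
    if do_rescale:
        image = image * rescale_factor
    if do_normalize:
        image = (image - image_mean) / image_std
    return image
```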
Who can review?
Models: