
[Fuyu] Replace it to BatchFeature #27109

Closed
wants to merge 1 commit

Conversation

younesbelkada
Contributor

What does this PR do?

Right now users need to manually loop over FuyuProcessor's output and call to on each element. One should instead use BatchFeature from image_processing_utils and call to directly on the processed elements.

Before this PR, to run inference with a 4-bit model, users needed to do:

model_inputs = processor(text=text_prompt, images=raw_image)
for k, v in model_inputs.items():
    if v.dtype != torch.long:
        v = v.to(torch.float16)
    model_inputs[k] = v.to("cuda")

Now they just have to do:

model_inputs = processor(text=text_prompt, images=raw_image, return_tensors="pt").to("cuda", torch.float16)
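
For context, the one-liner matches the old loop because the dtype argument is only applied to floating-point tensors, so input_ids keeps its torch.long dtype. A simplified sketch of that logic (roughly what the BatchFeature to call does, not the exact implementation):

import torch

def to_device_and_dtype(data, device, dtype):
    # Simplified sketch: cast only floating-point tensors to the requested
    # dtype, then move every tensor to the target device. Integer tensors
    # such as input_ids keep their original dtype (torch.long).
    new_data = {}
    for key, value in data.items():
        if torch.is_floating_point(value):
            new_data[key] = value.to(device, dtype)
        else:
            new_data[key] = value.to(device)
    return new_data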

cc @ArthurZucker

Script to run the model in 4-bit:

import torch
from transformers import FuyuProcessor, FuyuForCausalLM
from PIL import Image
import requests

# load model and processor
model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id, device_map="cuda:0", load_in_4bit=True)

# prepare inputs for the model
text_prompt = "Generate a coco-style caption.\n"
img_url = 'https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(text=text_prompt, images=raw_image, return_tensors="pt").to("cuda", torch.float16)

# autoregressively generate text
generation_output = model.generate(**inputs, max_new_tokens=7)
generation_text = processor.batch_decode(generation_output[:, -7:], skip_special_tokens=True)

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks! Related to #27007 but not addressed in it. cc @amyeroberts, let's merge this for now just to be sure we have it.

@younesbelkada
Contributor Author

Thanks!
I just copied over the logic that is in place in BLIP - https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip/processing_blip.py#L129 (the type hint there is wrong BTW, it returns a BatchFeature). Per my understanding, processors that take both text and image inputs use BatchFeature. Let me know if another approach is preferred, @amyeroberts.
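
For reference, a minimal sketch of that BLIP-style pattern (names simplified, not the actual processing_blip code): tokenize the text, run the image processor, and wrap everything in a single BatchFeature so callers can move the whole output at once.

from transformers import BatchFeature

def process(tokenizer, image_processor, text, images, return_tensors=None):
    # Sketch of the BLIP-style pattern: merge text and image features into
    # one BatchFeature so that .to(device, dtype) works on the whole output.
    text_inputs = tokenizer(text, return_tensors=return_tensors)
    image_inputs = image_processor(images, return_tensors=return_tensors)
    return BatchFeature(data={**text_inputs, **image_inputs})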

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 27, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts
Collaborator

@younesbelkada Thanks for addressing this!

If it's OK with you - can we hold off on this for a day or two? I'm currently working on refactoring the image processing and processing code for Fuyu and this will be addressed there too :)

If you look at #27007 - you'll see that there's a custom BatchEncoding class added (it should actually be a BatchFeature class because there are float tensors). This is to address the atypical data structure that the processor class is returning - lists of lists instead of tensors. This is because each sample in a minibatch can have a variable number of images. There's an internal discussion on Slack asking how we should represent the input/output data to reflect this. At the moment, we can wrap with BatchFeature as done in this PR, but I'm not certain it extends to batch sizes of more than 1.
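
To make the batching concern concrete, here is a hypothetical sketch (the key name and shapes are illustrative, not the actual Fuyu output):

import torch

# Hypothetical batched output for batch size 2: each sample can carry a
# different number of images, so "image_patches" is a nested Python list of
# tensors rather than one stacked tensor.
model_inputs = {
    "input_ids": torch.tensor([[1, 2, 3], [4, 5, 6]]),
    "image_patches": [
        [torch.rand(3, 30, 30)],                         # sample 0: one image
        [torch.rand(3, 30, 30), torch.rand(3, 30, 30)],  # sample 1: two images
    ],
}

# A per-key tensor .to() sees a plain tensor for input_ids but a Python list
# for image_patches, so the nested image tensors would need extra handling
# (recursing into the lists) to be cast and moved.
for key, value in model_inputs.items():
    print(key, type(value))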

@amyeroberts
Collaborator

If it's blocking, then we can merge this and I can rebase the changes into my working branch.

@younesbelkada
Contributor Author

Thanks @amyeroberts, your explanation makes sense to me! I was not aware of #27007, and it is great that this issue is being addressed there.
Definitely OK for me to wait a bit before this gets merged! I just wanted to make sure users have a consistent API for multimodal models for the next release (i.e. avoid looping over the processor outputs). Perhaps if #27007 is not ready for the release we can merge this PR first, what do you think?

@younesbelkada
Contributor Author

Closing this PR as #27007 is going to be merged
