
CUDA error: batch inference instructblip #328

Open · zhangqingwu opened this issue May 25, 2023 · 15 comments

@zhangqingwu commented May 25, 2023

With batch_size=1, inference works normally:

```python
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)
test_dataset = DatasetInstructBILPImage(transformer=vis_processors, pkl_label_file=pkl_label)
test_dataloader = DataLoader(test_dataset, batch_size=1, num_workers=0)
prompt = "Write a short description for the image."
with torch.no_grad():
    for sample in test_dataloader:
        image = sample["image"].cuda()  # to(device, torch.float16)
        text_output = model.generate({"image": image, "prompt": [prompt] * image.size()[0]})
```

When batch_size is set to 2, the first batch runs normally, and then this error is encountered:

```python
test_dataloader = DataLoader(test_dataset, batch_size=2, num_workers=0)
```

@Scarecrow0

Same problem, cannot run inference when the batch size is greater than 1.

@24-solar-terms

Same problem

@LiJunnan1992
Contributor

I cannot reproduce this error. May I know your transformers version?

@24-solar-terms

@LiJunnan1992 I use transformers==4.29.1

@LiJunnan1992
Contributor

I still cannot reproduce this error. I can successfully run batch inference with the following code.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

img_path = "docs/_static/merlion.png"
raw_image = Image.open(img_path).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
image = torch.cat([image, image], dim=0)
prompt = ["Describe the image in details.", "Which city is this?"]
model.generate({"image": image, "prompt": prompt})
```

@zhangqingwu
Author

zhangqingwu commented Jun 9, 2023

Add `outputs[outputs == -1] = 1` next to this existing line:

`outputs[outputs == 0] = 2  # convert output id 0 to 2 (eos_token_id)`

You can give it a try.
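
For context, a minimal standalone sketch of what that patch does to the generated token ids before decoding. The helper name and the sample ids below are made up for illustration; only the two assignment lines come from this thread.

```python
import torch

def sanitize_output_ids(outputs: torch.LongTensor) -> torch.LongTensor:
    """Sketch: clean up ids returned by llm_model.generate() before decoding.

    During batched generation the Vicuna LLM sometimes emits the ids -1 and 0
    (see the discussion below), which are not valid token ids.
    """
    outputs = outputs.clone()
    outputs[outputs == -1] = 1  # proposed addition: map -1 to a valid token id
    outputs[outputs == 0] = 2   # existing line: convert output id 0 to 2 (eos_token_id)
    return outputs

# Example with hypothetical generated ids
ids = torch.tensor([[1, 523, 0, -1], [1, 98, 2, -1]])
print(sanitize_output_ids(ids))
```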

@Scarecrow0

Scarecrow0 commented Jun 12, 2023

I think it may be caused by torch.DDP; I adapted the training and evaluation loop of blip2-opt for InstructBLIP-Vicuna and hit this error.
@LiJunnan1992 Do you have plans to release a training/evaluation loop implementation for InstructBLIP? It would help a lot, thanks!

@Scarecrow0

> Add `outputs[outputs == -1] = 1` next to this existing line:
>
> `outputs[outputs == 0] = 2  # convert output id 0 to 2 (eos_token_id)`
>
> You can give it a try.

Will this modification work? The error occurs within llm_model.generate().

@STK101

STK101 commented Jun 23, 2023

> I still cannot reproduce this error. I can successfully run batch inference with the following code.
>
> ```python
> import torch
> from PIL import Image
> from lavis.models import load_model_and_preprocess
>
> device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
>
> model, vis_processors, _ = load_model_and_preprocess(
>     name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
> )
>
> img_path = "docs/_static/merlion.png"
> raw_image = Image.open(img_path).convert("RGB")
> image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
> image = torch.cat([image, image], dim=0)
> prompt = ["Describe the image in details.", "Which city is this?"]
> model.generate({"image": image, "prompt": prompt})
> ```

I am getting the same error for this code too, but as soon as I switch to FlanT5 the error disappears. I'm fairly sure this has something to do with Vicuna-7B's generate function.

@ustcwhy

ustcwhy commented Jul 9, 2023

Same problem, I cannot run inference when batch_size_eval > 1. Did you resolve this issue?

@STK101

STK101 commented Jul 10, 2023

> Same problem, I cannot run inference when batch_size_eval > 1. Did you resolve this issue?

Vicuna did not work for me; I just used FlanT5 instead. Just change the LLM you are loading:

```python
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5_instruct", model_type="flant5xl", is_eval=True, device=device
)
```
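
Putting that swap into the batched example from earlier in the thread; only the model name/type changes, so treat this as a sketch of the workaround rather than a verified run:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

# Workaround: load the FlanT5 variant of InstructBLIP instead of Vicuna
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5_instruct", model_type="flant5xl", is_eval=True, device=device
)

raw_image = Image.open("docs/_static/merlion.png").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
image = torch.cat([image, image], dim=0)  # batch of 2 images

prompt = ["Describe the image in details.", "Which city is this?"]
print(model.generate({"image": image, "prompt": prompt}))
```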

@Zhiyuan-Fan

I've encountered the same issue as well. Is there a solution for it?

@Cuiunbo

Cuiunbo commented Sep 22, 2023

Is there a solution for this? When the batch size is more than 1, running the code below gives the error.

```python
def generate(self, images, questions, ocr_tokens=None):
    processed_images = [Image.open(img_path).convert("RGB") for img_path in images]

    prompts = []
    for i in range(len(questions)):
        token = ocr_tokens[i] if ocr_tokens and ocr_tokens[i] is not None else ''
        prompt = f"<Image> OCR tokens: {token}. Question: {questions[i]} Short answer:"
        prompts.append(prompt)
    inputs = self.processor(
        images=processed_images, text=prompts, return_tensors="pt",
        padding='longest', truncation=True
    ).to(self.device)
    with torch.no_grad():
        generated_texts = self.model.generate(
            **inputs,
            do_sample=False,
            num_beams=1,
            max_length=256,
            min_length=1,
            top_p=0.9,
            repetition_penalty=1.5,
            length_penalty=1.0,
            temperature=1,
        )
```

@zzzzzero

zzzzzero commented Oct 7, 2023

Modify the config.json file of the Vicuna model: change "pad_token_id": -1 to "pad_token_id": 2. This happens because, during batch generation, the model sometimes emits pad_token_id = -1.

huggingface/transformers#22546 (comment)
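
A small sketch of applying that edit programmatically; the path below is a placeholder for wherever the Vicuna checkpoint and its config.json live:

```python
import json
from pathlib import Path

# Placeholder path: point this at the Vicuna checkpoint directory
config_path = Path("/path/to/vicuna-7b/config.json")

config = json.loads(config_path.read_text())
if config.get("pad_token_id") == -1:
    config["pad_token_id"] = 2  # 2 is the eos_token_id used by Vicuna/LLaMA
    config_path.write_text(json.dumps(config, indent=2) + "\n")
```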

@kochsebastian

> Modify the config.json file of the Vicuna model: change "pad_token_id": -1 to "pad_token_id": 2. This happens because, during batch generation, the model sometimes emits pad_token_id = -1.
>
> huggingface/transformers#22546 (comment)

This seems unrelated, but it actually solves the issue.
