Object Detection Pipeline only outputs first element when batching #31356

Open

simonschoenhofen opened this issue Jun 10, 2024 · 12 comments

Labels: Core: Pipeline · Good First Issue · Vision

@simonschoenhofen commented Jun 10, 2024

System Info

  • transformers version: 4.41.2
  • Platform: Linux-5.4.0-182-generic-x86_64-with-glibc2.31
  • Python version: 3.11.8
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.2
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

When running the ObjectDetectionPipeline with batching, the output contains only the bounding boxes of the first input image, because the pipeline's postprocessing accesses element [0] instead of looping over all outputs:

raw_annotation = raw_annotations[0]

This always picks the first element of the batch, so the results for the remaining images are silently dropped.
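A minimal, self-contained sketch of the pattern at fault and a possible fix (simplified stand-ins, not the actual pipeline source):

```python
# Simplified illustration of the reported bug (not the real pipeline code).
# postprocess receives the raw annotations for a whole batch, but only
# unpacks the first element, so results for the other images are dropped.

def postprocess_buggy(raw_annotations):
    raw_annotation = raw_annotations[0]  # always the first image only
    return [raw_annotation["boxes"]]

# A possible fix is to iterate over every element of the batch instead:
def postprocess_fixed(raw_annotations):
    return [annotation["boxes"] for annotation in raw_annotations]

batch = [{"boxes": [(0, 0, 10, 10)]}, {"boxes": [(5, 5, 20, 20)]}]
print(len(postprocess_buggy(batch)))  # 1 result for 2 images
print(len(postprocess_fixed(batch)))  # 2 results, one per image
```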

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Create an object detection pipeline:

pipe = pipeline("object-detection", model=model_name, image_processor=preprocessor_name, device=device)

  2. Use batch inference with batch_size > 2:

for out in tqdm(pipe(dataset, batch_size=batch_size)):
    ...

Expected behavior

Expected output: 2 elements, each with x items (bboxes).
Actual output: only the bboxes of the first input element.

@simonschoenhofen (Author)

It looks like the output of the _forward function is correct (a batch). However, the input to postprocess is no longer a batch; it is only the first element of the batch.

@NielsRogge (Contributor)

cc @qubvel

@amyeroberts (Collaborator)

Hi @simonschoenhofen, thanks for reporting this!

Would you like to open a PR to address this?

@amyeroberts amyeroberts added Core: Pipeline Internals of the library; Pipeline. Vision labels Jun 10, 2024
@simonschoenhofen (Author)

@amyeroberts Will do tomorrow

@huggingface huggingface deleted a comment from github-actions bot Jul 11, 2024
@huggingface huggingface deleted a comment from github-actions bot Aug 5, 2024
@amyeroberts
Copy link
Collaborator

Adding a good first issue label in case anyone from the community wants to take this on.

@qubvel (Member)

qubvel commented Aug 6, 2024

Hmm, I was not able to reproduce the bug; the following example works fine. @simonschoenhofen, were you able to solve this issue?

from transformers import pipeline

url = 'http://images.cocodataset.org/val2017/000000039769.jpg' 
pipe = pipeline("object-detection", model="PekingU/rtdetr_r50vd", device="cuda")

results = pipe([url] * 4, batch_size=2)

for i, result in enumerate(results):
    print(f"Image {i}:\n{result}\n")
Image 0:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 1:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 2:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

Image 3:
[{'score': 0.9704199433326721, 'label': 'sofa', 'box': {'xmin': 0, 'ymin': 0, 'xmax': 640, 'ymax': 476}}, {'score': 0.9599390625953674, 'label': 'cat', 'box': {'xmin': 343, 'ymin': 24, 'xmax': 640, 'ymax': 371}}, {'score': 0.9575842022895813, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 54, 'xmax': 318, 'ymax': 472}}, {'score': 0.9506626129150391, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 73, 'xmax': 175, 'ymax': 118}}, {'score': 0.9237849116325378, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 76, 'xmax': 369, 'ymax': 186}}]

@royvelich

@qubvel
Well, apparently, for Grounding DINO you have to do something like this in order to run batches through the pipeline:

import requests
from PIL import Image
from transformers import pipeline

device = "cuda"

detector = pipeline(model="IDEA-Research/grounding-dino-tiny", task="zero-shot-object-detection", device=device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
    "a cat. a remote control.",
    "a cat. a remote control. a sofa.",
]

data = [{'image': image, 'candidate_labels': text} for image, text in zip(images, texts)]

results = detector(data)

print(results)

@qubvel (Member)

qubvel commented Aug 6, 2024

@royvelich Yes, this example works, but the results are weird and do not match the results from running the model outside the pipeline. I suspect the "object-detection" pipeline doesn't have an issue; rather, the "zero-shot-object-detection" pipeline is not working properly with Grounding DINO.

@royvelich

@qubvel So, do you recommend using your example for now and avoiding the pipeline?

@qubvel (Member)

qubvel commented Aug 6, 2024

@royvelich Yes, please use the Grounding DINO model directly rather than the pipeline while we investigate the issue.
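For reference, a sketch of running Grounding DINO outside the pipeline, following the model card; the thresholds and device handling here are illustrative, not prescribed by this thread:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
# Grounding DINO expects lowercase queries separated by periods.
text = "a cat. a remote control."

inputs = processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# post_process_grounded_object_detection needs the input_ids to map
# predicted tokens back to the text queries.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],  # (height, width)
)
print(results[0]["boxes"], results[0]["labels"])
```

Note that this requires downloading the model and image, so it only runs with network access.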

@shankram

@qubvel the postprocess function in the zero_shot_object_detection pipeline calls image_processor.post_process_object_detection:

def postprocess(self, model_outputs, threshold=0.1, top_k=None):
    results = []
    for model_output in model_outputs:
        label = model_output["candidate_label"]
        model_output = BaseModelOutput(model_output)
        outputs = self.image_processor.post_process_object_detection(
            outputs=model_output, threshold=threshold, target_sizes=model_output["target_size"]
        )[0]
        ...

while the Grounding DINO HF post says post_process_grounded_object_detection should be used when the text contains multiple classes (e.g. 'a cat. a remote.').

The post_process_grounded_object_detection method requires input_ids which aren't passed to the postprocess function in the pipeline. Could you please tell me how to fix this without breaking the pipeline for other models? I'd be happy to open a PR and make my first contribution 🤗
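One hypothetical direction (simplified stand-ins, not the real pipeline code): have preprocess keep the tokenized input_ids alongside the model inputs, pass them through the forward stage untouched, and let postprocess hand them to the grounded post-processing method. A mock sketch of that data flow, with dummy callables standing in for the tokenizer, model, and post-processor:

```python
# Hypothetical sketch of threading input_ids from preprocess to postprocess;
# the stage names mirror the pipeline's structure but none of this is the
# actual transformers implementation.

def preprocess(image, text, tokenize):
    input_ids = tokenize(text)
    return {"pixel_values": image, "input_ids": input_ids, "target_size": (480, 640)}

def forward(model_inputs, model):
    raw_outputs = model(model_inputs["pixel_values"])
    # Pass input_ids and target_size through unchanged for postprocessing.
    return {
        "raw": raw_outputs,
        "input_ids": model_inputs["input_ids"],
        "target_size": model_inputs["target_size"],
    }

def postprocess(model_outputs, post_process_grounded):
    # Grounded models get the input_ids they require; other models could
    # keep the existing post_process_object_detection path.
    return post_process_grounded(
        model_outputs["raw"],
        model_outputs["input_ids"],
        target_sizes=[model_outputs["target_size"]],
    )

# Dummy stand-ins to show the data flow end to end.
tokenize = lambda text: list(range(len(text.split())))
model = lambda pixels: {"logits": [0.9], "boxes": [(0, 0, 10, 10)]}
post_process_grounded = lambda raw, ids, target_sizes: [{"boxes": raw["boxes"], "input_ids": ids}]

outputs = postprocess(forward(preprocess("img", "a cat. a remote.", tokenize), model), post_process_grounded)
print(outputs[0]["boxes"])  # [(0, 0, 10, 10)]
```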

@qubvel (Member)

qubvel commented Aug 29, 2024

Hi @shankram, thank you for investigating this! Indeed there is a problem with the pipeline for zero-shot object detection for some models.

I've prepared a PR fixing the pipeline; it's not yet merged, but it is already functional.
