
[tests] fix the wrong output in ImageToTextPipelineTests.test_conditional_generation_llava #29975

Merged 1 commit on Apr 1, 2024

Conversation

@faaany (Contributor) commented Apr 1, 2024

What does this PR do?

Fix the following failing test:

(dev) (base) [fanli@skyocean transformers]$ RUN_SLOW=1 pytest tests/pipelines/test_pipelines_image_to_text.py::ImageToTextPipelineTests::test_conditional_generation_llava -v -rA
=============================================== test session starts ===============================================
platform linux -- Python 3.10.13, pytest-7.4.4, pluggy-1.4.0 -- /home/fanli/.conda/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /mnt/disk4/fanlilin/transformers
configfile: pyproject.toml
plugins: anyio-4.2.0, xdist-3.5.0, timeout-2.3.1, env-1.1.3, excel-1.6.0
collected 1 item                                                                                                  

tests/pipelines/test_pipelines_image_to_text.py::ImageToTextPipelineTests::test_conditional_generation_llava FAILED [100%]
==================================================== FAILURES =====================================================
___________________________ ImageToTextPipelineTests.test_conditional_generation_llava ____________________________

self = <tests.pipelines.test_pipelines_image_to_text.ImageToTextPipelineTests testMethod=test_conditional_generation_llava>

    @slow
    @require_torch
    def test_conditional_generation_llava(self):
        pipe = pipeline("image-to-text", model="llava-hf/bakLlava-v1-hf")
    
        prompt = (
            "<image>\nUSER: What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud?\nASSISTANT:"
        )
    
        outputs = pipe(
            "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg",
            prompt=prompt,
            generate_kwargs={"max_new_tokens": 200},
        )
>       self.assertEqual(
            outputs,
            [
                {
                    "generated_text": "<image> \nUSER: What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud?\nASSISTANT: Lava"
                }
            ],
        )
E       AssertionError: Lists differ: [{'generated_text': '\nUSER: What does the label 15 represent?[59 chars]va'}] != [{'generated_text': '<image> \nUSER: What does the label 15 re[67 chars]va'}]
E       
E       First differing element 0:
E       {'generated_text': '\nUSER: What does the label 15 represent?[58 chars]ava'}
E       {'generated_text': '<image> \nUSER: What does the label 15 re[66 chars]ava'}
E       
E       - [{'generated_text': '\n'
E       + [{'generated_text': '<image> \n'
E       ?                      ++++++++
E       
E                             'USER: What does the label 15 represent? (1) lava (2) core '
E                             '(3) tunnel (4) ash cloud?\n'
E                             'ASSISTANT: Lava'}]

tests/pipelines/test_pipelines_image_to_text.py:289: AssertionError
---------------------------------------------- Captured stderr call -----------------------------------------------
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.07it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
============================================= short test summary info =============================================
FAILED tests/pipelines/test_pipelines_image_to_text.py::ImageToTextPipelineTests::test_conditional_generation_llava - AssertionError: Lists differ: [{'generated_text': '\nUSER: What does the label 15 represent?[59 chars]va'}] !=...

The output of the LLaVA model doesn't contain the "<image>" label; you can check out the example in the model card as well. After removing "<image>" from the expected string, the test passes:

=========================================================================== short test summary info ===========================================================================
PASSED tests/pipelines/test_pipelines_image_to_text.py::ImageToTextPipelineTests::test_conditional_generation_llava
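For illustration, the fix amounts to dropping the `<image>` placeholder from the expected string, since the pipeline echoes the prompt back without it. A minimal sketch of the string handling only (no model run; the `answer` value is taken from the captured test output above):

```python
# The prompt used by the test, including the "<image>" placeholder token.
prompt = (
    "<image>\nUSER: What does the label 15 represent? "
    "(1) lava (2) core (3) tunnel (4) ash cloud?\nASSISTANT:"
)

# The model appends its answer after the echoed prompt (from the test log).
answer = " Lava"

# Corrected expectation: the pipeline output omits the "<image>" token,
# so the expected string must omit it as well.
expected = [{"generated_text": prompt.replace("<image>", "") + answer}]

print(expected[0]["generated_text"])
```

This mirrors what the one-line change in the PR does: the hard-coded expected string in `test_conditional_generation_llava` now starts with `\nUSER:` instead of `<image> \nUSER:`.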

Who can review?

@younesbelkada @ArthurZucker

@ArthurZucker (Collaborator) left a comment


Thanks, I am getting the same locally!

@ArthurZucker ArthurZucker merged commit e4f5b57 into huggingface:main Apr 1, 2024
18 checks passed

ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
itazap pushed a commit that referenced this pull request May 14, 2024
3 participants