Adds image-guided object detection support to OWL-ViT #18891
Conversation
Hi @alaradirik I added an initial version of the image-guided object detection. I still have to add tests and do some other cleanup; however, I have some doubts right now.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi Dhruv,
Thank you for your contribution! I realized that I accidentally misled you when discussing the one-shot detection implementation. I thought the query image would be embedded by the unmodified base model (OwlViTModel), but here is the correct method, as described in Appendix A1.7 of the paper:
- The query image is embedded the same way as the target image (which you already implemented).
- query_image_embeds is forward propagated through the class and box prediction heads to retrieve the class embeddings and box predictions. The goal of retrieving box predictions is to choose one class embedding / prediction such that it (1) is dissimilar from the rest of the predicted class embeddings and (2) has a corresponding box prediction that yields a high intersection over union (IoU) with the query image. This part corresponds to the embed_image_query() function at line 110 in the original repo over here. You can disregard the query_box_yxyx argument.
- The selected class_embedding / query_embedding is used to query the target image in the same way text query_embeds are used within OwlViTForObjectDetection. You can see an example in the original repo over here (the _image_conditioning_py_callback() function).
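A rough, self-contained sketch of that selection step (for illustration only: the function names and the exact scoring below are made up, and the real embed_image_query() in the original repo combines the IoU and dissimilarity criteria differently):

```python
import math


def iou(box_a, box_b):
    # Boxes are (x0, y0, x1, y1) in normalized coordinates.
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-6)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-6)


def select_query_embedding(class_embeds, pred_boxes, iou_threshold=0.8):
    """Pick the class embedding whose predicted box best covers the whole
    query image (high IoU with the full-image box) and which is most
    dissimilar from the other predicted embeddings."""
    full_box = (0.0, 0.0, 1.0, 1.0)  # the query image itself
    ious = [iou(box, full_box) for box in pred_boxes]
    candidates = [i for i, v in enumerate(ious) if v >= iou_threshold]
    if not candidates:  # fall back to the single best-overlapping box
        candidates = [max(range(len(ious)), key=ious.__getitem__)]
    # Dissimilarity criterion: lowest mean cosine similarity to all embeddings.
    mean_sim = [
        sum(cosine(e, other) for other in class_embeds) / len(class_embeds)
        for e in class_embeds
    ]
    best = min(candidates, key=mean_sim.__getitem__)
    return class_embeds[best]
```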
Hope this helps and sorry for the delay.
Hi @alaradirik, I made the changes as per the review comments, could you check if they're fine? I'm working on the test cases currently. In the file here, is it okay if I reuse [...]? So the above line would return [...] and be re-used as [...].
And apart from the test cases, are there any other changes that I need to make?
Hi @unography, thank you for the contribution once again! As for your question regarding the tests, yes, it'd make sense to return [...]. We can add a line to this function to create [...].
The changes look good to me!
There are a few issues that need to be addressed regarding a missing one-shot detection algorithm detail and stale arguments left over from previous commits (from other PRs). Let me know if you have any questions.
@@ -1167,6 +1174,7 @@ def forward(

class OwlViTForObjectDetection(OwlViTPreTrainedModel):
    config_class = OwlViTConfig
    main_input_name = "pixel_values"
I believe this line caused test errors previously, will need to double check.
By default, I think the main_input_name is input_ids. Before, when it was just text-guided detection, the first param was input_ids, and then pixel_values. Now, since input_ids can be None, I made pixel_values the first param, which is why the test case was failing. Hence this change.
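A toy sketch of what this change amounts to (illustrative only; the class body is hypothetical and the real forward() signature has many more arguments):

```python
from typing import List, Optional


class OwlViTForObjectDetectionSketch:
    # Transformers' common test utilities read `main_input_name` to decide
    # which argument carries the primary model input when building dummy
    # inputs; the inherited default is "input_ids", so it must be
    # overridden once pixel_values becomes the first parameter.
    main_input_name = "pixel_values"

    def forward(self, pixel_values, input_ids: Optional[List[int]] = None):
        # input_ids is now optional: image-guided detection supplies only
        # pixels, while text-guided detection still passes token ids.
        if input_ids is None:
            return "image-guided"
        return "text-guided"
```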
That makes sense! I'd comment out the slow flags (tests with the @slow decorator are skipped) within tests/test_modeling_common.py and run all tests to double-check this.
You can run the tests from the root of the transformers repo as follows:
# All tests
pytest tests/models/owlvit/test_modeling_owlvit.py
# Run only integration tests
pytest tests/models/owlvit/test_modeling_owlvit.py::OwlViTModelIntegrationTest
# Run a single test
pytest tests/models/owlvit/test_modeling_owlvit.py::OwlViTModelIntegrationTest::test_inference
This is too much of a breaking change, even for a newly released model.
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Thank you for the contribution once again! The code seems to be in great shape and I just left a couple of comments regarding minor style corrections and docstrings.
The only issue is the following tests fail:
- OwlViTVisionModelTest.test_model
- OwlViTVisionModelTest.test_model_outputs_equivalence
- OwlViTModelTest.test_model_outputs_equivalence
I believe this is due to making pixel_values the main argument in OwlViTForObjectDetection.forward(), but I couldn't pinpoint the exact issue. @ydshieh could you take a look at the test scripts when you have time?
Hi, I couldn't see any tests being run by CI. Could you share the error messages? @unography, could you follow the instructions below to refresh your CircleCI permissions?
Sure @alaradirik, I'll go through the review comments and make the changes. And actually, on my local machine, I'm able to get the test cases to pass when running [...]. I'll check once more.
Hi @ydshieh, I'm not able to refresh the permissions for some reason; I get an error.
@ydshieh, of course, here is the full error log.
From the file, it looks like the latest version in this PR is different from the version that produced the error you provided above. See transformers/src/transformers/models/owlvit/modeling_owlvit.py, lines 893 to 902 at bb61e30, where there is [...], but it is not in your error message.
I triggered it :)
@ydshieh great, thank you! I hadn't pulled the latest changes on this branch. @unography we can merge this PR once the remaining minor issues are addressed, thank you again for the clean implementation :)
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Force-pushed from 9ebd950 to bb61e30
Hi @alaradirik, sorry, my notifications got messed up and I was only able to go through the comments now. Do I need to change anything for merging? Upstream URL or anything else?
Hey @unography, no problem at all! I'm about to merge a clean PR with the correct upstream. Could you give me your email address so that I can add you as a co-author to my commits?
@alaradirik sure, this is my email - k4r4n.dhruv@gmail.com
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This adds support for doing object detection with OWL-ViT using query image(s).
For #18748
cc: @alaradirik