
Conversation

@sushmanthreddy (Contributor) commented Dec 8, 2024

Closes #31137

Pull Request Title: Add HQ-SAM Functionality to Transformers Library

Model Overview

HQ-SAM (Segment Anything in High Quality) is an enhanced version of the Segment Anything Model (SAM), addressing limitations in mask quality for intricate structures and challenging segmentation tasks. The model refines SAM’s predictions using a High-Quality Output Token and Global-Local Feature Fusion while preserving SAM’s efficiency and zero-shot generalization capabilities.

According to the original implementation, HQ-SAM significantly improves mask boundaries and reduces segmentation errors by introducing minimal additional parameters (<0.5%) and computational overhead. The model is designed to maintain compatibility with SAM’s existing prompt-based design and mask decoder architecture.
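
Since the prose above is the only description of the two additions, here is a minimal conceptual sketch of how the HQ output token and global-local feature fusion fit together. All module, argument, and dimension names are illustrative assumptions, not the actual sam_hq implementation; the sketch only captures the data flow.

    # Conceptual sketch only: illustrative names, not the actual sam_hq code.
    import torch
    import torch.nn.functional as F
    from torch import nn

    class HQMaskHead(nn.Module):
        """Adds an HQ output token and fuses global/local features to refine SAM's mask."""

        def __init__(self, hidden_dim: int = 256, vit_dim: int = 768):
            super().__init__()
            out_dim = hidden_dim // 8  # channel dim of SAM's upscaled mask features
            # extra learnable token, appended to SAM's IoU/mask tokens inside the decoder;
            # its decoded state arrives in forward() as hq_token_state
            self.hq_token = nn.Embedding(1, hidden_dim)
            # project the final image embedding (global context) and an early ViT feature map (local detail)
            self.compress_global = nn.Conv2d(hidden_dim, out_dim, kernel_size=1)
            self.compress_local = nn.Conv2d(vit_dim, out_dim, kernel_size=1)
            # turns the decoded HQ token into per-pixel mask weights
            self.hq_token_mlp = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, out_dim)
            )

        def forward(self, hq_token_state, upscaled_mask_features, image_embedding, early_vit_features):
            b, c, h, w = upscaled_mask_features.shape
            # bring the coarser maps up to the decoder's upscaled resolution, then fuse
            glob = F.interpolate(self.compress_global(image_embedding), (h, w), mode="bilinear", align_corners=False)
            local = F.interpolate(self.compress_local(early_vit_features), (h, w), mode="bilinear", align_corners=False)
            hq_features = upscaled_mask_features + glob + local
            # HQ mask = dot product of the processed HQ token with the fused feature map
            weights = self.hq_token_mlp(hq_token_state)  # (b, 1, out_dim)
            return (weights @ hq_features.flatten(2)).view(b, 1, h, w)

In the model itself this logic lives inside SAM's mask decoder (which the PR description says is kept compatible); the sketch only shows where the extra token and the fused features enter the mask prediction.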

Repository and Weights

The HQ-SAM implementation and pre-trained weights are available in the following repository:
https://github.com/SysCV/sam-hq

HQ-SAM provides three pre-trained weight variants:

  • sam_hq_vit_b – ViT-B (base) vision encoder, the smallest variant.
  • sam_hq_vit_l – ViT-L (large) vision encoder.
  • sam_hq_vit_h – ViT-H (huge) vision encoder, the largest variant.

The main difference between these variants is the size of the Vision Transformer (ViT) encoder, while the prompt encoder and mask decoder remain unchanged.

Functionality

For each input prompt (e.g., bounding boxes, 2D points, or coarse masks), HQ-SAM predicts high-quality binary masks with improved segmentation precision. Improvements include (see the usage sketch after this list):

  • More accurate boundaries.
  • Correction of coarse masks and segmentation errors.
  • Enhanced detail preservation for thin structures and complex object geometries.
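
A minimal usage sketch of how these prompts map onto the Transformers API added in this PR. The class names (SamHQModel, SamHQProcessor) follow this PR's sam_hq module; the checkpoint id, example image, and point prompt are assumptions and may differ from the published weights.

    # Hedged usage sketch (not taken from the PR itself).
    import torch
    from PIL import Image
    from transformers import SamHQModel, SamHQProcessor

    checkpoint = "syscv-community/sam-hq-vit-base"  # assumed hub id
    model = SamHQModel.from_pretrained(checkpoint)
    processor = SamHQProcessor.from_pretrained(checkpoint)

    raw_image = Image.open("example.jpg").convert("RGB")  # any RGB image
    input_points = [[[450, 600]]]                         # one 2D point prompt (x, y)

    inputs = processor(raw_image, input_points=input_points, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # upscale the low-resolution mask logits back to the original image size
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    scores = outputs.iou_scores

Bounding-box prompts follow the same pattern with input_boxes instead of input_points, mirroring the existing SAM processor API.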

Reviewers: @molbap

@sushmanthreddy sushmanthreddy marked this pull request as draft December 8, 2024 09:15
@sushmanthreddy sushmanthreddy marked this pull request as ready for review December 20, 2024 00:48
@sushmanthreddy sushmanthreddy requested a review from molbap April 22, 2025 18:35
@sushmanthreddy (Contributor, Author) commented Apr 24, 2025

@molbap, are any further changes needed for this PR?

@molbap (Contributor) left a comment

Looks good, tests seem to pass! I'll merge this as soon as I can; I have limited connectivity right now, which is likely blocking the auth :)

@molbap (Contributor) commented Apr 28, 2025

I'm not sure what is blocking the merge. @sushmanthreddy, can you update your branch with the most recent main changes?

changes are made to update with latest code
@sushmanthreddy (Contributor, Author)

@molbap, updated with the latest code and synced with main; can you take a look?

@molbap (Contributor) left a comment

Seems it fixed the weird branch status - can you try the fix for the init_weights test?

Comment on lines +1145 to +1148
    elif isinstance(module, SamHQVisionAttention):
        if module.use_rel_pos:
            module.rel_pos_h.data.zero_()
            module.rel_pos_w.data.zero_()
@molbap (Contributor)

Tests are broken by #37070, which ensures weight initialization is done properly 😅 I think the following ought to fix it, can you try?

Suggested change
    -elif isinstance(module, SamHQVisionAttention):
    -    if module.use_rel_pos:
    -        module.rel_pos_h.data.zero_()
    -        module.rel_pos_w.data.zero_()
    +elif isinstance(module, SamHQVisionAttention):
    +    if module.use_rel_pos:
    +        module.rel_pos_h.data.zero_()
    +        module.rel_pos_w.data.zero_()
    +elif isinstance(module, SamHQVisionEncoder):
    +    if module.pos_embed is not None:
    +        module.pos_embed.data.zero_()

@sushmanthreddy (Contributor, Author)

Lol, I was beating my head around where I went wrong while merging with the main branch.

@sushmanthreddy (Contributor, Author) commented Apr 28, 2025

Added this in the modular file.

@molbap molbap merged commit 65e9402 into huggingface:main Apr 28, 2025
20 checks passed
@molbap (Contributor) commented Apr 28, 2025

Merged! Congratulations @sushmanthreddy for the thorough work and the great model addition! I'll post about it soon :D

@sushmanthreddy (Contributor, Author)

> Merged! Congratulations @sushmanthreddy for the thorough work and the great model addition! I'll post about it soon :D

Thanks, that was a great learning process about the Transformers API.

@NielsRogge (Contributor)

Hi @sushmanthreddy thanks for working on this, amazing contribution!

Some things we could do as next steps:

@molbap (Contributor) commented Apr 30, 2025

Hey @NielsRogge! For the embeddings, the model supports them; it's just that there are now both image_embeddings and intermediate_embeddings that need to be passed. I tried it like this:

    if "hq" in MODEL_ID:
        image_embeddings, intermediate_embeddings = model.get_image_embeddings(inputs["pixel_values"])
    else:
        image_embeddings = model.get_image_embeddings(inputs["pixel_values"])
    # ... couple lines below
    if "hq" in MODEL_ID:
        inputs.update({"intermediate_embeddings": intermediate_embeddings})

and it looks like it works (left: original SAM, right: HQ-SAM)

[image: side-by-side mask comparison, original SAM vs. HQ-SAM]
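
For completeness, a hedged end-to-end sketch of that precomputed-embeddings path. Only the image_embeddings / intermediate_embeddings keyword names and the two-value return of get_image_embeddings come from the discussion above; the checkpoint id, image, and prompt are illustrative assumptions.

    # Hedged sketch of the precomputed-embeddings flow described above.
    import torch
    from PIL import Image
    from transformers import SamHQModel, SamHQProcessor

    checkpoint = "syscv-community/sam-hq-vit-base"  # assumed hub id
    model = SamHQModel.from_pretrained(checkpoint)
    processor = SamHQProcessor.from_pretrained(checkpoint)

    raw_image = Image.open("example.jpg").convert("RGB")
    inputs = processor(raw_image, input_points=[[[450, 600]]], return_tensors="pt")

    with torch.no_grad():
        # HQ-SAM returns the intermediate ViT features alongside the image embeddings
        image_embeddings, intermediate_embeddings = model.get_image_embeddings(inputs["pixel_values"])

    inputs.pop("pixel_values")  # not needed once embeddings are precomputed
    inputs.update({"image_embeddings": image_embeddings, "intermediate_embeddings": intermediate_embeddings})

    with torch.no_grad():
        outputs = model(**inputs)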

@molbap (Contributor) commented Apr 30, 2025

For the checkpoints, I indeed missed it. @sushmanthreddy, could you move your checkpoint to a new org with the -community suffix for maintainability? I created one that ought to do it: https://huggingface.co/syscv-community. If you move your checkpoint there, we can handle maintenance (plus add credits to the original research team and an Apache 2.0 license) :) And let me know if you have time to convert and test the other versions, else I'll do it!

@sushmanthreddy (Contributor, Author)

@molbap I will do it today or tomorrow

@NielsRogge (Contributor)

Hi @sushmanthreddy let us know if you need any help. We can also add you to https://huggingface.co/syscv-community.

@sushmanthreddy (Contributor, Author)

@NielsRogge I got busy with personal work; I will raise a pull request on the Hugging Face Hub by tonight, 12:00 IST.

@sushmanthreddy (Contributor, Author)

Having some network issues, I wasn't able to push the weights; I will upload the weights to the Hub tomorrow along with the README file.

@NielsRogge (Contributor)

Hi @sushmanthreddy saw that you've uploaded everything here: https://huggingface.co/syscv-community, awesome!

Btw does the tiny variant require a different architecture?

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* added the configuration for sam_hq

* added the modelling for sam_hq

* added the sam hq mask decoder with hq features

* added the code for the samhq

* added the code for the samhq

* added the code for the samhq

* Delete src/transformers/models/sam_hq/modelling_sam_hq.py

* added the code for the samhq

* added the code for the samhq

* added the changes for the modelling

* added the code for sam hq for image processing

* added code for the sam hq model

* added the required changes

* added the changes

* added the key mappings for the sam hq

* adding the working code of samhq

* added the required files

* adding the pt object

* added the push to hub account

* added the args for the sam mask decoder

* added the args for the sam hq vision config

* added some more documentation

* removed the unnecessary spaces

* all required changes

* removed the image processor

* added the required file

* added the changes for the checkcopies

* added the code for modular file

* added the changes for the __init file

* added the code for the interm embeds

* added the code for sam hq

* added the changes for modular file

* added the test file

* added the changes required

* added the changes required

* added the code for the

* added the cl errors

* added the changes

* added the required changes

* added the some code

* added the code for the removing image processor

* added the test dimensions

* added the code for the removing extra used variables

* added the code for modular file hf_mlp for a better name

* removed abbreviation in core functionality

* removed abbreviation in core functionality

* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory

* added the code which is after make fixup

* added some test for the intermediate embeddings test

* added the code for the torch support in sam hq

* added the code for the updated modular file

* added the changes for documentations as mentioned

* removed the heading

* add the changes for the code

* first mentioned issue resolved

* added the changes code to processor

* added the easy loading to init file

* added the changes to code

* added the code to changes

* added the code to work

* added the code for sam hq

* added the code for sam hq

* added the code for the point pad value

* added the small test for the image embeddings and intermediate embedding

* added the code

* added the code

* added the code for the tests

* added the code

* added the code for the processor file

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code for tests and some checks

* added some code

* added the code

* added the code

* added some code

* added some code

* added the changes for required

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added some changes

* added some changes

* removed spaces and quality checks

* added some code

* added some code

* added some code

* added code quality checks

* added the checks for quality checks

* added some code which fixes test_inference_mask_generation_no_point

* added code for the test_inference_mask_generation_one_point_one_bb

* added code for the test_inference_mask_generation_one_point_one_bb_zero

* added code for the test_inference_mask_generation_one_box

* added some code in modelling for testing

* added some code which sorts masks with high score

* added some code

* added some code

* added some code for the move KEYS_TO_MODIFY_MAPPING

* added some code for the  unsqueeze removal

* added some code for the  unsqueeze removal

* added some code

* added some code

* add some code

* added some code

* added some code

* changed some testing values

* added changes to code in sam hq for readability purposes

* added pre commit checks

* added the fix to samvisionmodel for compatibility

* added the changes made on sam by cyyever

* fixed the tests for samhq

* added some code

* added some code related to init file issue during merge conflicts

* removed the merge conflicts

* added changes mentioned by Arthur and molbap

* added changes mentioned by Arthur and molbap

* solving quality checks

* added the changes for input clearly

* added the changes

* added changes in mask generation file regarding model inputs and sam hq kwargs in processor file

* added changes in processor file

* added the setUp -> setUpClass conversion

* added the code mentioned for processor

* added changes for the code

* added some code

* added some code

* added some code

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
@sushmanthreddy (Contributor, Author)

> Hi @sushmanthreddy saw that you've uploaded everything here: https://huggingface.co/syscv-community, awesome!
>
> Btw does the tiny variant require a different architecture?

@NielsRogge I have gone through their code to integrate the small models as well. The issue is that they use a small ViT architecture; it could be integrated, and I am looking at possible ways to integrate it into SAM-HQ. Right now I am working on the DEIM model and will add this feature after adding the DEIM model to HF.

@sbucaille (Contributor)

Hi,
I noticed the MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variable in modeling_auto.py is defined twice in your PR.
I don't think this is intended, so I opened a PR to fix it 😄


Successfully merging this pull request may close these issues.

SAM-HQ implementation in transformers
