
Conversation

@sushmanthreddy (Contributor) commented Dec 8, 2024

Closes #31137

Pull Request Title: Add HQ-SAM Functionality to Transformers Library

Model Overview

HQ-SAM (Segment Anything in High Quality) is an enhanced version of the Segment Anything Model (SAM), addressing limitations in mask quality for intricate structures and challenging segmentation tasks. The model refines SAM’s predictions using a High-Quality Output Token and Global-Local Feature Fusion while preserving SAM’s efficiency and zero-shot generalization capabilities.

According to the original implementation, HQ-SAM significantly improves mask boundaries and reduces segmentation errors by introducing minimal additional parameters (<0.5%) and computational overhead. The model is designed to maintain compatibility with SAM’s existing prompt-based design and mask decoder architecture.
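
Since the prose above is the only description of the two additions, here is a minimal conceptual sketch of how the HQ output token and global-local feature fusion fit together. All module, argument, and dimension names are illustrative assumptions, not the actual sam_hq implementation; the sketch only captures the data flow.

    # Conceptual sketch only: illustrative names, not the actual sam_hq code.
    import torch
    import torch.nn.functional as F
    from torch import nn

    class HQMaskHead(nn.Module):
        """Adds an HQ output token and fuses global/local features to refine SAM's mask."""

        def __init__(self, hidden_dim: int = 256, vit_dim: int = 768):
            super().__init__()
            out_dim = hidden_dim // 8  # channel dim of SAM's upscaled mask features
            # extra learnable token, appended to SAM's IoU/mask tokens inside the decoder;
            # its decoded state arrives in forward() as hq_token_state
            self.hq_token = nn.Embedding(1, hidden_dim)
            # project the final image embedding (global context) and an early ViT feature map (local detail)
            self.compress_global = nn.Conv2d(hidden_dim, out_dim, kernel_size=1)
            self.compress_local = nn.Conv2d(vit_dim, out_dim, kernel_size=1)
            # turns the decoded HQ token into per-pixel mask weights
            self.hq_token_mlp = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, out_dim)
            )

        def forward(self, hq_token_state, upscaled_mask_features, image_embedding, early_vit_features):
            b, c, h, w = upscaled_mask_features.shape
            # bring the coarser maps up to the decoder's upscaled resolution, then fuse
            glob = F.interpolate(self.compress_global(image_embedding), (h, w), mode="bilinear", align_corners=False)
            local = F.interpolate(self.compress_local(early_vit_features), (h, w), mode="bilinear", align_corners=False)
            hq_features = upscaled_mask_features + glob + local
            # HQ mask = dot product of the processed HQ token with the fused feature map
            weights = self.hq_token_mlp(hq_token_state)  # (b, 1, out_dim)
            return (weights @ hq_features.flatten(2)).view(b, 1, h, w)

In the model itself this logic lives inside SAM's mask decoder (which the PR description says is kept compatible); the sketch only shows where the extra token and the fused features enter the mask prediction.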

Repository and Weights

The HQ-SAM implementation and pre-trained weights are available in the following repository:
https://github.com/SysCV/sam-hq

HQ-SAM provides three pre-trained weight variants:

  • sam_hq_vit_b – ViT-B (base) vision encoder, the smallest variant.
  • sam_hq_vit_l – ViT-L (large) vision encoder.
  • sam_hq_vit_h – ViT-H (huge) vision encoder, the largest variant.

The main difference between these variants is the size of the Vision Transformer (ViT) encoder, while the prompt encoder and mask decoder remain unchanged.

Functionality

For each input prompt (e.g., bounding boxes, 2D points, or coarse masks), HQ-SAM predicts high-quality binary masks with improved segmentation precision. Improvements include (see the usage sketch after this list):

  • More accurate boundaries.
  • Correction of coarse masks and segmentation errors.
  • Enhanced detail preservation for thin structures and complex object geometries.
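
A minimal usage sketch of how these prompts map onto the Transformers API added in this PR. The class names (SamHQModel, SamHQProcessor) follow this PR's sam_hq module; the checkpoint id, example image, and point prompt are assumptions and may differ from the published weights.

    # Hedged usage sketch (not taken from the PR itself).
    import torch
    from PIL import Image
    from transformers import SamHQModel, SamHQProcessor

    checkpoint = "syscv-community/sam-hq-vit-base"  # assumed hub id
    model = SamHQModel.from_pretrained(checkpoint)
    processor = SamHQProcessor.from_pretrained(checkpoint)

    raw_image = Image.open("example.jpg").convert("RGB")  # any RGB image
    input_points = [[[450, 600]]]                         # one 2D point prompt (x, y)

    inputs = processor(raw_image, input_points=input_points, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # upscale the low-resolution mask logits back to the original image size
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    scores = outputs.iou_scores

Bounding-box prompts follow the same pattern with input_boxes instead of input_points, mirroring the existing SAM processor API.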

Reviewers: @molbap

@sushmanthreddy sushmanthreddy marked this pull request as draft December 8, 2024 09:15
@sushmanthreddy sushmanthreddy marked this pull request as ready for review December 20, 2024 00:48
@sushmanthreddy sushmanthreddy requested a review from molbap April 22, 2025 18:35
@sushmanthreddy (Contributor, Author) commented Apr 24, 2025

@molbap, are any further changes needed for this PR?

@molbap (Contributor) left a comment

Looks good, tests seem to pass! I'll merge this as soon as I can; I have limited connectivity right now, which is likely blocking the auth :)

@molbap (Contributor) commented Apr 28, 2025

I'm not sure what is blocking the merge. @sushmanthreddy, can you update your branch with the most recent main changes?

changes are made to update with latest code
@sushmanthreddy (Contributor, Author)

@molbap, updated with the latest code and synced with main; can you take a look?

@molbap (Contributor) left a comment

Seems it fixed the weird branch status - can you try the fix for the init_weights test?

Comment on lines +1145 to +1148
    elif isinstance(module, SamHQVisionAttention):
        if module.use_rel_pos:
            module.rel_pos_h.data.zero_()
            module.rel_pos_w.data.zero_()
@molbap (Contributor)

Tests are broken by #37070, which ensures weight initialization is done properly 😅 I think the following ought to fix it, can you try?

Suggested change
    -elif isinstance(module, SamHQVisionAttention):
    -    if module.use_rel_pos:
    -        module.rel_pos_h.data.zero_()
    -        module.rel_pos_w.data.zero_()
    +elif isinstance(module, SamHQVisionAttention):
    +    if module.use_rel_pos:
    +        module.rel_pos_h.data.zero_()
    +        module.rel_pos_w.data.zero_()
    +elif isinstance(module, SamHQVisionEncoder):
    +    if module.pos_embed is not None:
    +        module.pos_embed.data.zero_()

@sushmanthreddy (Contributor, Author)

Lol, I was beating my head around where I went wrong while merging with the main branch.

@sushmanthreddy (Contributor, Author) commented Apr 28, 2025

Added this in the modular file.

@molbap molbap merged commit 65e9402 into huggingface:main Apr 28, 2025
20 checks passed
@molbap (Contributor) commented Apr 28, 2025

Merged! Congratulations @sushmanthreddy for the thorough work and the great model addition! I'll post about it soon :D

@sushmanthreddy (Contributor, Author)

> Merged! Congratulations @sushmanthreddy for the thorough work and the great model addition! I'll post about it soon :D

Thanks, that was a great learning process about the Transformers API.

@NielsRogge (Contributor)

Hi @sushmanthreddy thanks for working on this, amazing contribution!

Some things we could do as next steps:

@molbap (Contributor) commented Apr 30, 2025

Hey @NielsRogge! For the embeddings, the model supports them; it's just that there are now both image_embeddings and intermediate_embeddings that need to be passed. I tried it like this:

    if "hq" in MODEL_ID:
        image_embeddings, intermediate_embeddings = model.get_image_embeddings(inputs["pixel_values"])
    else:
        image_embeddings = model.get_image_embeddings(inputs["pixel_values"])
    # ... couple lines below
    if "hq" in MODEL_ID:
        inputs.update({"intermediate_embeddings": intermediate_embeddings})

and it looks like it works (left: original SAM, right: HQ-SAM)

[image: side-by-side mask comparison, original SAM vs. HQ-SAM]
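
For completeness, a hedged end-to-end sketch of that precomputed-embeddings path. Only the image_embeddings / intermediate_embeddings keyword names and the two-value return of get_image_embeddings come from the discussion above; the checkpoint id, image, and prompt are illustrative assumptions.

    # Hedged sketch of the precomputed-embeddings flow described above.
    import torch
    from PIL import Image
    from transformers import SamHQModel, SamHQProcessor

    checkpoint = "syscv-community/sam-hq-vit-base"  # assumed hub id
    model = SamHQModel.from_pretrained(checkpoint)
    processor = SamHQProcessor.from_pretrained(checkpoint)

    raw_image = Image.open("example.jpg").convert("RGB")
    inputs = processor(raw_image, input_points=[[[450, 600]]], return_tensors="pt")

    with torch.no_grad():
        # HQ-SAM returns the intermediate ViT features alongside the image embeddings
        image_embeddings, intermediate_embeddings = model.get_image_embeddings(inputs["pixel_values"])

    inputs.pop("pixel_values")  # not needed once embeddings are precomputed
    inputs.update({"image_embeddings": image_embeddings, "intermediate_embeddings": intermediate_embeddings})

    with torch.no_grad():
        outputs = model(**inputs)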

@molbap (Contributor) commented Apr 30, 2025

For the checkpoints, I indeed missed it. @sushmanthreddy, could you move your checkpoint to a new org with the -community suffix for maintainability? I created one that ought to do it: https://huggingface.co/syscv-community. If you move your checkpoint there, we can handle maintenance (plus add credits to the original research team and an Apache 2.0 license) :) And let me know if you have time to convert and test the other versions, else I'll do it!

@sushmanthreddy (Contributor, Author)

@molbap I will do it today or tomorrow

@NielsRogge (Contributor)

Hi @sushmanthreddy let us know if you need any help. We can also add you to https://huggingface.co/syscv-community.

@sushmanthreddy (Contributor, Author)

@NielsRogge I got busy with personal work; I will raise a pull request on the Hugging Face Hub by tonight, 12:00 IST.

@sushmanthreddy (Contributor, Author)

Having some network issues, I wasn't able to push the weights; I will upload the weights to the Hub tomorrow along with the README file.

@NielsRogge (Contributor)

Hi @sushmanthreddy saw that you've uploaded everything here: https://huggingface.co/syscv-community, awesome!

Btw does the tiny variant require a different architecture?

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* added the configuration for sam_hq

* added the modelling for sam_hq

* added the sam hq mask decoder with hq features

* added the code for the samhq

* added the code for the samhq

* added the code for the samhq

* Delete src/transformers/models/sam_hq/modelling_sam_hq.py

* added the code for the samhq

* added the code for the samhq

* added the changes for the modelling

* added the code for sam hq for image processing

* added code for the sam hq model

* added the required changes

* added the changes

* added the key mappings for the sam hq

* adding the working code of samhq

* added the required files

* adding the pt object

* added the push to hub account

* added the args for the sam mask decoder

* added the args for the sam hq vision config

* added some more documentation

* removed the unnecessary spaces

* all required changes

* removed the image processor

* added the required file

* added the changes for the checkcopies

* added the code for modular file

* added the changes for the __init file

* added the code for the interm embeds

* added the code for sam hq

* added the changes for modular file

* added the test file

* added the changes required

* added the changes required

* added the code for the

* added the cl errors

* added the changes

* added the required changes

* added the some code

* added the code for the removing image processor

* added the test dimensions

* added the code for the removing extra used variables

* added the code for modular file hf_mlp for a better name

* removed abbreviation in core functionality

* removed abbreviation in core functionality

* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory

* added the code which is after make fixup

* added some test for the intermediate embeddings test

* added the code for the torch support in sam hq

* added the code for the updated modular file

* added the changes for documentations as mentioned

* removed the heading

* add the changes for the code

* first mentioned issue resolved

* added the changes code to processor

* added the easy loading to init file

* added the changes to code

* added the code to changes

* added the code to work

* added the code for sam hq

* added the code for sam hq

* added the code for the point pad value

* added the small test for the image embeddings and intermediate embedding

* added the code

* added the code

* added the code for the tests

* added the code

* added the code for the processor file

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code for tests and some checks

* added some code

* added the code

* added the code

* added some code

* added some code

* added the changes for required

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added some changes

* added some changes

* removed spaces and quality checks

* added some code

* added some code

* added some code

* added code quality checks

* added the checks for quality checks

* added some code which fixes test_inference_mask_generation_no_point

* added code for the test_inference_mask_generation_one_point_one_bb

* added code for the test_inference_mask_generation_one_point_one_bb_zero

* added code for the test_inference_mask_generation_one_box

* added some code in modelling for testing

* added some code which sorts masks with high score

* added some code

* added some code

* added some code for the move KEYS_TO_MODIFY_MAPPING

* added some code for the  unsqueeze removal

* added some code for the  unsqueeze removal

* added some code

* added some code

* add some code

* added some code

* added some code

* changed some testing values

* added changes to code in sam hq for readability purposes

* added pre commit checks

* added the fix to samvisionmodel for compatibility

* added the changes made on sam by cyyever

* fixed the tests for samhq

* added some code

* added some code related to init file issue during merge conflicts

* removed the merge conflicts

* added changes mentioned by Arthur and molbap

* added changes mentioned by Arthur and molbap

* solving quality checks

* added the changes for input clearly

* added the changes

* added changes in mask generation file regarding model inputs and sam hq kwargs in processor file

* added changes in processor file

* added the setUp -> setUpClass conversion

* added the code mentioned for processor

* added changes for the code

* added some code

* added some code

* added some code

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
@sushmanthreddy (Contributor, Author)

> Hi @sushmanthreddy saw that you've uploaded everything here: https://huggingface.co/syscv-community, awesome!
>
> Btw does the tiny variant require a different architecture?

@NielsRogge I have gone through their code to integrate the small models as well. The issue is that they use a small ViT architecture; it could be integrated, and I am looking at possible ways to integrate it into SAM-HQ. Right now I am working on the DEIM model and will add this feature after adding the DEIM model to HF.

@sbucaille (Contributor)

Hi,
I noticed the MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variable in modeling_auto.py is defined twice in your PR.
I don't think this is intended, so I opened a PR to fix it 😄


Successfully merging this pull request may close these issues.

SAM-HQ implementation in transformers
