Samhq model addition #35147
Conversation
@molbap any further changes needed for this PR?
molbap
left a comment
Looks good, tests seem to pass! I'll merge this as soon as I can - I have limited connectivity right now and it's likely blocking the auth :)
I'm not sure what is blocking the merge, @sushmanthreddy can you update your branch with the most recent changes?
Changes are made to update with the latest code.

@molbap updated with the latest code, synced with main, could you take a look?
molbap
left a comment
Seems it fixed the weird branch status - can you try the fix for the init_weights test?
```python
elif isinstance(module, SamHQVisionAttention):
    if module.use_rel_pos:
        module.rel_pos_h.data.zero_()
        module.rel_pos_w.data.zero_()
```
Tests are broken by #37070, which ensures weight initialization is done properly 😅 I think the following ought to fix it, can you try?
Suggested change (keeps the existing `SamHQVisionAttention` branch and adds a `SamHQVisionEncoder` branch):

```python
elif isinstance(module, SamHQVisionAttention):
    if module.use_rel_pos:
        module.rel_pos_h.data.zero_()
        module.rel_pos_w.data.zero_()
elif isinstance(module, SamHQVisionEncoder):
    if module.pos_embed is not None:
        module.pos_embed.data.zero_()
```
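To make the initialization logic above concrete, here is a minimal, self-contained sketch of the pattern. The `SamHQVisionAttention` and `SamHQVisionEncoder` classes below are toy stand-ins for the real transformers modules, defined only so the branching is runnable; the real classes have many more attributes.

```python
# Minimal sketch of the zero-initialization fix discussed above.
# These classes are hypothetical stand-ins, NOT the real transformers modules.
import torch
import torch.nn as nn


class SamHQVisionAttention(nn.Module):
    def __init__(self, use_rel_pos=True, size=4):
        super().__init__()
        self.use_rel_pos = use_rel_pos
        # Relative position embeddings, randomly initialized here.
        self.rel_pos_h = nn.Parameter(torch.randn(size))
        self.rel_pos_w = nn.Parameter(torch.randn(size))


class SamHQVisionEncoder(nn.Module):
    def __init__(self, size=4):
        super().__init__()
        # Absolute position embedding for the encoder.
        self.pos_embed = nn.Parameter(torch.randn(size))


def init_weights(module):
    # Mirrors the suggested _init_weights branches: zero out the relative
    # position embeddings and the encoder's absolute position embedding.
    if isinstance(module, SamHQVisionAttention):
        if module.use_rel_pos:
            module.rel_pos_h.data.zero_()
            module.rel_pos_w.data.zero_()
    elif isinstance(module, SamHQVisionEncoder):
        if module.pos_embed is not None:
            module.pos_embed.data.zero_()


attn, enc = SamHQVisionAttention(), SamHQVisionEncoder()
init_weights(attn)
init_weights(enc)
```

In the real model this function would be the body of `_init_weights`, which transformers applies to every submodule; the sketch only shows the two branches relevant to this review thread.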
Lol, I was beating my head around where I went wrong while merging with the main branch.
Added this in the modular file.
Merged! Congratulations @sushmanthreddy for the thorough work and the great model addition! I'll post about it soon :D
Thanks, that was a great learning process about the transformers API.
Hi @sushmanthreddy thanks for working on this, amazing contribution! Some things we could do as next steps:
Hey @NielsRogge! For the embeddings, the model supports them - it's just that there are both:

```python
if "hq" in MODEL_ID:
    image_embeddings, intermediate_embeddings = model.get_image_embeddings(inputs["pixel_values"])
else:
    image_embeddings = model.get_image_embeddings(inputs["pixel_values"])

# ... a couple of lines below
if "hq" in MODEL_ID:
    inputs.update({"intermediate_embeddings": intermediate_embeddings})
```

and it looks like it works (left: original, right: HQ).
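The branching above reflects the one API difference between the two models: SAM-HQ's `get_image_embeddings` returns an extra set of intermediate ViT embeddings that plain SAM does not. A minimal runnable sketch of that pattern, using hypothetical `DummySam` / `DummySamHQ` stand-ins rather than the real models:

```python
# Hedged sketch of the "hq" branching shown above. DummySam and DummySamHQ are
# hypothetical stand-ins for the real SAM / SAM-HQ models: the HQ variant
# returns an extra set of intermediate embeddings.

class DummySam:
    def get_image_embeddings(self, pixel_values):
        return "image_embeddings"


class DummySamHQ:
    def get_image_embeddings(self, pixel_values):
        # SAM-HQ also returns intermediate ViT embeddings for the mask decoder.
        return "image_embeddings", ["intermediate_embeddings"]


def prepare_inputs(model_id, model, inputs):
    # Branch on the checkpoint id, mirroring the snippet in the comment above.
    if "hq" in model_id:
        image_embeddings, intermediate_embeddings = model.get_image_embeddings(
            inputs["pixel_values"]
        )
        inputs["intermediate_embeddings"] = intermediate_embeddings
    else:
        image_embeddings = model.get_image_embeddings(inputs["pixel_values"])
    inputs["image_embeddings"] = image_embeddings
    return inputs


hq_inputs = prepare_inputs("sam-hq-vit-base", DummySamHQ(), {"pixel_values": None})
base_inputs = prepare_inputs("sam-vit-base", DummySam(), {"pixel_values": None})
```

With the real models, `pixel_values` would come from the processor and the embeddings would be tensors; the sketch only demonstrates the control flow.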
For the checkpoints, indeed I missed it. @sushmanthreddy could you move your checkpoint to a new org with
@molbap I will do it today or tomorrow.
Hi @sushmanthreddy let us know if you need any help. We can also add you to https://huggingface.co/syscv-community.
@NielsRogge I got busy with personal work; I will raise a pull request on the Hugging Face Hub by tonight, 12:00 pm IST.
Having some network issues, I wasn't able to push the weights; I will update the weights on the Hub tomorrow, with a README file.
Hi @sushmanthreddy, saw that you've uploaded everything here: https://huggingface.co/syscv-community, awesome! Btw, does the tiny variant require a different architecture?
Squashed commit history (condensed; typos corrected and repeated incremental "added the code" entries collapsed):

* added the configuration for sam_hq
* added the modelling for sam_hq
* added the sam_hq mask decoder with HQ features
* added the code for samhq (several iterations)
* deleted src/transformers/models/sam_hq/modelling_sam_hq.py
* added the image processing code for sam_hq
* added the key mappings for sam_hq
* added the args for the sam mask decoder and the sam_hq vision config
* added more documentation; removed unnecessary spaces
* removed the image processor
* added changes for check-copies, the modular file, and the __init__ file
* added code for the intermediate embeds
* added the test file and required changes
* renamed hf_mlp in the modular file for a better name; removed abbreviations in core functionality
* used .contiguous() to ensure tensors are stored in a contiguous block of memory
* added tests for the intermediate embeddings
* added torch support and the documentation changes mentioned
* resolved the first mentioned issue; added changes to the processor
* added easy loading to the init file
* added the code for the point pad value
* added a small test for the image embeddings and intermediate embeddings
* many incremental code, test, and quality-check additions
* fixed test_inference_mask_generation_no_point, test_inference_mask_generation_one_point_one_bb, test_inference_mask_generation_one_point_one_bb_zero, and test_inference_mask_generation_one_box
* added code to sort masks by high score
* moved KEYS_TO_MODIFY_MAPPING; removed unsqueeze calls
* readability changes in sam_hq; added pre-commit checks
* fixed SamVisionModel for compatibility; applied the SAM changes made by cyyever; fixed the sam_hq tests
* resolved init-file issues from merge conflicts
* addressed changes mentioned by Arthur and molbap; solved quality checks
* clarified inputs; changes in the mask generation file regarding model inputs and sam_hq kwargs in the processor file
* setUp -> setUpClass conversion; processor changes

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
@NielsRogge I have gone through their code to integrate the small models as well. The issue is that they use a small ViT architecture; it could be integrated, and I am looking at possible ways to integrate it into SAM-HQ.
Hi,

Closes #31137
Pull Request Title: Add HQ-SAM Functionality to Transformers Library
Model Overview
HQ-SAM (Segment Anything in High Quality) is an enhanced version of the Segment Anything Model (SAM), addressing limitations in mask quality for intricate structures and challenging segmentation tasks. The model refines SAM’s predictions using a High-Quality Output Token and Global-Local Feature Fusion while preserving SAM’s efficiency and zero-shot generalization capabilities.
According to the original implementation, HQ-SAM significantly improves mask boundaries and reduces segmentation errors by introducing minimal additional parameters (<0.5%) and computational overhead. The model is designed to maintain compatibility with SAM’s existing prompt-based design and mask decoder architecture.
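The two ideas named above can be illustrated with a small toy sketch. This is not the actual implementation: shapes are arbitrary, the fusion is shown as a bare sum where the real model uses learned convolutions, and all names below are illustrative only.

```python
# Illustrative sketch (NOT the real implementation) of HQ-SAM's two additions:
# 1) an extra learnable HQ output token appended to SAM's mask tokens, and
# 2) global-local feature fusion of a late (global) ViT feature map with an
#    early (local) one. All sizes and names here are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_mask_tokens, dim, h, w = 4, 8, 16, 16

sam_mask_tokens = rng.normal(size=(num_mask_tokens, dim))
hq_token = rng.normal(size=(1, dim))                    # the extra HQ output token
tokens = np.concatenate([sam_mask_tokens, hq_token])    # decoder predicts one extra mask

late_features = rng.normal(size=(dim, h, w))    # global context from the last ViT block
early_features = rng.normal(size=(dim, h, w))   # fine local detail from an early ViT block
fused = late_features + early_features          # fusion shown as a sum; the real model
                                                # applies learned convolutions first

# The HQ token attends over the fused features to produce its mask logits;
# here that is reduced to a single dot product per spatial location.
hq_mask_logits = np.einsum("d,dhw->hw", tokens[-1], fused)
```

The point of the sketch is the parameter economy the paper claims: only the HQ token and the small fusion layers are new, which is why the added parameter count stays under 0.5%.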
Repository and Weights
The HQ-SAM implementation and pre-trained weights are available in the following repository:
https://github.com/SysCV/sam-hq
HQ-SAM provides three pre-trained weight variants:
* sam_hq_vit_b – small vision encoder
* sam_hq_vit_l – medium vision encoder
* sam_hq_vit_h – large vision encoder

The main difference between these variants is the size of the Vision Transformer (ViT) encoder, while the prompt encoder and mask decoder remain unchanged.
Functionality
For each input prompt (e.g., bounding boxes, 2D points, or coarse masks), HQ-SAM predicts high-quality binary masks with improved segmentation precision, notably sharper mask boundaries and fewer segmentation errors on intricate structures.
Reviewers: @molbap