Feature Idea: Incorporate "Segment Anything" #5984
Comments
@M-Colley Since they support Hugging Face and Roboflow models, you could also just make the SAM model available there and then import it. However, because this is such a strong model, they should add it to the built-in models, in my opinion.
@M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads up!
Thank you, that would be fantastic!
Very cool! I came across this additional project that combines BLIP, GroundingDINO and Stable Diffusion: https://github.com/IDEA-Research/Grounded-Segment-Anything. It might be worth taking a look at as well :) Kind regards
I wrote a simple labelling tool on top of SAM. I think CVAT really needs this as a feature; it'll help a lot of people. Feel free to attribute and borrow helpers from my tool if needed:
Hi guys, we implemented the first prototype here: #6008. This should work well on GPU for a self-hosted solution.
This one also works for video:
The idea of this PR is to finish #5990. Deploy for GPU: ``./deploy_gpu.sh pytorch/facebookresearch/sam/nuclio/`` Deploy for CPU: ``./deploy_cpu.sh pytorch/facebookresearch/sam/nuclio/`` If you want to use the GPU, be sure you have set up Docker for it following this [guide](https://github.com/NVIDIA/nvidia-docker/blob/master/README.md#quickstart). Resolves issue #5984, but the interface can probably be improved. Co-authored-by: Alx-Wo <alexander.wolpert@googlemail.com>
Thank you very much for integrating this neural network! It works like f-BRS, but is much more accurate. It's great that it has inference on both CPU and GPU.
@bsekachev
Also, there is a very cool XMem model for tracking masks (link).
First of all, thank you for the quick integration of SAM. SAM really seems to be a huge breakthrough. Unfortunately, at the moment only positive and negative points can be used. However, SAM also supports the use of bounding boxes, and the combination of bounding boxes and points. I played around with it a bit (adjusted the serverless function) and was able to use bounding boxes, although with some limitations.

Of course, it could be that I have just misunderstood something, but I assume that these are limitations of the CVAT interface for serverless functions, as I could only find the three parameters. Do you think there is hope that the CVAT interface can be adapted/expanded to make full use of SAM's capabilities? The use of (additional) bounding boxes seems to significantly improve the results in my use case.
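To make the request concrete, here is a minimal sketch of the box + point prompting that SAM's own Python API already supports (this is not the CVAT serverless interface). It assumes ``predictor`` is a ``SamPredictor`` with an image already set; all coordinates are placeholders:

```python
import numpy as np

# Hypothetical prompts; coordinates are in image pixel space (XYXY for the box).
input_box = np.array([100, 150, 480, 420])
input_points = np.array([[300, 280]])  # one refinement click inside the box
input_labels = np.array([1])           # 1 = positive click, 0 = negative click

# SAM accepts a box, points, or both together in a single predict() call.
masks, scores, _ = predictor.predict(
    point_coords=input_points,
    point_labels=input_labels,
    box=input_box,
    multimask_output=False,  # with a box prompt, a single mask is usually enough
)
```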
Track Anything would be super cool too:
### Motivation and context

Resolved #5984
Resolved #6049
Resolved #6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. To support other weights, the ONNX mask decoder needs to be re-exported with some custom model changes (see below), or you can download the pre-exported decoders using the links below.
- The serverless function needs to be redeployed because its interface has changed.

Decoders for other weights:
- sam_vit_l_0b3195.pth: [Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
- sam_vit_b_01ec64.pth: [Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes done in the ONNX part:

```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(
     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)

bsekachev@DESKTOP-OTBLK26:~/sam$ git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```

### How has this been tested?

### Checklist

- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the [CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md) file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
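For context, here is a hedged sketch of how a client could consume the extra outputs of the re-exported decoder with onnxruntime. The input signature follows the stock SAM ONNX export; ``decoder.onnx`` and ``image_embedding`` are assumed names, and point coordinates are expected to be already transformed to the encoder's input resolution (e.g. with ``ResizeLongestSide``):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("decoder.onnx")  # assumed path to the re-exported decoder

# One positive click plus the required padding point (label -1).
point_coords = np.array([[[500.0, 375.0], [0.0, 0.0]]], dtype=np.float32)
point_labels = np.array([[1.0, -1.0]], dtype=np.float32)

outputs = session.run(None, {
    "image_embeddings": image_embedding,  # assumed: (1, 256, 64, 64) from the image encoder
    "point_coords": point_coords,
    "point_labels": point_labels,
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([750, 1200], dtype=np.float32),  # H, W of the source image
})
masks, iou_predictions, low_res_masks, xtl, ytl, xbr, ybr = outputs
# Per the diff above, `masks` comes back binarized (uint8) and already cropped
# to the [ytl:ybr+1, xtl:xbr+1] window, so only the tight region around the
# object has to be transferred and rasterized, not the whole frame.
```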
Hi @descilla, thank you for reporting.
Let's also open a separate issue about a SAM tracker if necessary.
Hello, it is great that you support out-of-the-box models like YOLOv7. Do you also plan to include the latest FAIR model, "Segment Anything"? I think that could be very helpful!
https://github.com/facebookresearch/segment-anything
Kind regards
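For reference, prompting the proposed model through its Python API looks roughly like this. This is a minimal sketch: it assumes the segment-anything package from the repository above is installed, the ``sam_vit_h_4b8939.pth`` checkpoint (referenced elsewhere in this thread) has been downloaded, and ``frame.jpg`` is a placeholder image path:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-H variant from a local checkpoint and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an RGB image (HxWx3, uint8).
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),  # one click, in image pixel coordinates
    point_labels=np.array([1]),           # 1 = positive click
    multimask_output=True,                # return three candidate masks
)
best_mask = masks[np.argmax(scores)]      # pick the highest-scoring candidate
```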