Feature Idea: Incorporate "Segment Anything" #5984
Comments
@M-Colley Since they support Hugging Face and Roboflow models, you could also just make the SAM model available there and then import it. However, because this is such a strong model, they should add it to the built-in models, in my opinion.
@M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads up!
Thank you, that would be fantastic!
Very cool! I came across this additional project that combines BLIP, GroundingDINO and Stable Diffusion: https://github.com/IDEA-Research/Grounded-Segment-Anything. It might be worth taking a look at as well :) Kind regards
I wrote a simple labelling tool on top of SAM. I think CVAT really needs this as a feature; it'll help a lot of people. Feel free to attribute and borrow helpers from my tool if needed:
Hi guys, we implemented the first prototype here: #6008. This should work well on GPU for a self-hosted solution.
This one also works for video:
The idea of this PR is to finish #5990. Deploy for GPU: ``./deploy_gpu.sh pytorch/facebookresearch/sam/nuclio/`` Deploy for CPU: ``./deploy_cpu.sh pytorch/facebookresearch/sam/nuclio/`` If you want to use the GPU, be sure you have set up Docker for it following this [guide](https://github.com/NVIDIA/nvidia-docker/blob/master/README.md#quickstart). Resolves issue #5984, but the interface can probably be improved. Co-authored-by: Alx-Wo <alexander.wolpert@googlemail.com>
Thank you very much for integrating this neural network! It works like f-BRS, but is much more accurate. It's great that it has inference on both CPU and GPU.
@bsekachev
Also, there is a very cool XMem model for tracking masks (link).
First of all, thank you for the quick integration of SAM. SAM really seems to be a huge breakthrough. Unfortunately, at the moment only positive and negative points can be used. However, SAM also supports the use of bounding boxes, and the combination of bounding boxes and points. I played around with it a bit (adjusted the serverless function) and was able to use bounding boxes, although with some limitations.

Of course, it could be that I have just misunderstood something, but I assume that these are limitations of the CVAT interface for serverless functions, as I could only find the three parameters. Do you think there is hope that the CVAT interface can be adapted/expanded to make full use of SAM's capabilities? The use of (additional) bounding boxes seems to significantly improve the results in my use case.
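To make the request concrete, here is a minimal sketch of the box + point prompting that SAM's own Python API already supports (this is not the CVAT serverless interface). It assumes ``predictor`` is a ``SamPredictor`` with an image already set; all coordinates are placeholders:

```python
import numpy as np

# Hypothetical prompts; coordinates are in image pixel space (XYXY for the box).
input_box = np.array([100, 150, 480, 420])
input_points = np.array([[300, 280]])  # one refinement click inside the box
input_labels = np.array([1])           # 1 = positive click, 0 = negative click

# SAM accepts a box, points, or both together in a single predict() call.
masks, scores, _ = predictor.predict(
    point_coords=input_points,
    point_labels=input_labels,
    box=input_box,
    multimask_output=False,  # with a box prompt, a single mask is usually enough
)
```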
Track Anything would be super cool too:
### Motivation and context

Resolved #5984
Resolved #6049
Resolved #6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. To support other weights, the ONNX mask decoder needs to be re-exported with some custom model changes (see below), or you can download the pre-exported decoders using the links below.
- The serverless function needs to be redeployed because its interface has changed.

Decoders for other weights:
- sam_vit_l_0b3195.pth: [Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
- sam_vit_b_01ec64.pth: [Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes done in the ONNX part:

```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(
     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)

bsekachev@DESKTOP-OTBLK26:~/sam$ git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```

### How has this been tested?

### Checklist

- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the [CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md) file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
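For context, here is a hedged sketch of how a client could consume the extra outputs of the re-exported decoder with onnxruntime. The input signature follows the stock SAM ONNX export; ``decoder.onnx`` and ``image_embedding`` are assumed names, and point coordinates are expected to be already transformed to the encoder's input resolution (e.g. with ``ResizeLongestSide``):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("decoder.onnx")  # assumed path to the re-exported decoder

# One positive click plus the required padding point (label -1).
point_coords = np.array([[[500.0, 375.0], [0.0, 0.0]]], dtype=np.float32)
point_labels = np.array([[1.0, -1.0]], dtype=np.float32)

outputs = session.run(None, {
    "image_embeddings": image_embedding,  # assumed: (1, 256, 64, 64) from the image encoder
    "point_coords": point_coords,
    "point_labels": point_labels,
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([750, 1200], dtype=np.float32),  # H, W of the source image
})
masks, iou_predictions, low_res_masks, xtl, ytl, xbr, ybr = outputs
# Per the diff above, `masks` comes back binarized (uint8) and already cropped
# to the [ytl:ybr+1, xtl:xbr+1] window, so only the tight region around the
# object has to be transferred and rasterized, not the whole frame.
```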
Hi @descilla, thank you for reporting.
Let's also open a separate issue about a SAM tracker if necessary.
Hello, it is great that you support out-of-the-box models like YOLOv7. Do you also plan to include the latest FAIR model, "Segment Anything"? I think that could be very helpful!
https://github.com/facebookresearch/segment-anything
Kind regards
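For reference, prompting the proposed model through its Python API looks roughly like this. This is a minimal sketch: it assumes the segment-anything package from the repository above is installed, the ``sam_vit_h_4b8939.pth`` checkpoint (referenced elsewhere in this thread) has been downloaded, and ``frame.jpg`` is a placeholder image path:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-H variant from a local checkpoint and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an RGB image (HxWx3, uint8).
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),  # one click, in image pixel coordinates
    point_labels=np.array([1]),           # 1 = positive click
    multimask_output=True,                # return three candidate masks
)
best_mask = masks[np.argmax(scores)]      # pick the highest-scoring candidate
```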