
Better Integration for Neural Embedding based workflows (e.g., SAM (Segment Anything)) #6049

Closed
2 tasks done
kunaltyagi opened this issue Apr 20, 2023 · 3 comments · Fixed by #6019
Labels
enhancement New feature or request

Comments

@kunaltyagi
Contributor

My actions before raising this issue

On large images, SAM's per-query latency adds up, since the expensive image encoder currently runs for every interaction.

Possible Solution

Is there a way we can store the encoded image embedding in the CVAT DB, perhaps in a separate table, and then run the interactive prompt encoder + decoder with onnx.js?

The table would have a primary key on the project/task/job/frame with columns for different model embeddings. We can use a simple binary blob to store the data. The table can be populated either per task (a workflow for this already exists) or per frame, whenever the DB doesn't have an entry; a sketch of such a table is given below.

This would require adding support for arbitrary scripts/WebAssembly binaries in the browser to enable the use of onnx.js.
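As a rough illustration, such a table could look like the following Django model (CVAT's backend is Django). The model name, fields, and foreign-key target are assumptions for the sketch, not an existing CVAT schema:

```python
# Hypothetical Django model for cached embeddings; not an existing CVAT schema.
from django.db import models

class ImageEmbedding(models.Model):
    task = models.ForeignKey("engine.Task", on_delete=models.CASCADE)
    frame = models.PositiveIntegerField()
    model_name = models.CharField(max_length=128)  # e.g. "sam_vit_h"
    checksum = models.CharField(max_length=64)     # sha256 of the frame, guards against changed images
    embedding = models.BinaryField()               # raw encoder output stored as a binary blob

    class Meta:
        unique_together = ("task", "frame", "model_name")
```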

Context

Scaling a DB is much easier (and cheaper) than scaling GPU hardware, especially since the GPU compute is idempotent: the same image always yields the same embedding, so it can be computed once and reused.

Current Implementation and Thoughts

  1. Create a table manually in the database: easy, but official support would be appreciated.
  2. Create a detector model that receives the request, populates the table, and never actually detects anything.
    • Feels like a hack to have two models, one for pre-processing and one for actual use.
    • Requires accessing the poorly documented event.body payload.
  3. Create an interactor that runs the PyTorch prompt encoder and decoder as a serverless function (a sketch is given after this list).
    • Sadly, making this use onnx.js would take too much effort (and know-how of CVAT and JS).
    • We add a checksum to the DB to ensure the image is the same; otherwise, the embeddings are generated again.
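A minimal sketch of option 3, assuming the official `segment_anything` package. The cache directory, function name, and the state-restoring trick (which pokes at `SamPredictor` internals) are assumptions for illustration, not CVAT's actual serverless interface:

```python
# Sketch of option 3: run SAM's prompt encoder + mask decoder per click,
# reusing a cached image embedding keyed by the image checksum.
# The cache path and function name are hypothetical; restoring the embedding
# relies on SamPredictor internals (features, original_size, input_size).
import hashlib
from pathlib import Path

import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

CACHE_DIR = Path("/mnt/embeddings")  # hypothetical mounted volume
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def interact(image: np.ndarray, point_coords: np.ndarray, point_labels: np.ndarray) -> np.ndarray:
    """Return a binary mask for the given positive/negative clicks."""
    key = hashlib.sha256(image.tobytes()).hexdigest()  # checksum guards against changed frames
    cached = CACHE_DIR / f"{key}.pt"
    if cached.exists():
        # Skip the expensive ViT image encoder: restore the saved state.
        state = torch.load(cached)
        predictor.features = state["embedding"]
        predictor.original_size = state["original_size"]
        predictor.input_size = state["input_size"]
        predictor.is_image_set = True
    else:
        predictor.set_image(image)  # runs the image encoder once per frame
        torch.save(
            {
                "embedding": predictor.get_image_embedding(),
                "original_size": predictor.original_size,
                "input_size": predictor.input_size,
            },
            cached,
        )
    # The prompt encoder + mask decoder are cheap compared to the encoder.
    masks, _, _ = predictor.predict(
        point_coords=point_coords, point_labels=point_labels, multimask_output=False
    )
    return masks[0]
```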
@bsekachev
Member

Hi @kunaltyagi

I am working on client-side optimizations for SAM in #6019

@bsekachev bsekachev added the enhancement New feature or request label Apr 25, 2023
@ryanalexmartin

Excellent idea, @kunaltyagi

@bsekachev A frontend solution is fantastic, and I'm very much looking forward to using your work once it's finished, but I would also love to see a backend implementation, as some users would prefer to use the more powerful GPUs available on their cloud instances and may be connecting from thin clients. Perhaps the Nuclio function itself could store and look up the embeddings for images in a mounted volume.

I think this per-image workflow would feel good and responsive:

  1. User selects the SAM tool via the magic wand and clicks "Interact", but is not yet allowed to click on the image; the red crosshair does not yet appear. A request is sent to the Nuclio serverless function, which checks a table/DB/volume/etc. to see whether embeddings for the image name (with a matching checksum) exist.

  2. The modal that usually says "Waiting for a response from Segment Anything..." instead says "Generating embeddings for image...", or, if the embeddings have been previously generated, they are loaded from wherever the Nuclio function stores them. The location of the image's embedding is returned to the client.

  3. The embeddings for the image are now generated, or loaded from storage. The red crosshair appears, and we are able to annotate the image. The resource locator of the embeddings is included with these requests, so the model no longer has to generate them every time we click, saving tons of compute and reducing CO2 emissions.

I'll try my hand at writing this (a rough sketch of the lookup is below), but if anybody sees any glaring issues with what I described above, I'd appreciate a heads-up.
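A minimal sketch of the step-1 lookup, assuming embeddings are stored as `.npy` files on a volume mounted into the Nuclio function; the path layout and function name are hypothetical:

```python
# Sketch of the step-1 lookup: is there a cached embedding for this image?
# The volume path, file layout, and function name are hypothetical.
import hashlib
from pathlib import Path
from typing import Optional

EMBEDDING_VOLUME = Path("/mnt/sam-embeddings")  # volume mounted into the Nuclio function

def find_cached_embedding(image_name: str, image_bytes: bytes) -> Optional[Path]:
    """Return the embedding's path if it exists and its checksum still matches."""
    checksum = hashlib.sha256(image_bytes).hexdigest()
    candidate = EMBEDDING_VOLUME / image_name / f"{checksum}.npy"
    return candidate if candidate.exists() else None
```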

@kunaltyagi
Contributor Author

In step 2, I'd provide a "try again" button (in case the image changed due to some DB operation).

bsekachev added a commit that referenced this issue May 11, 2023

### Motivation and context
Resolved #5984 
Resolved #6049
Resolved #6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. To support other weights, re-export the ONNX mask decoder with some custom model changes (see below), or just download the decoders using the links below.
- The serverless function needs to be redeployed because its interface has been changed.

Decoders for other weights:
sam_vit_l_0b3195.pth:
[Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
sam_vit_b_01ec64.pth:
[Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes made in the ONNX part:
```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(

     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```
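With these changes the decoder returns the mask cropped to its bounding box plus the box coordinates (`xtl`, `ytl`, `xbr`, `ybr`), which shrinks the response payload. A minimal sketch of how a consumer could paste the cropped mask back into a full-resolution frame; the function name and shapes are illustrative:

```python
# Sketch: re-insert the cropped mask returned by the modified decoder
# into a full-resolution frame. Names and shapes are illustrative.
import numpy as np

def uncrop_mask(cropped: np.ndarray, xtl: int, ytl: int, xbr: int, ybr: int,
                height: int, width: int) -> np.ndarray:
    """cropped has shape (ybr - ytl + 1, xbr - xtl + 1); returns (height, width)."""
    full = np.zeros((height, width), dtype=np.uint8)
    full[ytl:ybr + 1, xtl:xbr + 1] = cropped
    return full
```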

### How has this been tested?

### Checklist
- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the [CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md) file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Jul 1, 2023