
SAM: How does the decoder handle output resolution? #8545

Closed
hashJoe opened this issue Oct 15, 2024 · 2 comments
hashJoe commented Oct 15, 2024

I am currently integrating a fine-tuned SAM model into CVAT, following the integration pattern of the existing SAM model. My integration involved a few key steps:

  1. Function Addition: I introduced a new function within the serverless/pytorch/ directory.
  2. Model Conversion: I converted the fine-tuned decoder from PyTorch to ONNX using the ONNX exporter repository (see the export sketch after this list).
  3. Plugin Creation: I developed a new plugin located at cvat-ui/plugins/ and registered it via the CLIENT_PLUGINS environment variable.
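
For reference, the decoder export in step 2 can be done with the SamOnnxModel wrapper from the segment-anything repository. Below is a minimal sketch; the input/output names follow the official exporter, but the checkpoint path and model type are placeholders for a fine-tuned model:

```python
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

# Placeholder checkpoint path and model type for a fine-tuned SAM.
sam = sam_model_registry["vit_b"](checkpoint="sam_finetuned.pth")
onnx_model = SamOnnxModel(sam, return_single_mask=True)

embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size, dtype=torch.float),
    "point_coords": torch.randint(0, 1024, (1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(0, 4, (1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, 4 * embed_size[0], 4 * embed_size[1], dtype=torch.float),
    "has_mask_input": torch.tensor([1], dtype=torch.float),
    # orig_im_size drives the final mask resize inside the exported graph.
    "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
}

with torch.no_grad():
    torch.onnx.export(
        onnx_model,
        tuple(dummy_inputs.values()),
        "sam_decoder.onnx",
        input_names=list(dummy_inputs.keys()),
        output_names=["masks", "iou_predictions", "low_res_masks"],
        dynamic_axes={
            "point_coords": {1: "num_points"},
            "point_labels": {1: "num_points"},
        },
        opset_version=17,
    )
```

A decoder exported this way returns the "masks" output already resized to whatever orig_im_size is fed at inference time.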

During this integration, adjustments were made to the following files in my plugin:
src/ts/index.tsx
src/ts/inference.worker.ts

The implementation successfully generates masks, and no errors arise during execution. However, the rendered mask on CVAT does not align with the image, resulting in a mismatched mask visualization.

Upon troubleshooting, I identified a disparity in the mask dimensions produced by my ONNX model's decoder and the existing integrated model:

Current SAM Decoder ONNX Dimensions: [1, 1, 1221, 1233]

My Converted SAM Decoder ONNX Dimensions: [1, 1, 2048, 2048]

The dimensions [1, 1, 2048, 2048] match the image resolution (2048×2048) exactly, which leads me to suspect this is what causes the improper mask visualization.

  1. How does the current SAM decoder model manage resolution adjustment, and is this mechanism embedded within the ONNX model itself?
  2. What adjustments can I make to my model or integration approach to synchronize the mask resolution with the existing framework?
  3. Is the script used for exporting the SAM decoder to ONNX available somewhere?
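
On question 1, for context: in the official segment-anything exporter the resolution handling is embedded in the ONNX graph itself. SamOnnxModel appends a postprocessing step that upscales the low-resolution (256×256) decoder output to the size passed in via the orig_im_size input. A sketch paraphrasing that logic, assuming img_size = 1024 as in the released SAM models:

```python
import torch
import torch.nn.functional as F

def postprocess_masks(masks: torch.Tensor, orig_im_size: torch.Tensor,
                      img_size: int = 1024) -> torch.Tensor:
    """Upscale low-res decoder masks (e.g. 1x1x256x256) to the original image size.

    Paraphrases the postprocessing the official exporter bakes into the ONNX graph.
    """
    # 1. Upsample to the padded square input size the image encoder saw.
    masks = F.interpolate(masks, size=(img_size, img_size),
                          mode="bilinear", align_corners=False)

    # 2. Remove the padding that was added to make the image square.
    h, w = orig_im_size[0].item(), orig_im_size[1].item()
    scale = img_size / max(h, w)
    prepadded_h, prepadded_w = int(h * scale + 0.5), int(w * scale + 0.5)
    masks = masks[..., :prepadded_h, :prepadded_w]

    # 3. Resize to the original image resolution.
    return F.interpolate(masks, size=(int(h), int(w)),
                         mode="bilinear", align_corners=False)
```

Since the converted decoder already returns masks at the full 2048×2048 image resolution, this path appears to be working; the remaining difference from CVAT's integrated model looks like the cropping step discussed below.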

Additionally, I assume the extra x/y coordinate outputs are the bounding box of the generated mask, so I implemented this in my code as well.
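
Along those lines, the [1, 1, 1221, 1233] output of the current model suggests the mask is cropped to the object's tight bounding box before being returned. A minimal sketch of that kind of postprocessing on a full-resolution mask (squeezed to H×W); the function name and threshold are placeholders, not CVAT's actual API:

```python
import numpy as np

def crop_mask_to_bbox(mask: np.ndarray, threshold: float = 0.0):
    """Crop a full-resolution mask to its tight bounding box.

    Returns the cropped mask plus (xtl, ytl, xbr, ybr) in image coordinates,
    i.e. mask dimensions relative to the object, box coordinates relative
    to the image.
    """
    binary = mask > threshold
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return binary, (0, 0, 0, 0)  # empty mask: nothing to crop
    ytl, ybr = ys.min(), ys.max()
    xtl, xbr = xs.min(), xs.max()
    cropped = binary[ytl : ybr + 1, xtl : xbr + 1]
    return cropped, (int(xtl), int(ytl), int(xbr), int(ybr))
```

Under that assumption, a 1221×1233 mask would simply be the object's bounding-box region cut out of the 2048×2048 image.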

Any insights or guidance on this issue would be greatly appreciated.


hashJoe commented Oct 15, 2024

Is my observation correct that the mask dimensions are relative to the segmented object, while the x/y coordinates are relative to the image?


hashJoe commented Oct 16, 2024

This is what I was looking for: issue-1666290429

hashJoe closed this as completed Oct 16, 2024