I am currently integrating a fine-tuned SAM model into CVAT, following the integration pattern of the existing SAM model. My integration process involved a few key steps:
Function Addition: I introduced a new serverless function under the serverless/pytorch/ directory.
Model Conversion: I converted the fine-tuned decoder from Torch to ONNX using the ONNX exporter repository (a sketch of this export follows the list below).
Plugin Creation: I developed a new plugin under cvat-ui/plugins/ and registered it via the `CLIENT_PLUGINS` environment variable. Within the plugin, I adjusted `src/ts/index.tsx` and `src/ts/inference.worker.ts`.
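For context, my export was modeled on the upstream segment-anything ONNX export script (`scripts/export_onnx_model.py` in facebookresearch/segment-anything). Below is a minimal sketch of that pattern; the checkpoint path is a placeholder for my fine-tuned weights, and my actual script may differ in flags:

```python
# Minimal sketch of the export pattern, adapted from the upstream
# segment-anything exporter (scripts/export_onnx_model.py).
# "my_finetuned_sam.pth" is a placeholder for my fine-tuned checkpoint.
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

sam = sam_model_registry["vit_h"](checkpoint="my_finetuned_sam.pth")
onnx_model = SamOnnxModel(sam, return_single_mask=True)

embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size, dtype=torch.float),
    "point_coords": torch.randint(low=0, high=1024, size=(1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(low=0, high=4, size=(1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, 4 * embed_size[0], 4 * embed_size[1], dtype=torch.float),
    "has_mask_input": torch.tensor([1], dtype=torch.float),
    # (height, width) of the original image; the exported graph resizes the
    # output masks to this size at inference time.
    "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
}

torch.onnx.export(
    onnx_model,
    tuple(dummy_inputs.values()),
    "sam_decoder.onnx",
    export_params=True,
    opset_version=17,
    input_names=list(dummy_inputs.keys()),
    output_names=["masks", "iou_predictions", "low_res_masks"],
    # Dynamic axes so the graph accepts a variable number of prompt points.
    dynamic_axes={
        "point_coords": {1: "num_points"},
        "point_labels": {1: "num_points"},
    },
)
```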
The implementation generates masks as expected and raises no errors during execution. However, the mask rendered in CVAT does not align with the image resolution, so the mask visualization is misplaced.
Upon troubleshooting, I identified a disparity between the mask dimensions produced by my ONNX decoder and those produced by the existing integrated model:
- Current SAM decoder ONNX output dimensions: `[1, 1, 1221, 1233]`
- My converted SAM decoder ONNX output dimensions: `[1, 1, 2048, 2048]`
The `[1, 1, 2048, 2048]` output matches the image resolution (2048x2048) exactly, which leads me to suspect the difference in output shape is what breaks the mask visualization.
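For what it's worth, reading the upstream `SamOnnxModel` wrapper suggests the resize to the original image size can be baked into the ONNX graph itself, driven by an `orig_im_size` input. Below is a paraphrased sketch of that postprocessing; I am assuming, but have not verified, that CVAT's decoder was exported with something similar:

```python
# Paraphrased from segment_anything.utils.onnx.SamOnnxModel
# (mask_postprocessing and resize_longest_image_size). If CVAT's decoder was
# exported with this wrapper, the resize below runs inside the ONNX graph.
import torch
import torch.nn.functional as F

def resize_longest_image_size(orig_im_size: torch.Tensor, longest_side: int) -> torch.Tensor:
    # Image size after the preprocessing resize (longest side -> 1024),
    # before padding the shorter side up to a square.
    orig_im_size = orig_im_size.to(torch.float32)
    scale = longest_side / torch.max(orig_im_size)
    return torch.floor(scale * orig_im_size + 0.5).to(torch.int64)

def mask_postprocessing(masks: torch.Tensor, orig_im_size: torch.Tensor) -> torch.Tensor:
    img_size = 1024  # SAM's fixed model input size
    # 1) Upscale the low-res (256x256) decoder masks to the padded input size.
    masks = F.interpolate(masks, size=(img_size, img_size), mode="bilinear", align_corners=False)
    # 2) Crop away the square padding added during preprocessing.
    prepadded = resize_longest_image_size(orig_im_size, img_size)
    masks = masks[..., : prepadded[0], : prepadded[1]]
    # 3) Resize to the original image resolution, e.g. 1221x1233.
    h, w = orig_im_size.to(torch.int64)
    return F.interpolate(masks, size=(int(h), int(w)), mode="bilinear", align_corners=False)
```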
1. How does the current SAM decoder model handle resolution adjustment, and is that mechanism embedded within the ONNX model itself?
2. What adjustments can I make to my model or integration approach so that the mask resolution matches what the existing framework expects?
3. Is the script used to export the SAM decoder to ONNX available somewhere?
Additionally, I assume the extra x, y coordinate outputs are the bounding box of the generated mask, so I implemented that in my code as well.
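For completeness, here is a simplified sketch of that bounding-box computation; my actual change is in TypeScript inside `src/ts/inference.worker.ts`, but the logic is equivalent:

```python
# Simplified sketch of the bounding-box computation applied on top of the
# decoder's mask output (NumPy stand-in for the TypeScript worker code).
import numpy as np

def mask_to_bbox(mask: np.ndarray, threshold: float = 0.0):
    """Return (xtl, ytl, xbr, ybr) of the thresholded mask, or None if empty."""
    ys, xs = np.nonzero(mask > threshold)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```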
Any insights or guidance on this issue would be greatly appreciated.