Add SAM decoder & output masks as png #418

YavorGIvanov · 2023-07-26T12:08:50Z

With these additions, the SAM decoder, and therefore the whole SAM ggml implementation for point input, is almost finished. The remaining work is:

- Implement the output_upscaling part, which needs a ggml version of PyTorch's nn.ConvTranspose2d() (ref: https://github.com/PABannier/encodec.cpp/blob/main/ggml.c#L12389-L12497)
- Implement the final output of the model
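
For reference, the transposed convolution mentioned in the first item can be sketched as a naive scatter loop. This is only an illustration for checking results on small inputs, not the ggml operator added in this PR: the function name `conv_transpose_2d_ref` and the flattened PyTorch-style layouts are assumptions, and padding, dilation and bias are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive transposed 2D convolution (no padding, no dilation, no bias).
// Assumed layouts (row-major, flattened), following PyTorch's convention:
//   input  [c_in][h][w]
//   weight [c_in][c_out][kh][kw]
//   output [c_out][(h-1)*stride+kh][(w-1)*stride+kw]
std::vector<float> conv_transpose_2d_ref(
        const std::vector<float> & input,  int c_in,  int h,  int w,
        const std::vector<float> & weight, int c_out, int kh, int kw,
        int stride) {
    assert(input.size()  == (size_t) c_in * h * w);
    assert(weight.size() == (size_t) c_in * c_out * kh * kw);

    const int oh = (h - 1) * stride + kh;
    const int ow = (w - 1) * stride + kw;
    std::vector<float> output((size_t) c_out * oh * ow, 0.0f);

    // Every input element scatters a scaled copy of the kernel into the output.
    for (int ci = 0; ci < c_in; ++ci) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                const float v = input[((size_t) ci * h + y) * w + x];
                for (int co = 0; co < c_out; ++co) {
                    for (int ky = 0; ky < kh; ++ky) {
                        for (int kx = 0; kx < kw; ++kx) {
                            const float k = weight[(((size_t) ci * c_out + co) * kh + ky) * kw + kx];
                            const size_t o = ((size_t) co * oh + (y * stride + ky)) * ow + (x * stride + kx);
                            output[o] += v * k;
                        }
                    }
                }
            }
        }
    }
    return output;
}
```

With a 2x2 kernel and stride 2 (the configuration the reference PyTorch output_upscaling uses, as far as I recall), each input pixel writes its own non-overlapping 2x2 block, so the output is exactly twice the input size in each dimension.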

EDIT: Some elements of certain decoder results also differ from the PyTorch model. The difference seems to come either from accumulated floating-point error or from a numeric difference in the linear interpolation result when preprocessing the image at the beginning.
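
A small helper along these lines could quantify such a difference for one tensor at a time. It is a sketch under assumptions: both tensors are taken to be already available as flat float arrays with the same element order (for example, a ggml tensor's data and a dump from PyTorch), and `diff_stats` and `compare_tensors` are hypothetical names that are not part of the SAM example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct diff_stats {
    float  max_abs_diff; // largest |a[i] - b[i]|
    size_t max_index;    // index at which the largest difference occurs
    size_t n_above_tol;  // number of elements with |a[i] - b[i]| > tol
};

// Element-wise comparison of two tensors stored as flat arrays of equal length.
diff_stats compare_tensors(const std::vector<float> & a,
                           const std::vector<float> & b,
                           float tol) {
    assert(a.size() == b.size());
    diff_stats s = { 0.0f, 0, 0 };
    for (size_t i = 0; i < a.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (d > s.max_abs_diff) {
            s.max_abs_diff = d;
            s.max_index    = i;
        }
        if (d > tol) {
            s.n_above_tol++;
        }
    }
    return s;
}
```

Running this on each intermediate tensor in inference order shows the first step at which the two implementations diverge beyond the chosen tolerance.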


YavorGIvanov · 2023-08-17T12:05:45Z

The initial version is finished: the GGML version outputs masks close to the ones the PyTorch model outputs for hardcoded point inputs. For the point hardcoded in the code (414.375, 162.796875) and this image:

the PyTorch model outputs this mask:

and the GGML version outputs this:

Several things are still left to finish, and I will start on them now. The high-priority ones are:

- Trace where the difference in the output masks comes from, by going through the inference tensors step by step and comparing them to the PyTorch version
- Filter masks based on the stability score and on boxes that touch the crop boundaries
- Optimize the GGML implementation by removing unnecessary ggml_cont operations, reshapes, transposes and so on
- Add support for user input
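
The stability-score filtering mentioned above can be sketched as follows. As I understand the reference PyTorch implementation, the score is the IoU between the mask binarized at a slightly higher and a slightly lower threshold. The function names, the `bbox` struct and the choice to return 0 for an empty mask are assumptions made for illustration, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stability score: IoU between the mask binarized at (threshold + offset) and
// the mask binarized at (threshold - offset). The high-threshold mask is a
// subset of the low-threshold one, so the IoU reduces to |high| / |low|.
// Returning 0 for an empty low-threshold mask is a choice made for this sketch.
float stability_score(const std::vector<float> & logits, float threshold, float offset) {
    size_t n_high = 0;
    size_t n_low  = 0;
    for (float v : logits) {
        if (v > threshold + offset) n_high++;
        if (v > threshold - offset) n_low++;
    }
    return n_low == 0 ? 0.0f : (float) n_high / (float) n_low;
}

// Inclusive bounding box of all pixels above the threshold.
struct bbox {
    int  x0, y0, x1, y1;
    bool empty; // true when no pixel is above the threshold
};

bbox mask_bbox(const std::vector<float> & logits, int w, int h, float threshold) {
    assert(logits.size() == (size_t) w * h);
    bbox b = { w, h, -1, -1, true };
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (logits[(size_t) y * w + x] > threshold) {
                if (x < b.x0) b.x0 = x;
                if (y < b.y0) b.y0 = y;
                if (x > b.x1) b.x1 = x;
                if (y > b.y1) b.y1 = y;
                b.empty = false;
            }
        }
    }
    return b;
}
```

A mask would then be kept only if its score is above a chosen cutoff. The resulting box can afterwards be tested against the crop boundaries, which is not shown here.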

ggerganov

🦙

* sam : image + prompt encoder, store embeddings
* sam : add the dense img pe in SAM state (#401)
* Add SAM decoder & output masks as png (#418)
* Add loading of decoder layers in Model
* Multiply by hypernet_layer_cnt for ctx_size on model load
* Add decoder layers to py conversion script
* Fix wrong and reversed tensor sizes for decoder
* Add decoder transformer implementation
* Add decoder hypernet and iou prediction mlps
* Add transpose convolution operation and unit test
* Finish mask decoder and write the decoder output in the model state
* Output masks to png after removing padding and upsampling to original size
  - Also filter based on the iou treshold
  - Additionally filtering based on the stability score and crop boxes should be done
* Add stb image write in order to output masks from SAM
* Add transpose convolution 2d name and symbol to ggml ops static arrays
* Comment out debug print in transpose convolution test to fix compilation

ggml-ci

* Multithread GGML_OP_ADD_REL_POS operation
* ggml : fix GGML_OP_NAME array
* Disable and comment out debug prints in SAM example
* Add README for the SAM example
* Calculate & filter based on stability score and calculate bounding box

ggml-ci

---------

Co-authored-by: Yavor Ivanov <yivanov@viewray.com>

YavorGIvanov added 9 commits July 21, 2023 17:41

Add loading of decoder layers in Model

87a8eae

Multiply by hypernet_layer_cnt for ctx_size on model load

4bd3c43

Add decoder layers to py conversion script

9441745

Fix wrong and reversed tensor sizes for decoder

2a043da

Add decoder transformer implementation

6f5725e

Add decoder hypernet and iou prediction mlps

45524a3

Add transpose convolution operation and unit test

9f25ce0

Finish mask decoder and write the decoder output in the model state

a0904c0

Output masks to png after removing padding and upsampling to original…

19723f3

… size - Also filter based on the iou treshold - Additionally filtering based on the stability score and crop boxes should be done

Add stb image write in order to output masks from SAM

53e7f0a

YavorGIvanov changed the title ~~Add SAM decoder: model loading, transformer, hypernetwork & iou_prediction~~ Add SAM decode & output masks as png Aug 17, 2023

YavorGIvanov changed the title ~~Add SAM decode & output masks as png~~ Add SAM decoder & output masks as png Aug 17, 2023

YavorGIvanov added 2 commits August 17, 2023 15:29

Add transpose convolution 2d name and symbol to ggml ops static arrays

9c160fe

Comment out debug print in transpose convolution test to fix compilation

4271115

ggerganov approved these changes Aug 17, 2023

ggerganov merged commit d4b5295 into ggerganov:sam Aug 17, 2023

ggerganov mentioned this pull request Aug 17, 2023

examples : add sample SAM inference #74

Merged

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023

Fix Makefile echo escape codes (by removing them). (ggerganov#418)

a140219
