- On-device inference of SAM/SAM2 with `onnxruntime`
- Clean Kotlin-only implementation, with no additional code compilation
- No support for text prompts as input to the model
- The inference time is quite high even with `float16` quantization enabled
Download the APK or set up the project locally
- Large language models have demonstrated significant performance gains on numerous NLP tasks in zero-shot and few-shot settings. The prompt, i.e. the text given to the LLM at inference time, guides the generation of the output.
- Foundation models like CLIP and ALIGN have been popular due to their wide adaptability and fine-tuning capabilities for downstream tasks.
- The goal of the authors is to build a foundation model for image segmentation.
- The authors define a promptable image segmentation task.
- The prompt could be spatial or textual information which guides the model to generate the desired segmentation mask.
- A powerful image encoder produces image embeddings and a prompt encoder embeds prompts; the two are combined by a mask decoder to predict segmentation masks.
- The authors focus on point, box and mask prompts with initial results on free-form text prompts.
- Image Encoder: MAE (Masked Autoencoder) pre-trained Vision Transformer
- Prompt Encoder: Points and boxes are represented by positional encodings, masks are embedded with convolutional layers, and free-form text with an encoder like CLIP
- Mask Decoder: Transformer-based decoder model
- To achieve strong generalization on unseen datasets, the authors propose a model-in-the-loop data annotation process with three phases.
- In the assisted-manual phase, SAM helps annotators in annotating masks.
- In the semi-automatic phase, SAM automatically generates masks for a subset of objects by prompting it with their locations in the image.
- In the fully-automatic phase, SAM is prompted with a regular grid of foreground points, each of which yields a segmentation mask.
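As a rough illustration of the fully-automatic phase, the sketch below builds such a regular grid of foreground point prompts in normalized image coordinates. This is plain Kotlin for illustration only; the grid size of 32 points per side and the coordinate convention are assumptions, not the paper's exact configuration:
fun regularPointGrid(pointsPerSide: Int = 32): List<Pair<Float, Float>> {
    // Each (x, y) point in [0, 1] x [0, 1] acts as a separate foreground
    // point prompt; prompting the model with every point yields one
    // candidate mask per point.
    val step = 1f / pointsPerSide
    return buildList {
        for (row in 0 until pointsPerSide) {
            for (col in 0 until pointsPerSide) {
                add(Pair((col + 0.5f) * step, (row + 0.5f) * step))
            }
        }
    }
}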
- Clone the project from GitHub and open the resulting directory in Android Studio.
git clone --depth=1 https://github.com/shubham0204/Segment-Anything-Android
- Android Studio starts building the project automatically. If not, select Build > Rebuild Project to start a project build.
- After a successful project build, connect an Android device to your system. Once connected, the device's name should be visible in the top menu bar in Android Studio.
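If the device does not show up, you can also verify the connection from a terminal using the standard adb command that lists attached devices:
adb devices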
- Download any `*_encoder.onnx` and the corresponding `*_decoder.onnx` model from the HuggingFace repository and place them in the root directory of the project. The models can be provided to the app in one of two ways:
By placing `*_encoder.onnx` and `*_decoder.onnx` in the `app/src/main/assets` folder, the models are packaged with the APK. This increases the overall size of the APK but avoids any additional setup to bring the models to the device. Make sure you change the names of the encoder and decoder models in `MainActivity.kt`:
class MainActivity : ComponentActivity() {
    private val encoder = SAMEncoder()
    private val decoder = SAMDecoder()
    // The app will look for models with these file-names
    // in the assets folder
    private val encoderFileName = "encoder_base_plus.onnx"
    private val decoderFileName = "decoder_base_plus.onnx"
    // ...
}
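For reference, reading a model packaged in the assets folder and handing it to ONNX Runtime can be as simple as the sketch below. It assumes the `ai.onnxruntime` Android API and is only illustrative; the project's `SAMEncoder`/`SAMDecoder` may load assets differently:
import ai.onnxruntime.OrtEnvironment
import android.content.Context

// Illustrative sketch: create an OrtSession for a model bundled in
// app/src/main/assets. Not necessarily how SAMEncoder/SAMDecoder load it.
fun createSessionFromAssets(context: Context, assetName: String) =
    OrtEnvironment.getEnvironment()
        .createSession(context.assets.open(assetName).readBytes())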
Using the `adb` CLI tool, push the ONNX models to the device's storage:
adb push sam2_hiera_small_encoder.onnx /data/local/tmp/sam/encoder.onnx
adb push sam2_hiera_small_decoder.onnx /data/local/tmp/sam/decoder.onnx
Replace `sam2_hiera_small_encoder.onnx` and `sam2_hiera_small_decoder.onnx` with the names of the models downloaded from the HF repository in step (4).
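Optionally, confirm that the models were copied by listing the directory with a standard adb shell command:
adb shell ls /data/local/tmp/sam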
Update the model paths and set other options in `MainActivity.kt`:
class MainActivity : ComponentActivity() {
    // ...
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        enableEdgeToEdge()
        setContent {
            SAMAndroidTheme {
                Scaffold(modifier = Modifier.fillMaxSize()) { innerPadding ->
                    Column(
                        // ...
                    ) {
                        // ...
                        LaunchedEffect(0) {
                            // ...
                            // The paths below should match the ones
                            // used in step (5)
                            encoder.init(
                                "/data/local/tmp/sam/encoder_fp16.onnx",
                                useXNNPack = true, // XNNPack delegate for onnxruntime
                                useFP16 = true
                            )
                            decoder.init(
                                "/data/local/tmp/sam/decoder_fp16.onnx",
                                useXNNPack = true,
                                useFP16 = true
                            )
                            // ...
                        }
                        // ...
                    }
                }
            }
        }
    }
}
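For context, the `useXNNPack` flag presumably enables the XNNPACK execution provider on the underlying onnxruntime session, while `useFP16` selects a float16-converted model file (such as `encoder_fp16.onnx` above); there is no runtime flag that converts a float32 model on the fly. Below is a minimal sketch of creating such a session directly with the `ai.onnxruntime` API, assuming a recent onnxruntime-android build that exposes the XNNPACK provider; the actual `SAMEncoder`/`SAMDecoder` internals may differ:
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// Illustrative only: an onnxruntime session with the XNNPACK execution
// provider enabled. The FP16 speed-up comes from loading a float16
// model file, not from a session option.
fun createXnnpackSession(modelPath: String): OrtSession {
    val env = OrtEnvironment.getEnvironment()
    val options = OrtSession.SessionOptions().apply {
        addXnnpack(mapOf("intra_op_num_threads" to "2"))
    }
    return env.createSession(modelPath, options)
}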
- ONNX-SAM2-Segment-Anything: ONNX models were derived from the Colab notebook linked in the README.md of this project.
- Segment Anything - arXiv
- SAM 2: Segment Anything in Images and Videos - arXiv
@misc{ravi2024sam2segmentimages,
title={SAM 2: Segment Anything in Images and Videos},
author={Nikhila Ravi and Valentin Gabeur and Yuan-Ting Hu and Ronghang Hu and Chaitanya Ryali and Tengyu Ma and Haitham Khedr and Roman Rädle and Chloe Rolland and Laura Gustafson and Eric Mintun and Junting Pan and Kalyan Vasudev Alwala and Nicolas Carion and Chao-Yuan Wu and Ross Girshick and Piotr Dollár and Christoph Feichtenhofer},
year={2024},
eprint={2408.00714},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.00714},
}
@misc{kirillov2023segment,
title={Segment Anything},
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
year={2023},
eprint={2304.02643},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2304.02643},
}