
Segment-Anything (SAM) and SAM v2 Inference In Android

(App screenshots)

  • On-device inference of SAM/SAM2 with onnxruntime
  • Clean, Kotlin-only implementation with no additional code compilation
  • No support for text prompts as input to the model
  • Inference time is quite high, even with float16 quantization enabled

Download the APK or set up the project locally

About Segment-Anything

(SAM model architecture)

  • Large language models have demonstrated significant performance gains on numerous NLP tasks in zero-shot and few-shot settings. The prompt, i.e. the text given to the LLM at inference time, guides the generation of the output.
  • Foundation models like CLIP and ALIGN have been popular due to their wide adaptability and fine-tuning capabilities for downstream tasks.
  • The authors' goal is to build a foundation model for image segmentation.

Task

  • The authors define a promptable image segmentation task.
  • The prompt can be spatial or textual information that guides the model to generate the desired segmentation mask.

Model

  • A powerful image encoder produces image embeddings, and a prompt encoder embeds prompts; both are combined in a lightweight mask decoder that predicts segmentation masks (sketched below)
  • The authors focus on point, box and mask prompts with initial results on free-form text prompts.
  • Image Encoder: MAE (Masked Autoencoder) pre-trained Vision Transformer
  • Prompt Encoder: Points and boxes are represented by positional encodings, masks are embedded with convolutional layers, and free-form text with an encoder like CLIP
  • Mask Decoder: Transformer-based decoder model
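
This split between a heavy image encoder and a lightweight decoder is what makes interactive segmentation practical: the image embedding is computed once and reused across prompts. Below is a minimal Kotlin sketch of that flow, with hypothetical names and types (not this repository's API):

import android.graphics.Bitmap
import android.graphics.PointF

// Hypothetical mask type; the real decoder outputs low-resolution mask logits
typealias Mask = Array<FloatArray>

// Illustrative sketch of SAM's split design (names/types are hypothetical):
// the heavy image encoder runs once per image, while the prompt encoder +
// mask decoder re-run cheaply for every new prompt
class SamPipeline(
    private val encodeImage: (Bitmap) -> FloatArray,
    private val decodeMask: (FloatArray, List<PointF>) -> Mask
) {

    private var imageEmbedding: FloatArray? = null

    fun setImage(image: Bitmap) {
        // Expensive: one ViT forward pass, reused for all prompts
        imageEmbedding = encodeImage(image)
    }

    fun predict(points: List<PointF>): Mask =
        // Cheap: re-run only the decoder for each new point prompt
        decodeMask(checkNotNull(imageEmbedding) { "Call setImage() first" }, points)
}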

Data Engine

  • To achieve strong generalization to unseen data distributions, the authors propose a model-in-the-loop data annotation process with three phases.
  • In the assisted-manual phase, SAM helps annotators in annotating masks.
  • In the semi-automatic phase, SAM automatically generates masks for certain objects when prompted with their locations in the image.
  • In the fully-automatic phase, SAM is prompted with a regular grid of foreground points, each of which yields a segmentation mask (see the sketch below).
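
For intuition, such a grid of point prompts could be generated as follows. This is an illustrative sketch only, assuming a 32×32 grid; the paper's automatic mask generation also involves details such as multi-scale image crops:

import android.graphics.PointF

// Sketch of the regular grid of foreground points used in the
// fully-automatic phase: pointsPerSide x pointsPerSide points
// spread evenly over a width x height image
fun pointGrid(width: Int, height: Int, pointsPerSide: Int = 32): List<PointF> {
    val points = mutableListOf<PointF>()
    for (row in 0 until pointsPerSide) {
        for (col in 0 until pointsPerSide) {
            // Offset by half a cell so points sit at cell centers
            val x = (col + 0.5f) / pointsPerSide * width
            val y = (row + 0.5f) / pointsPerSide * height
            points.add(PointF(x, y))
        }
    }
    return points
}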

Setup

  1. Clone the project from GitHub and open the resulting directory in Android Studio.

git clone --depth=1 https://github.com/shubham0204/Segment-Anything-Android

  2. Android Studio starts building the project automatically. If it does not, select Build > Rebuild Project to start a project build.

  3. After a successful project build, connect an Android device to your system. Once connected, the device's name should be visible in the top menu bar in Android Studio.

  4. Download any *_encoder.onnx model and the corresponding *_decoder.onnx model from the HuggingFace repository and place them in the root directory of the project. The models can then be made available to the app in one of the two ways described below.

Store the ONNX models in the assets folder

By placing the *_encoder.onnx and *_decoder.onnx files in the app/src/main/assets folder, the models are packaged with the APK. This increases the overall size of the APK, but avoids any additional setup to bring the models to the device. Make sure the encoder and decoder file names in MainActivity.kt match the files placed in assets,

class MainActivity : ComponentActivity() {

    private val encoder = SAMEncoder()
    private val decoder = SAMDecoder()
    
    // The app will look for models with these file-names 
    // in the assets folder
    private val encoderFileName = "encoder_base_plus.onnx"
    private val decoderFileName = "decoder_base_plus.onnx"

    // ...
}
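
Note that assets are packaged inside the APK, so at runtime a model typically has to be copied out to a regular file before a native runtime such as onnxruntime can open it by path. A minimal sketch of such a helper (hypothetical; the app's actual asset handling may differ):

import android.content.Context
import java.io.File

// Copy an ONNX model out of the APK's assets into the app's cache directory
// so that it can be opened as a regular file (hypothetical helper; the app's
// actual asset handling may differ)
fun assetToFile(context: Context, assetName: String): File {
    val outFile = File(context.cacheDir, assetName)
    if (!outFile.exists()) {
        context.assets.open(assetName).use { input ->
            outFile.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return outFile
}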

Store the ONNX models in the device's temporary storage

Using the adb CLI tool, push the ONNX models to the device's storage,

adb push sam2_hiera_small_encoder.onnx /data/local/tmp/sam/encoder.onnx
adb push sam2_hiera_small_decoder.onnx /data/local/tmp/sam/decoder.onnx
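
If the push fails because the sam directory does not exist on the device, create it first:

adb shell mkdir -p /data/local/tmp/sam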

Replace sam2_hiera_small_encoder.onnx and sam2_hiera_small_decoder.onnx with the names of the models downloaded from the HF repository in step (4).

Update the model paths and set other options in MainActivity.kt,

class MainActivity : ComponentActivity() {

    // ...

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        enableEdgeToEdge()

        setContent {
            SAMAndroidTheme {
                Scaffold(modifier = Modifier.fillMaxSize()) { innerPadding ->
                    Column(
                        // ...
                    ) {

                        // ...

                        LaunchedEffect(0) {
                            // ...
                            // The paths below should match the ones
                            // used in the adb push commands from the
                            // previous step
                            encoder.init(
                                "/data/local/tmp/sam/encoder.onnx",
                                useXNNPack = true, // XNNPACK execution provider for onnxruntime
                                useFP16 = true
                            )
                            decoder.init(
                                "/data/local/tmp/sam/decoder.onnx",
                                useXNNPack = true,
                                useFP16 = true
                            )
                            // ...
                        }
                        
                        // ...
                    }
                }
            }
        }
    }
}
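
For context, the useXNNPack flag corresponds to onnxruntime's XNNPACK execution provider for CPU inference. A general sketch of enabling it with onnxruntime's Java/Kotlin API follows; this is not the repository's actual SAMEncoder/SAMDecoder init() implementation:

import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// General sketch: create an onnxruntime session with the XNNPACK
// execution provider enabled
fun createSession(modelPath: String, useXnnPack: Boolean): OrtSession {
    val env = OrtEnvironment.getEnvironment()
    val options = OrtSession.SessionOptions()
    if (useXnnPack) {
        // "intra_op_num_threads" is a documented XNNPACK provider option;
        // the value here is an arbitrary illustrative choice
        options.addXnnpack(mapOf("intra_op_num_threads" to "2"))
    }
    return env.createSession(modelPath, options)
}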


Citations

@misc{ravi2024sam2segmentimages,
      title={SAM 2: Segment Anything in Images and Videos}, 
      author={Nikhila Ravi and Valentin Gabeur and Yuan-Ting Hu and Ronghang Hu and Chaitanya Ryali and Tengyu Ma and Haitham Khedr and Roman Rädle and Chloe Rolland and Laura Gustafson and Eric Mintun and Junting Pan and Kalyan Vasudev Alwala and Nicolas Carion and Chao-Yuan Wu and Ross Girshick and Piotr Dollár and Christoph Feichtenhofer},
      year={2024},
      eprint={2408.00714},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00714}, 
}
@misc{kirillov2023segment,
      title={Segment Anything}, 
      author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
      year={2023},
      eprint={2304.02643},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2304.02643}, 
}
