Added Segment Anything interactor for GPU/CPU #6008

Merged · 7 commits · Apr 12, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## \[2.5.0] - Unreleased
### Added
- Add support for Azure Blob Storage connection string authentication (<https://github.com/openvinotoolkit/cvat/pull/4649>)
- Added Segment Anything interactor for CPU/GPU (<https://github.com/opencv/cvat/pull/6008>)

### Changed
- Moving a task from a project to another project is disabled (<https://github.com/opencv/cvat/pull/5901>)
1 change: 1 addition & 0 deletions README.md
@@ -184,6 +184,7 @@ up to 10x. Here is a list of the algorithms we support, and the platforms they c

| Name | Type | Framework | CPU | GPU |
| ------------------------------------------------------------------------------------------------------- | ---------- | ---------- | --- | --- |
| [Segment Anything](/serverless/pytorch/facebookresearch/sam/nuclio/) | interactor | PyTorch | ✔️ | ✔️ |
| [Deep Extreme Cut](/serverless/openvino/dextr/nuclio) | interactor | OpenVINO | ✔️ | |
| [Faster RCNN](/serverless/openvino/omz/public/faster_rcnn_inception_v2_coco/nuclio) | detector | OpenVINO | ✔️ | |
| [Mask RCNN](/serverless/openvino/omz/public/mask_rcnn_inception_resnet_v2_atrous_coco/nuclio) | detector | OpenVINO | ✔️ | |
71 changes: 71 additions & 0 deletions serverless/pytorch/facebookresearch/sam/nuclio/function-gpu.yaml
@@ -0,0 +1,71 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

metadata:
name: pth.facebookresearch.sam.vit_h
namespace: cvat
annotations:
name: Segment Anything
version: 2
type: interactor
spec:
framework: pytorch
min_pos_points: 1
min_neg_points: 0
animated_gif: https://raw.githubusercontent.com/opencv/cvat/develop/site/content/en/images/hrnet_example.gif
    help_message: The interactor allows getting a mask of an object using at least one positive point and any number of negative points inside it

spec:
description: Interactive object segmentation with Segment-Anything
runtime: 'python:3.8'
handler: main:handler
eventTimeout: 30s
env:
- name: PYTHONPATH
value: /opt/nuclio/sam

build:
image: cvat.pth.facebookresearch.sam.vit_h
baseImage: ubuntu:22.04

directives:
preCopy:
# disable interactive frontend
- kind: ENV
value: DEBIAN_FRONTEND=noninteractive
# set workdir
- kind: WORKDIR
value: /opt/nuclio/sam
# install basic deps
- kind: RUN
value: apt-get update && apt-get -y install curl git python3 python3-pip ffmpeg libsm6 libxext6
# install sam deps
- kind: RUN
value: pip3 install torch torchvision torchaudio opencv-python pycocotools matplotlib onnxruntime onnx
# install sam code
- kind: RUN
          value: pip3 install git+https://github.com/facebookresearch/segment-anything.git
# download sam weights
- kind: RUN
value: curl -O https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# map pip3 and python3 to pip and python
- kind: RUN
value: ln -s /usr/bin/pip3 /usr/local/bin/pip && ln -s /usr/bin/python3 /usr/bin/python
triggers:
myHttpTrigger:
maxWorkers: 1
kind: 'http'
workerAvailabilityTimeoutMilliseconds: 10000
attributes:
maxRequestBodySize: 33554432 # 32MB
resources:
limits:
nvidia.com/gpu: 1

platform:
attributes:
restartPolicy:
name: always
maximumRetryCount: 3
mountMode: volume
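The GPU variant above requests one `nvidia.com/gpu` and limits the HTTP trigger to a single worker. As a quick sanity check (not part of this PR), you can verify from inside the deployed container that PyTorch actually sees the device that `ModelHandler` will pick up:

```python
# Sketch only: run inside the function container (e.g. via `docker exec`).
# ModelHandler selects 'cuda' whenever torch.cuda.is_available() is True,
# so this should report at least one device for the GPU deployment.
import torch

print(torch.cuda.is_available())   # expected: True for function-gpu.yaml
print(torch.cuda.device_count())   # expected: 1, matching nvidia.com/gpu: 1
```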
68 changes: 68 additions & 0 deletions serverless/pytorch/facebookresearch/sam/nuclio/function.yaml
@@ -0,0 +1,68 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

metadata:
name: pth.facebookresearch.sam.vit_h
namespace: cvat
annotations:
name: Segment Anything
version: 2
type: interactor
spec:
framework: pytorch
min_pos_points: 1
min_neg_points: 0
animated_gif: https://raw.githubusercontent.com/opencv/cvat/develop/site/content/en/images/hrnet_example.gif
    help_message: The interactor allows getting a mask of an object using at least one positive point and any number of negative points inside it

spec:
description: Interactive object segmentation with Segment-Anything
runtime: 'python:3.8'
handler: main:handler
eventTimeout: 30s
env:
- name: PYTHONPATH
value: /opt/nuclio/sam

build:
image: cvat.pth.facebookresearch.sam.vit_h
baseImage: ubuntu:22.04

directives:
preCopy:
# disable interactive frontend
- kind: ENV
value: DEBIAN_FRONTEND=noninteractive
# set workdir
- kind: WORKDIR
value: /opt/nuclio/sam
# install basic deps
- kind: RUN
value: apt-get update && apt-get -y install curl git python3 python3-pip ffmpeg libsm6 libxext6
# install sam deps
- kind: RUN
value: pip3 install torch torchvision torchaudio opencv-python pycocotools matplotlib onnxruntime onnx
# install sam code
- kind: RUN
          value: pip3 install git+https://github.com/facebookresearch/segment-anything.git
# download sam weights
- kind: RUN
value: curl -O https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# map pip3 and python3 to pip and python
- kind: RUN
value: ln -s /usr/bin/pip3 /usr/local/bin/pip && ln -s /usr/bin/python3 /usr/bin/python
triggers:
myHttpTrigger:
maxWorkers: 2
kind: 'http'
workerAvailabilityTimeoutMilliseconds: 10000
attributes:
maxRequestBodySize: 33554432 # 32MB

platform:
attributes:
restartPolicy:
name: always
maximumRetryCount: 3
mountMode: volume
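Both function variants expose the model through a nuclio HTTP trigger that caps the request body at 32 MB, and the handler in `main.py` below reads the frame as a base64 string. A hedged client-side sketch (not part of this PR; the image path is hypothetical) of preparing and size-checking such a payload:

```python
# Sketch only: base64-encode a frame and make sure the resulting JSON body
# stays under the trigger's maxRequestBodySize (33554432 bytes = 32MB).
import base64
import json

MAX_REQUEST_BODY_SIZE = 33554432  # mirrors maxRequestBodySize above

with open("frame.jpg", "rb") as f:   # hypothetical input image
    image_b64 = base64.b64encode(f.read()).decode("ascii")

body = json.dumps({
    "image": image_b64,
    "pos_points": [[320, 240]],  # at least one positive point (min_pos_points: 1)
    "neg_points": [],            # negative points are optional (min_neg_points: 0)
})

if len(body.encode("utf-8")) > MAX_REQUEST_BODY_SIZE:
    raise ValueError("encoded request would exceed maxRequestBodySize")
```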
33 changes: 33 additions & 0 deletions serverless/pytorch/facebookresearch/sam/nuclio/main.py
@@ -0,0 +1,33 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

import json
import base64
from PIL import Image
import io
from model_handler import ModelHandler

def init_context(context):
context.logger.info("Init context... 0%")
model = ModelHandler()
context.user_data.model = model
context.logger.info("Init context...100%")

def handler(context, event):
context.logger.info("call handler")
data = event.body
pos_points = data["pos_points"]
neg_points = data["neg_points"]
buf = io.BytesIO(base64.b64decode(data["image"]))
image = Image.open(buf)
image = image.convert("RGB") # to make sure image comes in RGB
mask, polygon = context.user_data.model.handle(image, pos_points, neg_points)
return context.Response(body=json.dumps({
'points': polygon,
'mask': mask.tolist(),
}),
headers={},
content_type='application/json',
status_code=200
)
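For completeness, here is a hedged sketch of calling the deployed function over HTTP and decoding what `handler` returns; the URL is an assumption (nuclio assigns the port at deploy time), not something defined in this PR:

```python
# Sketch only: POST a frame plus click points to the deployed function and
# read the JSON produced by handler(). The URL/port and image path are hypothetical.
import base64
import json

import requests

FUNCTION_URL = "http://localhost:32768"  # replace with the port nuclio reports

with open("frame.jpg", "rb") as f:       # hypothetical input image
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    FUNCTION_URL,
    data=json.dumps({
        "image": image_b64,
        "pos_points": [[320, 240]],
        "neg_points": [],
    }),
    headers={"Content-Type": "application/json"},
    timeout=30,  # matches the function's eventTimeout
)
resp.raise_for_status()
result = resp.json()
polygon = result["points"]  # [[x, y], ...] produced by convert_mask_to_polygon
mask = result["mask"]       # 2D list of 0/255 values, same size as the frame
```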
68 changes: 68 additions & 0 deletions serverless/pytorch/facebookresearch/sam/nuclio/model_handler.py
@@ -0,0 +1,68 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

import numpy as np
import cv2
import torch
from segment_anything import sam_model_registry, SamPredictor

def convert_mask_to_polygon(mask):
contours = None
if int(cv2.__version__.split('.')[0]) > 3:
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_KCOS)[0]
else:
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_KCOS)[1]

contours = max(contours, key=lambda arr: arr.size)
if contours.shape.count(1):
contours = np.squeeze(contours)
if contours.size < 3 * 2:
        raise Exception('Less than three points have been detected. Cannot build a polygon.')

polygon = []
for point in contours:
polygon.append([int(point[0]), int(point[1])])

return polygon

class ModelHandler:
def __init__(self):
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.sam_checkpoint = "/opt/nuclio/sam/sam_vit_h_4b8939.pth"
self.model_type = "vit_h"
self.latest_image = None
self.latest_low_res_masks = None
sam_model = sam_model_registry[self.model_type](checkpoint=self.sam_checkpoint)
sam_model.to(device=self.device)
self.predictor = SamPredictor(sam_model)

def handle(self, image, pos_points, neg_points):
        # the latest image is kept in memory because the function keeps running after startup;
        # we reuse it to avoid computing embeddings twice for the same image
is_the_same_image = self.latest_image is not None and np.array_equal(np.array(image), self.latest_image)
if not is_the_same_image:
self.latest_low_res_masks = None
numpy_image = np.array(image)
self.predictor.set_image(numpy_image)
self.latest_image = numpy_image
# we assume that pos_points and neg_points are of type:
# np.array[[x, y], [x, y], ...]
input_points = np.array(pos_points)
input_labels = np.array([1] * len(pos_points))

if len(neg_points):
input_points = np.concatenate([input_points, neg_points], axis=0)
input_labels = np.concatenate([input_labels, np.array([0] * len(neg_points))], axis=0)

masks, _, low_res_masks = self.predictor.predict(
point_coords=input_points,
point_labels=input_labels,
mask_input = self.latest_low_res_masks,
multimask_output=False
)
self.latest_low_res_masks = low_res_masks
object_mask = np.array(masks[0], dtype=np.uint8)
cv2.normalize(object_mask, object_mask, 0, 255, cv2.NORM_MINMAX)
polygon = convert_mask_to_polygon(object_mask)
return object_mask, polygon
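To show how the pieces fit together outside nuclio, here is a hedged local usage sketch; it assumes the vit_h checkpoint is present at the path hard-coded in `ModelHandler` and that a test image exists on disk (both are assumptions, not part of the PR):

```python
# Sketch only: drive ModelHandler directly, mirroring what main.py's handler does.
from PIL import Image

from model_handler import ModelHandler

model = ModelHandler()                            # loads sam_vit_h_4b8939.pth once
image = Image.open("frame.jpg").convert("RGB")    # hypothetical test image

# one positive click roughly on the object, no negative clicks yet
mask, polygon = model.handle(image, [[320, 240]], [])
print(mask.shape, mask.dtype)   # (H, W), uint8, values 0/255 after normalization
print(polygon[:3])              # first few [x, y] vertices of the largest contour

# a second call on the same image reuses the cached embeddings and feeds the
# previous low-resolution mask back in as mask_input
refined_mask, _ = model.handle(image, [[320, 240]], [[10, 10]])
```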