-
We discussed Mask-RCNN style (two-pass) vs SSD (one-pass) vs YOLO (one-pass) for object detection, and are leaning towards YOLO. Interestingly, YOLO has gone through a number of iterations and improved substantially over the past few years (review). v1-v3 had the original author Joseph Redmon working on it, but he has since left CV research. Alexey Bochkovskiy worked on v4. Other teams, including ultralytics, have been improving it up to the current-day v8. Ultralytics' v5 is particularly interesting because it introduces multi-task learning, including segmentation. This continues with ultralytics v8, which takes into account improvements in efficiency made in v6, such as quantization. It's also a generally nice framework, with the previously mentioned export functionality. The v8 family of models for the task of instance segmentation is called YOLOv8-seg. By this comment it seems that the YOLOv8-seg architecture is pretty much the same as the v8 object detector with "an extra output module in the head which outputs masks coefficients and an added FCN layer", which is inspired by the YOLACT paper. Also worth reading is the Fast Segment Anything paper, which uses YOLOv8-seg and references YOLACT (the FastSAM model is also available via the ultralytics library).
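To get a feel for the ultralytics API mentioned above, here's a minimal YOLOv8-seg inference sketch (the checkpoint name is just the smallest seg variant; the image path is a placeholder):

```python
# Minimal YOLOv8-seg inference via the ultralytics library.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")        # downloads pretrained weights on first use
results = model("path/to/image.jpg")  # placeholder image path

for r in results:
    print(r.boxes.xyxy)            # bounding box coordinates, (N, 4)
    print(r.boxes.cls)             # class ids, (N,)
    if r.masks is not None:        # masks is None when nothing is detected
        print(r.masks.data.shape)  # per-instance masks, (N, H, W)
```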
-
@eddogola feel free to correct me on anything, and add any additional points here. Once you're ready to write some code, press "Create issue from discussion" and convert this into an issue, since it's the next immediate chunk of work we'll be tackling.
-
Questions around picking the object detector head:
(We may not get to investigating this now, but may want to eventually.)
-
Possible object detectors:
-
Over the last week or so we've been discussing going forward with composable models. The rough idea is to have individual components of an instance segmentation model that are pre-trained on some task such as ImageNet or COCO. One component might be an object detector (probably single-shot, such as YOLO or SSD) which outputs bounding box coordinates (and classes, but we may not even need those). Another component might be a semantic segmenter, which takes in crops defined by the bounding boxes from the object detector and semantically segments each crop into bg/fg.
The hypothesis: given pretrained components of the sort described above, can we add a relatively slim number of layers on top of each to get a fine-tunable model?
If so, we can have a composable pipeline that goes
images -> object detector (pretrained, inference only) -> slim layers (trainable) -> output bbox coords -> image crops -> semantic segmenter (pretrained, inference only) -> slim layers (trainable) -> instances
In the end this pipeline would be deployed to the browser.
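To make the hypothesis concrete, here's a rough tf.keras sketch of the composition. Everything below is illustrative: the "pretrained" components are toy stand-ins (a real detector/segmenter would replace them), the shapes and layer sizes are arbitrary, and one box per image is assumed for simplicity.

```python
import tensorflow as tf

# Stand-in for the pretrained object detector (frozen, inference only).
detector = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4),  # pretend raw output: one box per image
])
detector.trainable = False

# Slim trainable layers refining the detector output into bbox coords.
bbox_head = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="sigmoid"),  # normalized (y1, x1, y2, x2)
])

# Stand-in for the pretrained semantic segmenter (frozen, inference only).
segmenter = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 1),  # pretend bg/fg logits
])
segmenter.trainable = False

# Slim trainable layer refining the segmenter output into a fg mask.
mask_head = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")

def pipeline(images):
    """images -> bbox coords -> image crops -> bg/fg masks."""
    boxes = bbox_head(detector(images))  # (B, 4), normalized coords
    crops = tf.image.crop_and_resize(
        images, boxes, tf.range(tf.shape(images)[0]), (64, 64))
    return mask_head(segmenter(crops))   # (B, 64, 64, 1)

# Smoke test on random data.
masks = pipeline(tf.random.uniform((2, 256, 256, 3)))
print(masks.shape)  # (2, 64, 64, 1)
```

The point of the sketch is just the freezing pattern: during training, only bbox_head and mask_head would have their weights updated, which is what would make them viable as tfjs layers models later.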
The object detector and semantic segmenter would be pretrained in Python on a laptop or DGX, and then converted to a tfjs graph model. We have determined that this is possible with more or less arbitrary TensorFlow models using the tfjs converter, or with PyTorch models using the ultralytics Exporter class (which does pytorch -> onnx -> tensorflow saved model -> tfjs converter -> tfjs graph model). The slim layers would be tfjs layers models, and therefore trainable in-browser.
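For reference, the ultralytics side of that chain is driven from Python in one call; a minimal sketch (exact output directory naming may vary by ultralytics version):

```python
# The ultralytics Exporter drives the whole
# pytorch -> onnx -> tensorflow saved model -> tfjs graph model chain.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")
model.export(format="tfjs")  # writes a *_web_model/ directory with model.json
```

For arbitrary TensorFlow SavedModels, the equivalent step is the `tensorflowjs_converter` CLI from the tensorflowjs package.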
For the purposes of validating the hypothesis, however, we'll first do it all in Python as a proof of concept. We can worry about converting to tfjs and verifying its performance in the browser afterwards.