Investigate Segment Anything Model #43

Closed
Samir-Rashid opened this issue Oct 28, 2023 · 3 comments
Assignees
Labels
feature New feature or request research

Comments

@Samir-Rashid
Contributor

Samir-Rashid commented Oct 28, 2023

This issue looks into doing obstacle detection with SAM (Segment Anything Model), which is a more advanced model than YOLO. The main benefit is that we would not have to fine-tune the model at all for out-of-scope object detection.


I remember wondering whether we should switch from YOLO to Meta's SAM when it came out. Luckily for us, a lot of development has happened since then. There are now real-time versions, and the way they made it work is super cool: https://github.com/CASIA-IVA-Lab/FastSAM . It is also very easy to use: https://docs.ultralytics.com/models/fast-sam/#installation . We get the best of both: the real-time speed of YOLO and SAM's ability to segment objects it was not trained on.
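
For reference, here is a minimal sketch of what running FastSAM through the ultralytics package might look like, following the pattern in the linked ultralytics docs. The function names `segment_everything` and `count_masks`, the default weights file, and the `conf`/`iou` thresholds are assumptions for illustration, not something this issue has settled on.

```python
def segment_everything(source, weights="FastSAM-s.pt", imgsz=1024, device="cpu"):
    """Run FastSAM in 'segment everything' mode on an image path or array.

    The import is done lazily so this module can be loaded on machines
    that do not have ultralytics installed.
    """
    from ultralytics import FastSAM  # assumes `pip install ultralytics`

    model = FastSAM(weights)  # weights file name is an assumption
    # Thresholds below mirror the ultralytics docs example; tune for our data.
    return model(
        source,
        device=device,
        retina_masks=True,
        imgsz=imgsz,
        conf=0.4,
        iou=0.9,
    )


def count_masks(results):
    """Return the number of segmented masks per image (0 if none were found)."""
    return [0 if r.masks is None else len(r.masks) for r in results]
```

Calling `count_masks(segment_everything("frame.jpg"))` (with a hypothetical `frame.jpg`) would give one mask count per input image, which is a quick sanity check before wiring anything into the CV pipeline.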

I found out about this model from this paper ⭐⭐⭐, which I would HIGHLY recommend reading. It is very readable because it is an application paper, not an ML theory paper. We would need some async way to initially detect dynamic obstacles, but the approach in the paper can handle the remaining real-time tracking. A few notes: they run at low resolution (we would need to test that for our use case), and the object memory system for CV is a very smart idea that I had not heard of before.

Resources:

@Samir-Rashid Samir-Rashid added feature New feature or request research labels Oct 28, 2023
@Samir-Rashid Samir-Rashid self-assigned this Oct 28, 2023
@Samir-Rashid
Contributor Author

I have started working on this task. I am going to piggyback on the work Igor has done for the CV pipeline to add the Segment Anything Model. I ran FastSAM on datahub (I think it was using 256 CPU cores), and it took:
Speed: 1188.5ms preprocess, 417401.7ms inference, 11499.6ms postprocess per image at shape (1, 3, 1024, 1024)
Next I will try NVIDIA's NanoSAM repo (https://github.com/NVIDIA-AI-IOT/nanosam), which they claim runs in real time. I will have to verify how much performance changes when using the "mobile" version of the model.
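
To put the timing line above in perspective, here is a small sketch that parses it and adds up the stages. The 30 FPS target is an assumption for comparison only; the issue does not state a required frame rate.

```python
import re

LOG_LINE = (
    "Speed: 1188.5ms preprocess, 417401.7ms inference, "
    "11499.6ms postprocess per image at shape (1, 3, 1024, 1024)"
)


def parse_speed(line: str) -> dict:
    """Return {stage: milliseconds} for every 'NNNms stage' pair in the line."""
    return {stage: float(ms) for ms, stage in re.findall(r"([\d.]+)ms (\w+)", line)}


stages = parse_speed(LOG_LINE)
total_ms = sum(stages.values())

# Assumed real-time target, used only as a yardstick.
TARGET_FPS = 30
budget_ms = 1000 / TARGET_FPS

print(f"total per image: {total_ms / 1000:.1f} s")
print(f"inference share: {stages['inference'] / total_ms:.1%}")
print(f"over a {TARGET_FPS} FPS budget by a factor of {total_ms / budget_ms:.0f}")
```

The total comes out to roughly 430 seconds per image, almost all of it in the inference stage, so the CPU run is far from real time.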

@Samir-Rashid
Contributor Author

FastSAM is packaged by ultralytics, which makes it dead simple to use in Python: https://docs.ultralytics.com/models/fast-sam/ . I am still planning to take advantage of Igor's work on the CV pipeline to also do inference with FastSAM. However, given the trouble people are having with PyTorch in C++, I think an IPC library would be a good way to let us use multiple languages. I will make a decision later, but for inference alone, our existing infrastructure may work out fine.
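
As a rough illustration of the IPC idea, here is a sketch of a Python inference worker that answers length-prefixed JSON requests over a socket. This is not a chosen design: the message format, the `fake_inference` placeholder, and every function name here are hypothetical, and a real setup would likely use an existing IPC library and put the C++ pipeline on the client side.

```python
import json
import socket
import struct
import threading


def send_msg(sock, payload: dict) -> None:
    """Send one JSON message with a 4-byte big-endian length prefix."""
    data = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(data)) + data)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def fake_inference(image_path: str) -> list:
    """Placeholder for the real model call (hypothetical output format)."""
    return [{"image": image_path, "bbox": [0, 0, 10, 10]}]


def serve(sock, infer=fake_inference) -> None:
    """Answer inference requests until the client sends {"cmd": "stop"}."""
    while True:
        request = recv_msg(sock)
        if request.get("cmd") == "stop":
            break
        send_msg(sock, {"detections": infer(request["image"])})


def demo() -> dict:
    """Run the worker in a thread and send it one request."""
    client, worker = socket.socketpair()
    thread = threading.Thread(target=serve, args=(worker,))
    thread.start()
    send_msg(client, {"cmd": "infer", "image": "frame_0001.png"})
    reply = recv_msg(client)
    send_msg(client, {"cmd": "stop"})
    thread.join()
    client.close()
    worker.close()
    return reply
```

The length prefix is what keeps the protocol language-neutral: a C++ client only needs to write four bytes plus a JSON string, and never has to link against PyTorch.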

@Samir-Rashid
Contributor Author

Will be done by #104
