Implement modern Image recognition model(s) #986

TechupBusiness · 2023-10-04T14:59:46Z

Describe the feature you'd like to request

The AI space is evolving quickly, with excellent new models coming out every few months. I would really like this extension to start using modern AI models to improve its tagging capabilities. Ideally, the extension would be model-agnostic, so that everyone can later use the model they prefer.

Describe the solution you'd like

A good candidate is the Recognize Anything Model (RAM) on Hugging Face: https://huggingface.co/spaces/xinyu1205/recognize-anything
You can also test it directly on that page.

Describe alternatives you've considered

I don't see alternatives for proper AI models :)

bonswouar · 2024-03-10T16:25:44Z

I see the models haven't been updated for some years now; it would be really great to implement some new ones!

As far as I understand, this app uses TensorFlow.js to run models.
I've just checked the available tfjs models (https://github.com/tensorflow/tfjs-models) and TensorFlow models (https://www.kaggle.com/models), and unfortunately there don't seem to be any new or better ones in the image classification / object detection categories.

If someone has any suggestions, that would be great!

I've found this leaderboard of sorts for image classification: https://paperswithcode.com/sota/image-classification-on-imagenet
EfficientNetV2 is ranked 101. I've checked a couple of "better" models with fewer than 1B params and couldn't find any TensorFlow implementation.
I guess if someone really wants to and knows how, converting a PyTorch model to TensorFlow (and then to TensorFlow.js) would be doable: https://paperswithcode.com/sota/image-classification-on-imagenet

bonswouar · 2024-05-08T14:43:11Z

FYI, I did some tests with the latest https://github.com/vikhyat/moondream. It's a really impressive model for its size (1.8B parameters), but unfortunately it's a fairly general vision LLM and thus not made specifically for image classification.

I tried passing a list of keywords in the input but didn't get great results; I guess long text inputs aren't handled well by such a small LLM.
Testing keywords one by one gives excellent results, though!
For example, I used this simple prompt:
Does this picture contain "{keyword}"? Yes or No. No if you're not sure.
It seems to reliably give accurate Yes or No responses. But it would be very time-intensive to test all keywords one by one... A version fine-tuned for image classification could be pretty amazing though.
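The one-keyword-at-a-time approach described above could be sketched roughly like this in Python. The `ask` callable is a hypothetical stand-in for whatever function sends an image and a prompt to the vision model and returns its text reply; it is not moondream's actual API, and `tag_image` is just an illustrative name.

```python
from typing import Callable, Iterable, List

# Prompt taken from the comment above; {keyword} is filled in per query.
PROMPT = 'Does this picture contain "{keyword}"? Yes or No. No if you\'re not sure.'


def tag_image(
    image_path: str,
    keywords: Iterable[str],
    ask: Callable[[str, str], str],
) -> List[str]:
    """Return the keywords the model answers "Yes" to.

    `ask(image_path, prompt)` is a hypothetical wrapper around the vision
    model; it must return the model's raw text reply.
    """
    tags = []
    for keyword in keywords:
        reply = ask(image_path, PROMPT.format(keyword=keyword))
        # Anything that does not clearly start with "yes" counts as a "No".
        if reply.strip().lower().startswith("yes"):
            tags.append(keyword)
    return tags
```

Note that this makes one model call per keyword per image, so the cost grows linearly with the size of the keyword list, which is exactly the time concern mentioned above.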

Anyway, I'm not sure it's even possible to convert it to a TensorFlow model, but seeing the progress these kinds of models have made recently, the current Recognize model really feels old now :/

marcelklehr · 2024-05-08T15:23:27Z

> seeing the progress these kinds of models have made recently, the current Recognize model really feels old now :/

I agree. Sadly, Nextcloud GmbH is unlikely to spend effort on revamping the models. I'm more than happy to guide any contributor willing to commit to implementing new stuff. (A good thing to work on would be #73.)

bonswouar · 2024-05-08T15:48:36Z

> (A good thing to work on would be #73.)

That could be interesting, I hadn't noticed this feature proposal before!
That said, it could be nice to aim for a specific model, since seeing how much better it could be would be good motivation for contributors (: (I don't know much about recent classification models, but the area seems less active than general LLMs unfortunately, on the open-weight side at least.)
(Inference speed isn't really a criterion for me currently: even with ~10K pictures and running on the CPU, it's fast enough for my use case. But an external device could allow a bigger model.)

bonswouar · 2024-06-06T11:09:25Z

Somewhat related: in recent news, Mozilla is releasing an image recognition model for generating image alt text: https://hacks.mozilla.org/2024/05/experimenting-with-local-alt-text-generation-in-firefox-nightly/

The current model is apparently quite good and very small, and the training / model creation code is also open.
Looking at the dataset it was fine-tuned on, it's obviously made for alt text, but it could be quite easy to refactor this dataset into lists of keywords instead.
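To picture that refactoring, here is a rough Python sketch that reduces an alt-text caption to a keyword list by dropping stop words. The stop-word list and the function name are made up for illustration; the real dataset's fields aren't shown here, and a proper keyword extraction step (for example, keeping only nouns) would likely be needed in practice.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption, not taken from the dataset).
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "at", "with", "and", "is", "are", "to"}


def caption_to_keywords(caption: str) -> List[str]:
    """Turn an alt-text caption into a de-duplicated, ordered keyword list."""
    words = re.findall(r"[a-z]+", caption.lower())
    keywords = []
    for word in words:
        # Keep the first occurrence of every non-stop word.
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords
```

With this sketch, the caption "A dog sitting on the grass with a ball" becomes `["dog", "sitting", "grass", "ball"]`, which could then serve as the label set for that image.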

JartanFTW · 2024-06-21T17:31:36Z

I can see a use-case in including harmful content keywords for automated flagging/removal purposes for organizations.

TechupBusiness · 2024-06-21T19:13:24Z

Wow, Microsoft's newest Florence-2 model looks awesome (open source). I'm not sure how helpful it would be here, but it can generate a description and do object detection.

JartanFTW · 2024-07-31T00:08:44Z

Meta's brand new open-source Segment Anything Model 2 may be very useful for object segmentation for recognition.

TechupBusiness added the "enhancement" (New feature or request) label Oct 4, 2023

bonswouar mentioned this issue Mar 10, 2024

Move to better face recognition models #1085

Open

github-project-automation bot added this to Recognize Aug 28, 2024

github-project-automation bot moved this to Backlog in Recognize Aug 28, 2024
