Investigate recent advances for next model backbone (2023/2024) #7
Found an excellent resource for model implementations at https://github.com/leondgarse/keras_cv_attention_models#recognition-models, which should accelerate trying out new models.

I've been catching up on the advances of the past two or so years. While ConvNeXtV2 is intriguing due to its model size, I think a better approach for this use case is to focus not only on model size but also on the pretraining. In particular, the DINO/DINOv2 and CLIP-related pretraining approaches look especially helpful because of their robustness to distribution shift: not only is the training material much closer to our target distribution, but the resulting models are generally much stronger.

To test this theory, I tried a DINOv2 fine-tune (new dense head only, plus gradual weight updates on the last 15 layers) and got excellent results, better than I had seen from my more half-hearted attempts with e.g. Inception and ResNet variants. The only challenge is that the smallest model available weighs in at a whopping 47.23 GFLOPs (vs. 0.72 GFLOPs for, say, EfficientNetV1-B0). Somewhat surprisingly, I was able to convert it to TensorFlow.js successfully, but it was slow, on the order of several seconds per image prediction. Still, a useful experiment to demonstrate the effectiveness of a stronger model.

The dataset has also grown somewhat, so it's not quite apples to apples, but notice the improvement in the DET and ROC curves.

I'm going to check out an EVA02-based model next, as it is CLIP-based, but it still weighs in at 4.72 GFLOPs, so in theory it will be a few times slower than the current model.
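Roughly, the fine-tune setup looked like this (a minimal PyTorch sketch using the torch.hub DINOv2 entry points; the model variant, head size, class count, and learning rates are illustrative, and "gradual weight change" is interpreted here as layer-wise learning rates over the last 15 transformer blocks, not necessarily the exact recipe used):

```python
import torch
import torch.nn as nn

# Load a DINOv2 backbone via torch.hub (ViT-L/14 shown: 24 blocks,
# 1024-d CLS embedding). The variant choice here is illustrative.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")

# New dense head on top of the CLS embedding; 2 classes is illustrative.
head = nn.Linear(1024, 2)
model = nn.Sequential(backbone, head)

# Freeze the whole backbone, then re-enable the last 15 transformer blocks.
for p in backbone.parameters():
    p.requires_grad = False
for block in backbone.blocks[-15:]:
    for p in block.parameters():
        p.requires_grad = True

# "Gradual weight change": deeper blocks get progressively larger learning
# rates, so early layers barely move while the new head moves the fastest.
param_groups = [{"params": head.parameters(), "lr": 1e-3}]
for i, block in enumerate(backbone.blocks[-15:]):
    param_groups.append({"params": block.parameters(), "lr": 1e-5 * (i + 1)})
optimizer = torch.optim.AdamW(param_groups)
```

The layer-wise learning-rate ramp is one common way to realize gradual unfreezing; an alternative is to literally unfreeze blocks on a schedule over the course of training.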
GCViT XTiny: a somewhat bigger model, and it shows in the performance. SQRXR 136: [results image]
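For reference, pulling this backbone from keras_cv_attention_models is essentially a one-liner. A minimal sketch (input shape and class count are illustrative; my understanding is the library skips mismatched classifier weights when num_classes differs from the pretrained head):

```python
from keras_cv_attention_models import gcvit

# GCViT XTiny with ImageNet-pretrained weights; the classifier head is
# rebuilt for the task's class count (2 here is illustrative).
model = gcvit.GCViT_XTiny(
    input_shape=(224, 224, 3),
    num_classes=2,
    pretrained="imagenet",
)
model.summary()
```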
I've been trying this out ... and I'm not sure it's fast enough or good enough to become the next top model yet. I might need to keep searching.
facebookresearch/ConvNeXt-V2#3
https://github.com/edwardyehuang/iSeg/tree/master/backbones