Replies: 8 comments
-
Hello, is this issue helpful: #18? TL;DR: use …
-
Hi, when using a pretrained CLIP model, should we take care to unfreeze only some particular layers? Should we fine-tune under …
-
That depends on what your aim is: what exactly are you fine-tuning on? We have another repository for fine-tuning pre-trained CLIP on datasets like ImageNet/CIFAR: https://github.com/mlfoundations/wise-ft, with an associated preprint: https://arxiv.org/abs/2109.01903.
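For context, the core idea in that preprint (WiSE-FT) is weight-space ensembling: linearly interpolating between the zero-shot and fine-tuned weights. A minimal sketch of the interpolation, assuming both state dicts come from the same architecture (checkpoint paths in the usage comment are placeholders):

```python
import torch

def wise_ft(zeroshot_sd, finetuned_sd, alpha=0.5):
    """Linearly interpolate two state dicts in weight space (WiSE-FT).

    alpha=0.0 recovers the zero-shot model, alpha=1.0 the fine-tuned one;
    intermediate values trade fine-tuning accuracy against robustness,
    per the preprint linked above.
    """
    assert zeroshot_sd.keys() == finetuned_sd.keys()
    return {
        k: (1.0 - alpha) * zeroshot_sd[k] + alpha * finetuned_sd[k]
        for k in zeroshot_sd
    }

# Hypothetical usage (paths and dict keys are placeholders):
# zs = torch.load("zeroshot.pt")["state_dict"]
# ft = torch.load("finetuned.pt")["state_dict"]
# model.load_state_dict(wise_ft(zs, ft, alpha=0.5))
```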
-
I would like to fine-tune CLIP on specific datasets (for example animals, objects, city monuments, ...) in order to get better encodings for images and captions. I'm not adding any new layers, just keeping the original CLIP architecture. In this case, do you think it is a good idea to unfreeze all layers (…)?
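For reference, freezing or unfreezing layers in PyTorch comes down to toggling `requires_grad`. A minimal sketch, assuming `model` is an OpenAI-style CLIP model whose image tower is exposed as `model.visual` (attribute names vary across implementations):

```python
# Assumes an OpenAI-style CLIP model with a `model.visual` image tower.
def freeze_all_but_image_tower(model):
    for p in model.parameters():
        p.requires_grad = False   # freeze everything
    for p in model.visual.parameters():
        p.requires_grad = True    # then unfreeze the image tower only

# Full fine-tuning (the case asked about) leaves everything trainable:
def unfreeze_all(model):
    for p in model.parameters():
        p.requires_grad = True

# Either way, pass only trainable parameters to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-5)
```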
-
Unfortunately we haven't tried that at this time, so we don't have a good answer for you.
-
Hello, I would like to fine-tune CLIP on my own specific dataset (approx. 50k image-text pairs). I used the provided ViT-B/32 checkpoint as the initial model, but accuracy starts at 1% and after 32 epochs reaches only around 30%. (I tried various weight decay and LR combinations; the best of them is weight decay=0.001 and LR=5e-4.) Have you tried fine-tuning CLIP on a small specific dataset, and if so, how was the performance? @milmin
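For concreteness, a sketch of a training setup matching those numbers, using AdamW and the symmetric contrastive loss from the CLIP paper (the function name and the assumption that `model` is already loaded are illustrative, not this repo's API):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, logit_scale):
    """Symmetric contrastive (InfoNCE) loss, as in the CLIP paper."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = logit_scale * image_features @ text_features.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# The best combination reported above; `model` is assumed to be a
# loaded CLIP model.
# optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4,
#                               weight_decay=1e-3)
```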
-
I have not tried this, but those hyperparameters seem like they should be good. Is there any reason to use our checkpoints and not OpenAI's via `--openai-pretrained`? To clarify, is the 1% accuracy on your new task, or zero-shot performance on ImageNet?
-
Actually, I set the parameters of --openai-pretrained and --model to True and ViT-B/32, respectively. This way, I believe, I use the official ViT-B/32 weights (is that true?), which is why I described it that way.
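For what it's worth, one way to sanity-check against the official weights is to load them directly with OpenAI's `clip` package, which downloads and returns the released checkpoint:

```python
import torch
import clip  # OpenAI's package: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
# Downloads and loads the official ViT-B/32 weights.
model, preprocess = clip.load("ViT-B/32", device=device)
```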
-
a. How did the fine-tuning turn out? Could you provide a set of fine-tuned parameters?
b. For fine-tuning, what suggestions do you have for parameter settings or training techniques?