
AltCLIP has no effect on CIFAR10 after finetune #561

Open
youyuge34 opened this issue Jan 22, 2024 · 1 comment
Labels
question Further information is requested

Comments


Description

I finetuned AltCLIP on CIFAR-10 using the official demo code. However, after 3 epochs the finetuned weights have no effect on CIFAR-10 images: recognition of the animal images in CIFAR actually fails completely. I would like to know why, and whether the way I load the weights is wrong. The inference code is taken from the demo.

import torch
from PIL import Image
from flagai.auto_model.auto_loader import AutoLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the base checkpoint from ModelHub (model.baai.ac.cn/models)
loader = AutoLoader(
    task_name="txt_img_matching",
    model_name="AltCLIP-XLMR-L",
    model_dir="./checkpoints/"
)

model = loader.get_model()
tokenizer = loader.get_tokenizer()
transform = loader.get_transform()

# Overwrite the base weights with the finetuned checkpoint
weight_file = './checkpoints/cifar_altclip_9k/AltCLIP-XLMR-L/pytorch_model.bin'
model.load_state_dict(torch.load(weight_file, map_location='cpu')['module'])

model.eval()
model.to(device)
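
The rest of the inference follows the demo. Roughly, this is what I run to check the model on a CIFAR-10 image (a sketch: the image path and the English prompts are placeholders of mine, and the feature calls are the ones I copied from the demo as far as I recall):

# Sketch of the demo-style zero-shot check (image path and prompts are placeholders).
image = Image.open("./examples/cifar_dog.png")
image = transform(image)
image = torch.tensor(image["pixel_values"]).to(device)

prompts = ["a photo of a dog", "a photo of a cat", "a photo of a ship"]
tokenizer_out = tokenizer(prompts, padding=True, truncation=True,
                          max_length=77, return_tensors="pt")
text = tokenizer_out["input_ids"].to(device)
attention_mask = tokenizer_out["attention_mask"].to(device)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(text, attention_mask=attention_mask)
    text_probs = (image_features @ text_features.T).softmax(dim=-1)

# After loading the finetuned weights these probabilities come out the same
# for every image, which is what I mean by "no effect".
print(text_probs.cpu().numpy())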


@youyuge34 (Author)

I also finetuned on my own dataset and on CIFAR-10, both with the official demo code, but in both cases the loss stays fixed at one value and the finetuned model gives the same output regardless of the input.

So I wonder whether the finetune code has a bug in the loss. Besides, I found an error in the official demo code: it only uses the numeric labels such as 1 and 2 as the text, instead of the class names such as "dog" (see the sketch after the log below).

[2024-01-23 05:42:53,494] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67350/  198605 | elapsed time per iteration (ms): 434.3 | learning rate 8.804E-05 | loss 3.465734E+00 |
[2024-01-23 05:43:15,389] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67400/  198605 | elapsed time per iteration (ms): 437.9 | learning rate 8.803E-05 | loss 3.465734E+00 |
[2024-01-23 05:43:37,527] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67450/  198605 | elapsed time per iteration (ms): 442.8 | learning rate 8.802E-05 | loss 3.465734E+00 |
[2024-01-23 05:44:01,020] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67500/  198605 | elapsed time per iteration (ms): 469.9 | learning rate 8.801E-05 | loss 3.465734E+00 |
[2024-01-23 05:44:21,606] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67550/  198605 | elapsed time per iteration (ms): 411.7 | learning rate 8.799E-05 | loss 3.465734E+00 |
[2024-01-23 05:44:43,518] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67600/  198605 | elapsed time per iteration (ms): 438.2 | learning rate 8.798E-05 | loss 3.465734E+00 |
[2024-01-23 05:45:05,781] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67650/  198605 | elapsed time per iteration (ms): 445.3 | learning rate 8.797E-05 | loss 3.465734E+00 |
[2024-01-23 05:45:29,410] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67700/  198605 | elapsed time per iteration (ms): 472.6 | learning rate 8.796E-05 | loss 3.465734E+00 |
[2024-01-23 05:45:53,111] [INFO] [logger.py:71:log_dist] [Rank -1]  iteration    67750/  198605 | elapsed time per iteration (ms): 474.0 | learning rate 8.794E-05 | loss 3.465734E+00 |
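
To illustrate the label problem: the text side should get class-name captions, not raw integers. A sketch of what I would expect the data preparation to look like (CIFAR10_CLASSES is the standard class list; label_to_caption is a name I made up for illustration):

from torchvision.datasets import CIFAR10

# Standard CIFAR-10 class names. The demo code instead fed the integer label
# (e.g. "1") as the caption, which gives the text encoder nothing meaningful
# to contrast against.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_caption(label: int) -> str:
    # Hypothetical helper: map the numeric label to a natural-language caption.
    return f"a photo of a {CIFAR10_CLASSES[label]}"

cifar_train = CIFAR10(root="./data", train=True, download=True)
img, label = cifar_train[0]
print(label, "->", label_to_caption(label))  # prints the caption for the first image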
