
Questions about the performance of the pretrained model and finetuned model #1

Open · bewudi opened this issue Dec 20, 2024 · 1 comment


bewudi commented Dec 20, 2024

I appreciate your sharing of this intriguing research and the accompanying code.

Upon reviewing the results, it appears that the performance of the pretrained model and of the individual (finetuned) models diverges from that reported in [1], [2], and [3], as well as in numerous other studies, even though all of them use the ViT models from CLIP. Could you explain the potential reasons behind these discrepancies?

Thank you for your attention to this matter.

[1] Editing Models with Task Arithmetic
[2] ADAMERGING: Adaptive Model Merging for Multi-Task Learning
[3] Representation Surgery for Multi-Task Model Merging


@AntoAndGar (Owner) commented

Hi, thanks for sharing your interest in our work and for the kind words.

Comparing with the results of the research you point out, I see only a small variation for the pre-trained models, often less than $\pm 1$%. A possible explanation could be the accumulation of small rounding errors on different computing architectures; for the smallest discrepancies, also the different rounding conventions adopted by researchers when reporting results, or a possible corruption of an image while downloading the datasets. In my opinion, such a small variation of $\pm 1$% falls inside the expected tolerance in the field of DL.

For the individual finetuned models' performance, I can only tell you that it is often hard to perfectly reproduce the finetuning setup of another work: a different seed, number of training epochs, optimizer, scheduler, or other small variations can lead to discrepancies. This is why, in our research and in others, normalized accuracy is provided in addition to plain accuracy (see the sketch below).
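
For clarity, here is a minimal sketch of how normalized accuracy is typically computed in the model-merging literature (merged-model accuracy divided by the corresponding individually finetuned accuracy, averaged over tasks); the function name and the numbers are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch (not from this repository): normalized accuracy as commonly
# defined in the model-merging literature, i.e. the accuracy of the merged
# model on each task divided by the accuracy of the corresponding
# individually finetuned model, averaged over tasks.

def normalized_accuracy(merged_acc: dict, finetuned_acc: dict) -> float:
    """Average over tasks of merged_acc[task] / finetuned_acc[task] (both in [0, 1])."""
    ratios = [merged_acc[task] / finetuned_acc[task] for task in merged_acc]
    return sum(ratios) / len(ratios)

# Example with made-up numbers: even if absolute accuracies differ slightly
# between papers, the ratio to each paper's own finetuned baseline is more
# comparable across finetuning setups.
merged = {"Cars": 0.65, "EuroSAT": 0.80}
finetuned = {"Cars": 0.78, "EuroSAT": 0.99}
print(f"Normalized accuracy: {normalized_accuracy(merged, finetuned):.3f}")
```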

If you have any other doubts, do not hesitate to reply.
