
Failing to converge when using some random seeds #24

Open
Car-pe opened this issue Apr 13, 2024 · 2 comments

Car-pe commented Apr 13, 2024

Dear Authors,

Your work is truly exceptional, and I am currently trying to reproduce it. However, I have observed noticeable performance variation across random seeds. For example, when fine-tuning DeBERTa-v3-base on the MRPC task, setting the random seed to 0 yields an evaluation accuracy of 85.05, whereas choosing 71 or 37 as the seed drops the evaluation accuracy to 68.38, which essentially means the run fails to converge.
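
For reference, this is roughly how the seed is set in my runs (a minimal sketch using the standard `transformers.set_seed` utility; the script layout and hyperparameter values below are illustrative, not the exact command from this repository):

```python
# Minimal sketch: seeding a Hugging Face fine-tuning run on MRPC.
# Hyperparameters and output_dir are placeholders for illustration only.
from transformers import TrainingArguments, set_seed

seed = 0  # also tried 71 and 37, which failed to converge

# set_seed seeds Python's random module, NumPy, and PyTorch (CPU and CUDA)
set_seed(seed)

training_args = TrainingArguments(
    output_dir="./mrpc-deberta-v3-base",
    seed=seed,       # Trainer re-seeds at the start of training
    data_seed=seed,  # controls data shuffling / sampling order
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=2e-4,
)
```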

Could you possibly offer any guidance regarding this matter? Moreover, I would greatly appreciate it if you could disclose the random seeds you utilized in this work.

Thank you!

yifan1130 (Collaborator) commented

Thank you for reaching out and for your kind words about our work. With 2-bit quantization on the GLUE tasks, there can be performance variation across random seeds, and the extent of this instability differs by task: MRPC, as you noticed, and also tasks like CoLA, are rather unstable. In most cases the situation is even worse with the baseline QLoRA method. In our experiments, we tried a variety of random seeds and excluded the runs that did not converge.

To get more stable performance on GLUE, you can consider using a larger batch size or increasing the precision, for example using 4-bit precision in the earlier layers and 2-bit precision in the later layers. Some checkpoints are available now, and I will try to provide more checkpoints and random seeds in the future.
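
A minimal sketch of the layer-wise precision idea (the constants, helper, and module-name pattern below are hypothetical, not this repository's actual API; it only illustrates mapping earlier transformer layers to a higher bit width):

```python
# Hypothetical sketch: assign 4-bit quantization to earlier layers and
# 2-bit quantization to later layers of a 12-layer encoder such as
# DeBERTa-v3-base. Only the layer-to-bit-width mapping is illustrated;
# how the bit widths are consumed depends on your quantization code.
NUM_LAYERS = 12            # DeBERTa-v3-base has 12 transformer layers
HIGH_PRECISION_LAYERS = 6  # keep the first half at 4-bit for stability

def bits_for_layer(layer_idx: int) -> int:
    """Return the bit width to use for a given transformer layer index."""
    return 4 if layer_idx < HIGH_PRECISION_LAYERS else 2

layer_bits = {f"encoder.layer.{i}": bits_for_layer(i) for i in range(NUM_LAYERS)}
print(layer_bits)
# {'encoder.layer.0': 4, ..., 'encoder.layer.5': 4, 'encoder.layer.6': 2, ...}
```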

Car-pe (Author) commented Apr 17, 2024

Thank you for your response. Could you please share the random seeds you used for the GLUE benchmark? I cannot reproduce the results claimed in the paper.
Thank you!
