Dear Authors,

Your work is truly exceptional, and I am currently attempting to reproduce it. However, I have observed noticeable performance variation across random seeds. For example, when fine-tuning DeBERTa-v3-base on the 'mrpc' task, setting the random seed to '0' yields an evaluation accuracy of 85.05, whereas choosing '71' or '37' drops the evaluation accuracy to 68.38, i.e., training essentially fails to converge.

Could you offer any guidance on this? I would also greatly appreciate it if you could disclose the random seeds you used in this work.

Thank you!
Thank you for reaching out and for your kind words about our work. With 2-bit quantization on GLUE tasks, performance can indeed vary across random seeds, and the extent of this variance differs by task. For instance, on the 'mrpc' task you noticed, as well as on tasks like CoLA, performance is rather unstable; in most cases the situation is even worse with the baseline QLoRA method. In our experiments, we tried a variety of random seeds and excluded the runs that did not converge. To achieve more stable performance on GLUE, you can consider using a larger batch size or increasing the precision, e.g., using 4-bit precision in earlier layers and 2-bit precision in later layers. Some checkpoints are available now, and I will try to provide more checkpoints and random seeds in the future.
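For anyone following along, here is a minimal sketch of how one might fix all relevant seeds, sweep over several of them, and express the 4-bit/2-bit layer split suggested above. Note that `train_and_evaluate` and `bits_per_layer` are hypothetical placeholders, not part of this repository; only the seeding calls are standard PyTorch/NumPy APIs.

```python
import random

import numpy as np
import torch


def set_all_seeds(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and CUDA) in one call."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels reduce run-to-run variance at some speed cost.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Hypothetical mixed-precision plan: 4-bit in earlier encoder layers,
# 2-bit in later ones (DeBERTa-v3-base has 12 encoder layers).
NUM_LAYERS = 12
bits_per_layer = {i: 4 if i < NUM_LAYERS // 2 else 2 for i in range(NUM_LAYERS)}

for seed in (0, 37, 42, 71):
    set_all_seeds(seed)
    # train_and_evaluate is a placeholder for the repo's actual training entry
    # point; non-converging runs would be excluded, as described above.
    # acc = train_and_evaluate(seed=seed, bits_per_layer=bits_per_layer)
```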
Thank you for your response. Could you please share the random seeds you used for the GLUE benchmark? I cannot reproduce the results claimed in the paper.

Thank you!