(Question) About glue tasks #52
Comments
Hi, thanks for your question. Were you using the hyperparameters and settings provided in our paper (appendix)?
I have the same issue. I have checked that the gradient norm and the learning rate are not zero. In the original code, once the metric is initialized, it is never refreshed and keeps receiving new predictions across the whole training process. Hence, I manually reload the metric with `metric = evaluate.load('glue', args.task_name)`. However, after fixing this potential bug, the eval loss of the fine-tuned model does change, but the accuracy and F1 score remain the same.
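To illustrate the concern about a metric object that keeps accumulating predictions, here is a minimal sketch. `RunningAccuracy` is a hypothetical stand-in written for this example, not the `evaluate` API, and whether the real metric object clears its state after `compute` is not verified here. The toy labels and predictions are also made up.

```python
class RunningAccuracy:
    """Hypothetical stand-in for a metric object that accumulates batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def add_batch(self, predictions, references):
        self.correct += sum(int(p == r) for p, r in zip(predictions, references))
        self.total += len(references)

    def compute(self):
        # Deliberately does NOT reset its counters, to mimic the suspected issue.
        return self.correct / self.total


labels = [1, 0, 1, 1]
# Toy predictions for two evaluation passes; the second pass is perfect.
epoch_preds = [[1, 1, 1, 1], [1, 0, 1, 1]]

stale = RunningAccuracy()  # created once, reused for every evaluation pass
stale_scores, fresh_scores = [], []
for preds in epoch_preds:
    stale.add_batch(preds, labels)
    stale_scores.append(stale.compute())

    fresh = RunningAccuracy()  # re-created for each evaluation pass
    fresh.add_batch(preds, labels)
    fresh_scores.append(fresh.compute())

print("reused metric:    ", stale_scores)
print("re-created metric:", fresh_scores)
```

With a reused accumulator, the second score blends both passes, so improvements show up damped. Re-creating the metric before each evaluation pass, as described above, avoids that blending under these assumptions.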
I spent several hours adjusting the hyperparameters and found that the AdamW optimizer does work with a suitable learning rate. You can try this launch command:
This leads to the result:
It seems that an improper learning rate may drive the model to mode collapse, i.e. assigning the same logits to any input sequence. Thus, the accuracy and F1 score remain unchanged, since the model is making a fixed guess.
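The collapse explanation is consistent with a changing eval loss alongside frozen accuracy and F1. The sketch below shows this with made-up logits and labels (not taken from any actual run): when the logits are the same for every input, the argmax prediction is constant, so accuracy and F1 stay fixed even as the cross-entropy loss moves. All helper names are hypothetical.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy, using a numerically stable log-sum-exp."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def argmax_preds(logits):
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(preds, labels):
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def f1_positive(preds, labels):
    """F1 for class 1; returns 0.0 when there are no true positives."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


labels = [1, 1, 0, 1]
# Two hypothetical checkpoints of a collapsed model: the logits ignore the input.
step_a = [[0.0, 0.5]] * 4
step_b = [[0.0, 2.0]] * 4

for name, logits in (("step_a", step_a), ("step_b", step_b)):
    preds = argmax_preds(logits)
    print(name, cross_entropy(logits, labels), accuracy(preds, labels),
          f1_positive(preds, labels), sorted(set(preds)))
```

A quick check on a real run is to count the distinct predicted classes on the eval set: if only one class ever appears, the metrics are reporting a constant guess rather than a stalled evaluation.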
Hello, thanks for your inspiring and excellent work!
I want to try full fine-tuning to compare with GaLore, so I have disabled GaLore. However, when I run a GLUE task (e.g. MRPC) to fully fine-tune RoBERTa, the eval accuracy doesn't change at all as training progresses. I have ruled out overfitting as a possible cause, and I would like to ask the authors or anyone else whether there is a known solution.