
Incorrect setup of Learning Rate Scheduler #81

Open
aswathn1 opened this issue Jun 8, 2024 · 6 comments

Comments

aswathn1 commented Jun 8, 2024

Hello! Thanks for sharing your great work.

I noticed a discrepancy in the way you set up the learning rate scheduler in finetune.py.

When you calculate:
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
Dividing by the total batch size across multiple GPUs, rather than by gradient_accumulation_steps alone, should give the right number of update steps per epoch. This in turn affects both your warmup schedule and your linear decay schedule for the learning rate.
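
For concreteness, here is a rough sketch of the adjustment I have in mind inside finetune.py. Whether this is the right fix depends on whether len(train_dataloader) is counted before or after the dataloader is sharded across GPUs; accelerator.num_processes and the scheduler call below are my assumptions based on an open-instruct-style setup, not the actual code:

# current: only divides by gradient_accumulation_steps
# num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)

# proposed: also account for the number of GPUs, i.e. divide by the number of
# micro-batches consumed per optimizer update across all processes
num_update_steps_per_epoch = math.ceil(
    len(train_dataloader) / (args.gradient_accumulation_steps * accelerator.num_processes)
)
max_train_steps = args.num_train_epochs * num_update_steps_per_epoch

# warmup and linear decay are then derived from the corrected step count
lr_scheduler = get_scheduler(
    name=args.lr_scheduler_type,
    optimizer=optimizer,
    num_warmup_steps=int(max_train_steps * args.warmup_ratio),
    num_training_steps=max_train_steps,
)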

I've also been having issues reproducing your results with a locally fine-tuned Llama-2 7B model using your codebase and settings, compared to your Huggingface checkpoint. So please let me know if you can share any feedback on additional settings needed to reproduce the Huggingface checkpoint-level performance. Thank you.

@fate-ubw

Hi ~ I have also been having issues reproducing selfrag-7B; I got low evaluation results compared with the eval results in the paper. Would you share your results from fine-tuning Llama-2 7B into selfrag-7B?

@aswathn1
Author

I ran their finetuning script without making any changes, using the hyperparameter settings from their finetuning scripts, and evaluated with retrieval using their pre-computed top-k files. I was only able to get up to 34.28% on PopQA, 69.60% on PubHealth, and str-em of 28.02 and rg of 35.76 on ASQA. Can you share the results you were able to reproduce? That would be helpful for context.

@fate-ubw

My results are as follows:
base model: llama2-hf (not llama-2-chat)
epochs: 3
mode: always retrieval (pay attention to the retrieval mode at evaluation time; always retrieval and adaptive retrieval have different performance)
PopQA: 0.546
PubHealth: 0.678
ARC: 0.569
These are lower than the paper but better than your results. Hope this helps~
I have a question about the incorrect code in finetune.py:

num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)

Have you tried training a model with the corrected finetune code as well as with the code above? How does the corrected code affect the results?

@aswathn1
Author

I did, and it did increase the numbers, but they are still lower than the paper.

@fate-ubw

Could you please tell me how to modify the above code in finetune.py to make it correct~ I would like to test whether the corrected code can reproduce the results presented in the paper. Thanks a lot.

@fate-ubw

I found that finetune.py in self-rag is revised based on the open-instruct finetune script.

fate-ubw added a commit to fate-ubw/raglab-exp that referenced this issue Jul 14, 2024
…ignal code of finetune.py from open-instruction is correct