Thank you for your excellent project. I evaluated Flan-Alpaca-Base/Large/XL on the gsm8k, MultiArith, and SVAMP datasets, and the results are as follows:
| Model | gsm8k | MultiArith | SVAMP |
| --- | --- | --- | --- |
| Flan-Alpaca-Base | 13.42 | 20.33 | 19.50 |
| Flan-Alpaca-Large | 14.40 | 19.83 | 17.80 |
| Flan-Alpaca-XL | 9.25 | 13.83 | 14.30 |
Overall, the larger the model, the worse its performance. What do you think explains this? Also, did you use the test sets of these three datasets to train the model? If so, could the trend come from the smaller models overfitting the test data?
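For reference, my evaluation loop was roughly along these lines. This is only a minimal sketch: the checkpoint name (`declare-lab/flan-alpaca-base`), the plain question/answer prompt, and the last-number answer extraction are assumptions here, not necessarily the exact script that produced the numbers above.

```python
import re

from datasets import load_dataset
from transformers import pipeline

# Rough sketch of the evaluation loop (assumed prompt template and answer
# extraction; not necessarily identical to the setup used for the table above).
generator = pipeline("text2text-generation", model="declare-lab/flan-alpaca-base")
gsm8k = load_dataset("gsm8k", "main", split="test")

subset = gsm8k.select(range(100))  # small subset for a quick sanity check
correct = 0
for example in subset:
    prompt = f"Question: {example['question']}\nAnswer:"
    output = generator(prompt, max_new_tokens=256)[0]["generated_text"]
    # GSM8K gold answers end with "#### <number>"
    gold = example["answer"].split("####")[-1].strip().replace(",", "")
    # Naive extraction: take the last number that appears in the generation
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output.replace(",", ""))
    if numbers and numbers[-1] == gold:
        correct += 1

print(f"Accuracy on subset: {correct / len(subset):.2%}")
```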
Thank you~
Hi, thanks for the interesting analysis! The gsm8k and SVAMP datasets are indeed used for Flan-T5 training, but we are not sure what causes the trend of worse performance with larger model size. This definitely deserves a closer look; please let us know what you find!
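If you want to probe for direct test-set leakage, one quick check could be exact-match overlap between the GSM8K test questions and the instruction-tuning data. Below is a rough sketch; `tatsu-lab/alpaca` only stands in for part of the mixture (the full Flan collection is much larger), so treat the dataset choice as an assumption.

```python
from datasets import load_dataset

# Rough contamination check (assumption: "tatsu-lab/alpaca" stands in for the
# instruction-tuning data; the actual Flan mixture is far larger, so a clean
# result here does not rule out leakage elsewhere).
gsm8k_test = load_dataset("gsm8k", "main", split="test")
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

train_questions = {row["instruction"].strip() for row in alpaca}
overlap = sum(ex["question"].strip() in train_questions for ex in gsm8k_test)
print(f"Exact-match overlap: {overlap} / {len(gsm8k_test)} GSM8K test questions")
```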