Table 2 hyperparameters #68

athglentis · 2025-02-25T23:48:10Z

Hello, thanks for releasing the code!

Would it be possible to also release the exact hyperparameters you used to obtain the results in Table 2 (the C4 dataset experiments)?

I'm especially interested in the optimal learning rate you found for each model/method configuration through the tuning described in Appendix C.1:

"For all methods on each size of models (from 60M to 1B), we tune their favorite learning rate from a set of {0.01, 0.005, 0.001, 0.0005, 0.0001}, and the best learning rate is chosen based on the validation perplexity."
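For reference, my reading is that the selection rule quoted above is a plain grid search: one training run per learning rate in the grid, keeping the rate with the lowest validation perplexity. Below is a minimal sketch of that selection step. It assumes each run's mean validation cross-entropy (natural log) is already available. The function names, the input format and the loss values are hypothetical placeholders, not numbers from the paper or this repo.

```python
import math

# Learning-rate grid quoted from appendix C.1.
LR_GRID = [0.01, 0.005, 0.001, 0.0005, 0.0001]


def perplexity(mean_val_loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_val_loss)


def select_best_lr(val_losses: dict[float, float]) -> float:
    """Return the grid learning rate whose run has the lowest validation perplexity.

    `val_losses` maps each learning rate to the mean validation cross-entropy
    of the corresponding training run (hypothetical input format).
    Ties are broken in favor of the rate listed first in LR_GRID.
    """
    missing = [lr for lr in LR_GRID if lr not in val_losses]
    if missing:
        raise ValueError(f"no validation loss for learning rates: {missing}")
    # exp is monotonic, so this is equivalent to picking the lowest loss.
    return min(LR_GRID, key=lambda lr: perplexity(val_losses[lr]))


if __name__ == "__main__":
    # Placeholder losses for illustration only, NOT results from the paper.
    example = {0.01: 3.9, 0.005: 3.6, 0.001: 3.4, 0.0005: 3.5, 0.0001: 3.8}
    best = select_best_lr(example)
    print(f"best lr = {best}, val ppl = {perplexity(example[best]):.2f}")
```

Knowing which grid point won for each model size and method would be enough to reproduce Table 2 without rerunning the whole sweep.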

Releasing these hyperparameters would be a great help, as I'm trying to replicate your paper's results.
