When you publish the training scripts, could you also add some intuition about the training procedure, along with your training details (GPUs used, number of steps, memory requirements, etc.)?
Thanks in advance!
Thanks for your interest in this work! I'm very busy right now writing another paper and preparing for job hunting and graduation, but I have included all the information needed for training in the Model Training section of the paper. I trained in Jupyter Notebook again, so the code is pretty messy, but I'll share it once it's cleaned up.
It can take some time to clean the code, especially the parts for the librilight dataset. The big model took a month to train on my lab's GPUs, although some experiments were run on H100 during my internship, which made training much faster. If anyone is willing to provide computation resources to debug/clean the code on large-scale models, feel free to email me at yl4579@columbia.edu. Please also email me if you want to help debug/clean the code.
I have received many emails in less than a day. Thank you very much! However, it is difficult to coordinate the work individually through email, so I have created a Discord server for that purpose. Please join the Discord server if you are willing to help :)