Training details #1

lumpidu · 2024-09-17T10:50:47Z

Hi, very interesting paper!

When you publish the training scripts, could you also add some intuition about the training procedure and share your training details, such as the GPUs used, the number of steps, and the memory requirements?

Thanks in advance!

yl4579 · 2024-09-17T18:16:50Z

Thanks for your interest in this work! I'm very busy right now writing another paper and also preparing for job hunting and graduation, but I have included all the information needed for training in the Model Training section of the paper. I did training using Jupyter Notebook again, so it was pretty messy, but I'll share the code once it's cleaned.

It can take some time to clean the code, especially for the librilight dataset. The big model took a month to train on my lab's GPUs, although some experiments were run on H100 GPUs during my internship, which made them much faster. If anyone is willing to provide computation resources to debug/clean the code on large-scale models, feel free to email me at yl4579@columbia.edu. Please also email me if you want to help debug/clean the code.

yl4579 · 2024-09-18T17:17:01Z

I have gotten many emails in less than a day. Thank you very much! However, I think it is difficult to coordinate the work individually over email, so I have created a Discord server for that purpose. Please join the Discord server if you are willing to help :)
