Loss becomes NaN starting from epoch 10 #16
Comments
What is your total batch size? Normally it should be 8*64: 8 GPUs, 64 per GPU. Only then does the learning rate match. Otherwise you need to adjust the lr according to the batch size.
I didn't change the batch size, it's still 64, but I'm using 3 GPUs. How should I adjust the batch size and lr?
That means your current batch size is effectively 3 x 64, while my original setting was 8 x 64.
The lr should be scaled down by the same factor, 8/3.
You can look at the formula derivation in the VQ-Diffusion paper. This paper also explains some of the code: https://arxiv.org/pdf/2102.05379.pdf (around page 14).
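The linear scaling of the learning rate discussed above can be sketched as follows. This is a minimal illustration with a hypothetical base lr (`1e-4` is an assumed placeholder, not the repo's actual value); the batch sizes come from the thread (8 x 64 original, 3 x 64 with 3 GPUs).

```python
# Linear LR scaling rule: scale lr proportionally to the effective batch size.
base_lr = 1e-4        # hypothetical base learning rate (placeholder value)
base_batch = 8 * 64   # original effective batch size: 8 GPUs x 64 per GPU
new_batch = 3 * 64    # effective batch size with 3 GPUs

scaled_lr = base_lr * new_batch / base_batch  # shrunk by a factor of 8/3
print(scaled_lr)  # 3.75e-05
```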
Does shrinking the batch size hurt accuracy? Can gradient accumulation make a small batch size match the accuracy of a large one?
Shrinking it probably does affect accuracy, though the impact shouldn't be large. You can use gradient accumulation.
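As a sanity check on why gradient accumulation recovers the large-batch behavior: averaging the gradients of 8 micro-batches of 64 gives exactly the gradient of one batch of 512. A minimal NumPy sketch (the least-squares loss here is a stand-in for the real model, not the repo's actual objective):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)
X = rng.normal(size=(512, 4))
y = rng.normal(size=512)

def grad(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# One gradient computed on the full batch of 512
g_full = grad(w, X, y)

# The same gradient via 8 accumulated micro-batches of 64
g_accum = np.zeros_like(w)
for i in range(8):
    Xb, yb = X[i * 64:(i + 1) * 64], y[i * 64:(i + 1) * 64]
    g_accum += grad(w, Xb, yb) / 8  # divide so the sum averages correctly

print(np.allclose(g_full, g_accum))  # True
```

In a training framework, the equivalent is dividing each micro-batch loss by the number of accumulation steps before backpropagating, and stepping the optimizer only after the last micro-batch.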
The loss has been NaN since epoch 10. It's now at epoch 15; should I keep training? CIDEr reached 100 at epoch 5, then gradually dropped to 77 by epoch 10. What could be the cause?