
Loss becomes NaN starting from epoch 10 #16

Open

ontheway-arch opened this issue Jan 17, 2023 · 11 comments

Comments

@ontheway-arch

Starting from epoch 10 every loss is NaN, and training has now reached epoch 15. Should I keep going? CIDEr was already at 100 by epoch 5, then gradually dropped to 77 by epoch 10. What could be the cause?
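(A minimal sketch of a guard against this, assuming a standard PyTorch training loop; the helper and its arguments are illustrative, not code from this repo. Once the loss goes NaN the run will not recover, so it is better to abort than to keep training.)

```python
import math

def check_loss(loss_value: float, epoch: int, step: int) -> None:
    """Abort the run as soon as the loss diverges to NaN/inf."""
    if not math.isfinite(loss_value):
        raise RuntimeError(
            f"loss became non-finite ({loss_value}) at epoch {epoch}, step {step}"
        )
```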

@buxiangzhiren
Owner

What is your total batch size? It should normally be 8*64: 8 GPUs with 64 samples each. That is what the learning rate is tuned for; otherwise you need to adjust the lr according to your batch size.
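(A minimal sketch of the linear scaling rule being described here; `BASE_LR` is a hypothetical value, not this repo's actual setting.)

```python
# Linear scaling rule: scale the lr in proportion to the effective
# batch size, relative to the setting it was tuned for (8 GPUs x 64).
BASE_LR = 1e-4        # hypothetical lr tuned for the original setup
BASE_BATCH = 8 * 64   # the owner's setting: 8 GPUs, 64 per GPU

def scaled_lr(num_gpus: int, per_gpu_batch: int) -> float:
    """Return an lr proportional to the effective batch size."""
    return BASE_LR * (num_gpus * per_gpu_batch) / BASE_BATCH

print(scaled_lr(3, 64))  # 3 GPUs x 64 -> BASE_LR * 3/8
```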

@ontheway-arch
Author

I haven't changed the batch size; it is still 64 per GPU, but I am only using 3 GPUs. How should I adjust the batch size and lr?

@buxiangzhiren
Owner

You can take a look at this: https://zhuanlan.zhihu.com/p/64864995?utm_id=0

@buxiangzhiren
Owner

buxiangzhiren commented Jan 18, 2023

So your batch size is now effectively 3 x 64, whereas my original setting was 8 x 64.

@buxiangzhiren
Owner

The lr should be scaled down by the same factor, 8/3.
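(A worked instance of that factor, under the same assumption as the sketch above: lr_new = lr_old * (3*64)/(8*64) = (3/8) * lr_old, i.e. divide the original lr by 8/3 ≈ 2.67.)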

@ontheway-arch
Author

[Image 1]

Is there a formula for this? I don't quite understand it.

@buxiangzhiren
Owner

You can look at the formula derivations in the VQ-Diffusion paper. This paper also walks through some of the code: https://arxiv.org/pdf/2102.05379.pdf

It is around page 14.
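(For orientation, the core multinomial-diffusion equations that derivation builds on, restated here from Hoogeboom et al., arXiv:2102.05379; the notation may differ slightly from the paper, so verify against the original.)

```latex
% Forward step: with probability \beta_t, resample uniformly over K categories
q(x_t \mid x_{t-1}) = \mathrm{Cat}\big(x_t \mid (1-\beta_t)\, x_{t-1} + \beta_t / K\big)

% Marginal after t steps, with \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)
q(x_t \mid x_0) = \mathrm{Cat}\big(x_t \mid \bar{\alpha}_t\, x_0 + (1-\bar{\alpha}_t)/K\big)

% Posterior (the training target), with \alpha_t = 1-\beta_t
q(x_{t-1} \mid x_t, x_0) \propto
  \big[\alpha_t\, x_t + (1-\alpha_t)/K\big] \odot
  \big[\bar{\alpha}_{t-1}\, x_0 + (1-\bar{\alpha}_{t-1})/K\big]
```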

@verigle

verigle commented Feb 6, 2023

> The lr should be scaled down by the same factor, 8/3.

Does shrinking the batch size hurt accuracy? Can gradient accumulation make a small batch size match the accuracy of a large one?

@buxiangzhiren
Owner

Shrinking it probably does affect accuracy, but the impact should be small. You can use gradient accumulation.
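(A minimal PyTorch sketch of that gradient-accumulation suggestion; the loop structure and the `ACCUM_STEPS` value are illustrative, not this repo's code. The loss is divided by the accumulation factor so the summed gradient matches one large-batch step.)

```python
import torch

ACCUM_STEPS = 4  # hypothetical: effective batch = ACCUM_STEPS x per-GPU batch x num GPUs

def train_epoch(model, loader, optimizer, criterion):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = criterion(model(inputs), targets)
        (loss / ACCUM_STEPS).backward()   # gradients accumulate across micro-batches
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()              # one update per ACCUM_STEPS micro-batches
            optimizer.zero_grad()
```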
