A question about the code #3

Open
RaynerWu opened this issue Nov 6, 2024 · 2 comments


@RaynerWu

RaynerWu commented Nov 6, 2024

Hi, I ran into something puzzling while reading the code. Could you help clarify? In DimDownModule.py, the mask stops updating once the step reaches self.global_step*0.8. What is the reasoning behind that?
[screenshot: the relevant mask-update code in DimDownModule.py]

@kriskrisliu
Owner


Thank you for your interest in our work!
The mask regularizer is a variant of the sigmoid function. As the training step increases, the slope of the function grows steeper, gradually pushing the continuous scalers toward 0 or 1. However, we observed that as the step approaches the preset global step, the slope becomes extremely large, which destabilizes training. After trying several parameter settings, we settled on 0.8: once training reaches 0.8 * global_step, the slope is held constant (it is already quite steep at that point). The mask itself still updates, and the active loss (Equation 9 in the paper, dimdown_loss in the code) continues to drive it toward 0 or 1.
If you have any other questions, feel free to discuss! And if you come up with a cleverer idea, we'd be glad to explore it together!
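For anyone following along in the code, here is a minimal sketch of the scheme described above. It is an illustration under assumptions, not the repository's implementation: the class name `DimDownSketch`, the linear slope ramp, the hyperparameters `slope_min`/`slope_max`, and the `m * (1 - m)` form of the binarization loss are all hypothetical; only `DimDownModule.py`, `dimdown_loss`, and the 0.8 * global_step clamp come from this thread.

```python
import torch
import torch.nn as nn

class DimDownSketch(nn.Module):
    """Illustrative sketch only -- NOT the repository's actual DimDownModule.

    A learnable continuous scaler per dimension is squashed by a sigmoid
    whose slope grows with the training step, pushing values toward 0 or 1.
    The slope is clamped once step reaches 0.8 * global_step, as described
    in the reply above.
    """

    def __init__(self, dim, global_step, slope_min=1.0, slope_max=100.0):
        super().__init__()
        self.scaler = nn.Parameter(torch.zeros(dim))  # continuous mask logits
        self.global_step = global_step
        self.slope_min = slope_min  # assumed hyperparameter name/value
        self.slope_max = slope_max  # assumed hyperparameter name/value

    def slope(self, step):
        # Slope increases with step (a linear ramp is assumed here) but is
        # frozen once step >= 0.8 * global_step, where it is already steep.
        # This avoids the instability seen when the slope keeps growing
        # all the way to global_step.
        t = min(step, 0.8 * self.global_step) / self.global_step
        return self.slope_min + (self.slope_max - self.slope_min) * t

    def forward(self, x, step):
        mask = torch.sigmoid(self.slope(step) * self.scaler)
        return x * mask, mask


def dimdown_loss_sketch(mask):
    # One common way to push a soft mask toward {0, 1}: minimize
    # m * (1 - m), which is zero exactly at m = 0 and m = 1. The paper's
    # Equation 9 (dimdown_loss in the code) may take a different form.
    return (mask * (1.0 - mask)).mean()
```

Note that even after the slope freezes, `self.scaler` still receives gradients, so the mask keeps moving toward 0 or 1 under the binarization term, which matches the behavior described in the reply.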

@RaynerWu
Author

RaynerWu commented Nov 8, 2024

Got it, thanks for the explanation!
