A question about the code #3

Open
RaynerWu opened this issue Nov 6, 2024 · 2 comments


@RaynerWu

RaynerWu commented Nov 6, 2024

Hi, I ran into something puzzling while reading the code. Could you help clarify? In DimDownModule.py, the mask stops updating once the step reaches self.global_step*0.8. What is the reasoning behind that?
[screenshot: the relevant mask-update code in DimDownModule.py]

@kriskrisliu
Owner


Thank you for your interest in our work!
The mask regularizer is a variant of the sigmoid function. As the training step increases, the slope of the function grows steeper, gradually pushing the continuous scalers toward 0 or 1. However, we observed that as the step approaches the preset global step, the slope becomes extremely large, which destabilizes training. After trying several parameter settings, we settled on 0.8: once training reaches 0.8 * global_step, the slope is held constant (it is already quite steep at that point). The mask itself still updates, and the active loss (Equation 9 in the paper, dimdown_loss in the code) continues to drive it toward 0 or 1.
If you have any other questions, feel free to discuss! And if you come up with a cleverer idea, we'd be glad to explore it together!
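For anyone following along in the code, here is a minimal sketch of the scheme described above. It is an illustration under assumptions, not the repository's implementation: the class name `DimDownSketch`, the linear slope ramp, the hyperparameters `slope_min`/`slope_max`, and the `m * (1 - m)` form of the binarization loss are all hypothetical; only `DimDownModule.py`, `dimdown_loss`, and the 0.8 * global_step clamp come from this thread.

```python
import torch
import torch.nn as nn

class DimDownSketch(nn.Module):
    """Illustrative sketch only -- NOT the repository's actual DimDownModule.

    A learnable continuous scaler per dimension is squashed by a sigmoid
    whose slope grows with the training step, pushing values toward 0 or 1.
    The slope is clamped once step reaches 0.8 * global_step, as described
    in the reply above.
    """

    def __init__(self, dim, global_step, slope_min=1.0, slope_max=100.0):
        super().__init__()
        self.scaler = nn.Parameter(torch.zeros(dim))  # continuous mask logits
        self.global_step = global_step
        self.slope_min = slope_min  # assumed hyperparameter name/value
        self.slope_max = slope_max  # assumed hyperparameter name/value

    def slope(self, step):
        # Slope increases with step (a linear ramp is assumed here) but is
        # frozen once step >= 0.8 * global_step, where it is already steep.
        # This avoids the instability seen when the slope keeps growing
        # all the way to global_step.
        t = min(step, 0.8 * self.global_step) / self.global_step
        return self.slope_min + (self.slope_max - self.slope_min) * t

    def forward(self, x, step):
        mask = torch.sigmoid(self.slope(step) * self.scaler)
        return x * mask, mask


def dimdown_loss_sketch(mask):
    # One common way to push a soft mask toward {0, 1}: minimize
    # m * (1 - m), which is zero exactly at m = 0 and m = 1. The paper's
    # Equation 9 (dimdown_loss in the code) may take a different form.
    return (mask * (1.0 - mask)).mean()
```

Note that even after the slope freezes, `self.scaler` still receives gradients, so the mask keeps moving toward 0 or 1 under the binarization term, which matches the behavior described in the reply.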

@RaynerWu
Author

RaynerWu commented Nov 8, 2024

Got it, thanks for the explanation!
