Poses Local Minima #26

xiaoyaolovlife · 2024-11-15T05:11:38Z

Dear authors, this is excellent work, and thank you for your contributions! I have a question. According to the formula in the image, minimizing L involves two sets of optimization variables, θ and T. Is it possible for the optimization to get stuck in a local minimum? As an extreme example, if all the virtual cameras collapse to the same pose during optimization, the reconstruction will be a blurry scene. The linear and cubic interpolation you propose prevents this to some extent. However, when I try to generate the virtual cameras with a learning-based approach, I run into exactly this problem. Do you have any suggestions? Thank you.

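A toy illustration of the degenerate case in the question, not the authors' code: the sketch below models a blurry observation as the average of a sharp 1D signal seen from several virtual camera offsets. The function names and the 1D circular-shift camera model are assumptions made purely for illustration. When all offsets collapse to one value, the synthesized image is just the scene itself, so the only way to reproduce a blurry observation is for the scene to become blurry.

```python
def shift(signal, offset):
    """Circularly shift a 1D signal by an integer offset (toy camera translation)."""
    n = len(signal)
    return [signal[(i - offset) % n] for i in range(n)]


def synthesize_blur(signal, offsets):
    """Average the views seen from each virtual camera offset."""
    views = [shift(signal, o) for o in offsets]
    return [sum(column) / len(views) for column in zip(*views)]


# A sharp 1D "scene" with a single impulse.
sharp = [0, 0, 0, 1, 0, 0, 0, 0]

# Blurry observation produced by a genuine 3-pose trajectory.
observed = synthesize_blur(sharp, [-1, 0, 1])

# All virtual cameras at the same pose: no blur is synthesized at all.
collapsed = synthesize_blur(sharp, [0, 0, 0])

# With collapsed poses, a scene equal to the blurry observation fits it exactly.
blurry_scene_fit = synthesize_blur(observed, [0, 0, 0])
```

Here `collapsed` matches `sharp` and `blurry_scene_fit` matches `observed`, which is the blurry-scene minimum described above: the loss can reach zero without the trajectory being recovered.
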
LingzheZhao · 2024-11-15T08:15:09Z

Hello, thank you for your interest in our work!

Local minima are possible. When the camera poses (trajectory) cannot be optimized accurately, the 3DGS scene overfits the individual multi-view training images, each of which has a different blur pattern, by producing many high-frequency, stripe-like Gaussians. In that case we try lowering the densification threshold to reduce the number of Gaussians in the scene. Some detail is lost, but the Gaussians become less prone to overfitting high frequencies, which forces the poses to optimize in the right direction. Meanwhile, you can watch the pose-delta curve in TensorBoard and adjust the pose learning rate so that the poses converge faster.
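
For readers who want a number to watch while tuning, the following is a minimal sketch of a per-iteration pose-change statistic. It is not the repository's actual logging code: the 6-tuple pose layout `(tx, ty, tz, rx, ry, rz)` and the function name `pose_delta_stats` are assumptions, and the difference of rotation vectors only approximates a rotation angle for small updates.

```python
import math


def pose_delta_stats(prev_poses, curr_poses):
    """Return the mean translation and rotation change between two iterations.

    Each pose is a 6-tuple (tx, ty, tz, rx, ry, rz). This layout is a
    hypothetical parameterization chosen for illustration only.
    """
    trans_total, rot_total = 0.0, 0.0
    for prev, curr in zip(prev_poses, curr_poses):
        delta = [c - p for p, c in zip(prev, curr)]
        # Euclidean norm of the translation part of the update.
        trans_total += math.sqrt(sum(x * x for x in delta[:3]))
        # Norm of the rotation-vector difference (small-update approximation).
        rot_total += math.sqrt(sum(x * x for x in delta[3:]))
    count = len(prev_poses)
    return trans_total / count, rot_total / count


prev = [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)]
curr = [(3.0, 4.0, 0.0, 0.0, 0.0, 0.1)]
mean_trans, mean_rot = pose_delta_stats(prev, curr)
```

Logging the two returned values as scalars gives curves that should shrink as the poses converge. If they stall early while the renders still show striped artifacts, that may be a sign of the overfitting described above.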

That said, these tricks are certainly not elegant, and there is room for improvement. For example, we might be able to derive an initial estimate of the camera trajectory from the motion blur pattern (similar to a 2D blur kernel). Also, when the input is a continuously captured image sequence (i.e. a video), we could apply kinematic constraints, such as velocity and trajectory continuity, to better estimate the camera poses.
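
One possible form of such a kinematic constraint is sketched below, as an assumption rather than anything implemented in the repository: a smoothness penalty on finite-difference accelerations of the camera centers. The function name `smoothness_penalty`, the uniform time step `dt`, and the use of positions only (no rotations) are all illustrative choices.

```python
def smoothness_penalty(positions, dt=1.0):
    """Sum of squared finite-difference accelerations along a trajectory.

    positions: list of (x, y, z) camera centers sampled at uniform time steps.
    A constant-velocity trajectory gives zero penalty.
    """
    penalty = 0.0
    for p0, p1, p2 in zip(positions, positions[1:], positions[2:]):
        # Second-order central difference approximates acceleration.
        acceleration = [(a - 2 * b + c) / dt ** 2 for a, b, c in zip(p0, p1, p2)]
        penalty += sum(x * x for x in acceleration)
    return penalty


# Constant-velocity motion along x: no penalty.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# Back-and-forth jitter: heavily penalized.
jitter = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

straight_cost = smoothness_penalty(straight)
jitter_cost = smoothness_penalty(jitter)
```

In a training loop, a term like this would be added to the photometric loss with a small weight, which is a hyperparameter to be tuned and is not specified anywhere in this thread.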

xiaoyaolovlife · 2024-11-15T09:28:42Z

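Thank you for your reply. I have already considered both ideas: using the motion blur pattern (similar to a 2D blur kernel) to provide an initial estimate of the camera trajectory, and applying kinematic constraints when the input is a continuously captured image sequence (video).

1. Whether in the spatial domain or the frequency domain, motion along the z-axis cannot be recovered from a 2D blur kernel, can it? Do you have any suggestions?
2. Adding constraints such as velocity and trajectory continuity to a motion-blurred video seems to suffer from the same missing dimension, doesn't it?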
