Weird results when training with provided script on RobotCar loop scene. #43

Open
KongYuJL opened this issue Nov 24, 2021 · 8 comments

@KongYuJL

Hi there, @samarth-robo.
Thanks for your solid work and well-structured code. By loading the pre-trained model weights, I could reproduce results even better than the numbers in the original paper.
[Attached: screenshot of the evaluation output and the trajectory plot (myplot)]

However, when I retrained on the RobotCar loop scene using the provided script and config file (from the latest version):
python train.py --dataset RobotCar --scene loop --config_file configs/mapnet.ini --model mapnet --device 1 --learn_beta --learn_gamma

The results are weird and the errors are much larger than I expected.
Screenshot from 2021-11-25 00-58-38

myplot

It's worth noting that I ran the script on a node with 8 NVIDIA RTX 2080 Ti GPUs:
When I used pytorch-0.4.1, which is specified in your environment.yaml, CUDA reported an error: "THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument"
So I ran the script in both the pytorch 0.4.1 and 1.0.1 environments; in both cases the errors are very large.

Besides, I also noticed that some of the preprocessed images are over-exposed (some are almost all white and barely contain any information). Is this normal?

[Attached: example over-exposed frames 1403774724917727.png and 1403774725105202.png]

For example 1403774724292807.png, ..., 1403774724917727.png at the beginning of 2014-06-26-09-24-58, and in other sequences.

@KongYuJL changed the title from "The gap between reproduced (trained with provided script) results and reported performance on RobotCar loop scene." to "Weird results when training with provided script on RobotCar loop scene." on Nov 24, 2021
@samarth-robo
Contributor

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.
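A quick way to run that first check (a minimal sketch, not part of the repo; it assumes pose_stats.txt holds two rows of three values, the mean and the std of the translation, readable with np.loadtxt, and the paths shown are hypothetical):

```python
# Minimal comparison sketch: report how much two pose_stats.txt files differ.
import numpy as np

def compare_pose_stats(path_a, path_b, rtol=1e-2):
    stats_a = np.loadtxt(path_a)  # expected shape: (2, 3) -- mean and std
    stats_b = np.loadtxt(path_b)
    print('absolute difference:\n', np.abs(stats_a - stats_b))
    return np.allclose(stats_a, stats_b, rtol=rtol)

# hypothetical paths, adjust to your checkout:
# compare_pose_stats('data/RobotCar/loop/pose_stats.txt',
#                    'logs/RobotCar_loop/pose_stats.txt')
```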

@samarth-robo
Contributor

process_robotcar_images.py only de-mosaics and undistorts the images. It does not do any exposure correction. So the images you showed should also be the images I used for training, and the results should be comparable. You are right that such over-exposed images are not good for learning, but correcting the exposure is not a part of this paper.
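If you want to quantify how many frames are affected, here is a minimal sketch (not part of process_robotcar_images.py; the helper name and threshold are made up) that flags near-saturated images by their mean intensity:

```python
# Minimal sketch: flag almost-all-white frames by their mean pixel intensity.
import os
import numpy as np
from PIL import Image

def find_overexposed(image_dir, thresh=240):
    flagged = []
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith('.png'):
            continue
        img = np.asarray(Image.open(os.path.join(image_dir, name)).convert('L'))
        if img.mean() > thresh:  # near-saturated frame
            flagged.append(name)
    return flagged
```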

@KongYuJL
Author

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.

Thanks for your suggestions!
I checked the values saved in pose_stats.txt and found that they differ only slightly.
[Attached: screenshot comparing the pose_stats.txt values]

When I trained on the 7Scenes dataset, there was no problem and the results were very similar to your reported performance.
Now I suspect there is either a bug in the SDK or an environment inconsistency, and I will try to dig into the code to find the problem.
Anyway, thanks for your patience!

@samarth-robo
Contributor

@KongYuJL you can also check whether your training loss values match those of the pretrained model we provided.

@cccccv-cm

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.

Thanks for your suggestions! I checked the values saved in pose_stats.txt and found that they differ only slightly. [Attached: screenshot from 2021-11-25 16-21-20]

When I trained on the 7Scenes dataset, there was no problem and the results were very similar to your reported performance. Now I suspect there is either a bug in the SDK or an environment inconsistency, and I will try to dig into the code to find the problem. Anyway, thanks for your patience!

Hello, pardon my presumptuousness. I have encountered the same problem as you; how did you solve it in the end?

@KongYuJL
Author

Well, it's been a while, so I don't know if my memory is accurate.
I only remember that the leading cause of the weird results was an inconsistency in the virtual environment:
some packages (probably the python, pytorch, and cuda versions) did not match the provided environment.
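One quick way to compare setups (a minimal sketch; run it in both the working and the failing environment and diff the output against environment.yaml):

```python
# Minimal sketch: print the versions that most often differ between setups.
import sys
import torch

print('python :', sys.version.split()[0])
print('pytorch:', torch.__version__)
print('cuda   :', torch.version.cuda)
print('cudnn  :', torch.backends.cudnn.version())
print('gpu ok :', torch.cuda.is_available())
```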

@cccccv-cm

Well, it's been a while, so I don't know if my memory is accurate. I only remember that the leading cause of the weird results was an inconsistency in the virtual environment: some packages (probably the python, pytorch, and cuda versions) did not match the provided environment.

Thanks for your answer. But if it is a problem of environment inconsistency, why does training succeed on the 7Scenes dataset and fail on RobotCar? One difference between the two datasets is that RobotCar uses the SDK; could this be the reason?

@KongYuJL
Author

@cccccv-cm The data range of RobotCar is different from that of 7Scenes; maybe that causes some problems in the training process. There may also be some differences between python2 and python3, but I'm not sure about this.

It's really weird that it works on 7Scenes but fails on RobotCar. I tried to find the reason but could not locate the relevant code. I suspect the CUDA version may also be one of the potential causes.
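If the data range is the suspect, one sanity check is to compare the spatial extent of the ground-truth translations in the two datasets (a minimal sketch; `poses` is a hypothetical (N, 3) NumPy array of camera translations in metres, loaded however you already load them):

```python
# Minimal sketch: spatial extent of camera translations along x, y, z.
# RobotCar trajectories span a far larger area than 7Scenes rooms.
import numpy as np

def translation_extent(poses):
    return poses.max(axis=0) - poses.min(axis=0)

# example with placeholder data:
# print(translation_extent(np.random.randn(1000, 3)))
```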
