Weird results when training with provided script on RobotCar loop scene. #43

Open
KongYuJL opened this issue Nov 24, 2021 · 8 comments

@KongYuJL

Hi there, @samarth-robo.
Thanks for your solid work and well-structured code. By loading the pre-trained model weights, I could reproduce results even better than the numbers in the original paper.
[Attached: screenshot of the evaluation output and the trajectory plot (myplot)]

However, when I retrained on the RobotCar loop scene using the provided script and config file (from the latest version):
python train.py --dataset RobotCar --scene loop --config_file configs/mapnet.ini --model mapnet --device 1 --learn_beta --learn_gamma

The results are weird and the errors are much larger than I expected.
Screenshot from 2021-11-25 00-58-38

myplot

It's worth noting that I ran the script on a node with 8 NVIDIA RTX 2080 Ti GPUs:
When I used pytorch-0.4.1, which is specified in your environment.yaml, CUDA reported an error: "THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument"
So I ran the script in both the pytorch 0.4.1 and 1.0.1 environments; in both cases the errors are very large.

Besides, I also noticed that some of the preprocessed images are over-exposed (some are almost all white and barely contain any information). Is this normal?

[Attached: example over-exposed frames 1403774724917727.png and 1403774725105202.png]

For example 1403774724292807.png, ..., 1403774724917727.png at the beginning of 2014-06-26-09-24-58, and in other sequences.

@KongYuJL changed the title from "The gap between reproduced (trained with provided script) results and reported performance on RobotCar loop scene." to "Weird results when training with provided script on RobotCar loop scene." on Nov 24, 2021
@samarth-robo
Contributor

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.
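A quick way to run that first check (a minimal sketch, not part of the repo; it assumes pose_stats.txt holds two rows of three values, the mean and the std of the translation, readable with np.loadtxt, and the paths shown are hypothetical):

```python
# Minimal comparison sketch: report how much two pose_stats.txt files differ.
import numpy as np

def compare_pose_stats(path_a, path_b, rtol=1e-2):
    stats_a = np.loadtxt(path_a)  # expected shape: (2, 3) -- mean and std
    stats_b = np.loadtxt(path_b)
    print('absolute difference:\n', np.abs(stats_a - stats_b))
    return np.allclose(stats_a, stats_b, rtol=rtol)

# hypothetical paths, adjust to your checkout:
# compare_pose_stats('data/RobotCar/loop/pose_stats.txt',
#                    'logs/RobotCar_loop/pose_stats.txt')
```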

@samarth-robo
Contributor

process_robotcar_images.py only de-mosaics and undistorts the images. It does not do any exposure correction. So the images you showed should also be the images I used for training, and the results should be comparable. You are right that such over-exposed images are not good for learning, but correcting the exposure is not a part of this paper.
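If you want to quantify how many frames are affected, here is a minimal sketch (not part of process_robotcar_images.py; the helper name and threshold are made up) that flags near-saturated images by their mean intensity:

```python
# Minimal sketch: flag almost-all-white frames by their mean pixel intensity.
import os
import numpy as np
from PIL import Image

def find_overexposed(image_dir, thresh=240):
    flagged = []
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith('.png'):
            continue
        img = np.asarray(Image.open(os.path.join(image_dir, name)).convert('L'))
        if img.mean() > thresh:  # near-saturated frame
            flagged.append(name)
    return flagged
```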

@KongYuJL
Author

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.

Thanks for your suggestions!
I checked the values saved in pose_stats.txt and found that they differ only slightly.
[Attached: screenshot comparing the pose_stats.txt values]

When I trained on the 7Scenes dataset, there was no problem and the results were very similar to your reported performance.
Now I suspect there is either a bug in the SDK or an environment inconsistency, and I will try to dig into the code to find the problem.
Anyway, thanks for your patience!

@samarth-robo
Contributor

@KongYuJL you can also check whether your training loss values match those of the pretrained model we provided.

@cccccv-cm

Hi @KongYuJL is it an issue with pose_stats.txt? Every time you train, it overwrites that file (see this line). So you could check two things:

  • is the pose_stats.txt produced by your training very different from the included pose_stats.txt? If yes, that might indicate a problem with the RobotCar SDK.
  • if no, the issue becomes harder to debug, but make sure the eval script uses the pose_stats.txt produced by your training.

Thanks for your suggestions! I checked the values saved in pose_stats.txt and found that they differ only slightly. [Attached: screenshot from 2021-11-25 16-21-20]

When I trained on the 7Scenes dataset, there was no problem and the results were very similar to your reported performance. Now I suspect there is either a bug in the SDK or an environment inconsistency, and I will try to dig into the code to find the problem. Anyway, thanks for your patience!

Hello, pardon my presumptuousness. I have encountered the same problem as you; how did you solve it in the end?

@KongYuJL
Author

Well, it's been a while, so I don't know if my memory is accurate.
I only remember that the leading cause of the weird results was an inconsistency in the virtual environment:
some packages (probably the python, pytorch, and cuda versions) did not match the provided environment.
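One quick way to compare setups (a minimal sketch; run it in both the working and the failing environment and diff the output against environment.yaml):

```python
# Minimal sketch: print the versions that most often differ between setups.
import sys
import torch

print('python :', sys.version.split()[0])
print('pytorch:', torch.__version__)
print('cuda   :', torch.version.cuda)
print('cudnn  :', torch.backends.cudnn.version())
print('gpu ok :', torch.cuda.is_available())
```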

@cccccv-cm

Well, it's been a while, so I don't know if my memory is accurate. I only remember that the leading cause of the weird results was an inconsistency in the virtual environment: some packages (probably the python, pytorch, and cuda versions) did not match the provided environment.

Thanks for your answer. But if it is a problem of environment inconsistency, why does training succeed on the 7Scenes dataset and fail on RobotCar? One difference between the two datasets is that RobotCar uses the SDK; could this be the reason?

@KongYuJL
Author

@cccccv-cm The data range of RobotCar is different from that of 7Scenes; maybe that causes some problems in the training process. There may also be some differences between python2 and python3, but I'm not sure about this.

It's really weird that it works on 7Scenes but fails on RobotCar. I tried to find the reason but could not locate the relevant code. I suspect the CUDA version may also be one of the potential causes.
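If the data range is the suspect, one sanity check is to compare the spatial extent of the ground-truth translations in the two datasets (a minimal sketch; `poses` is a hypothetical (N, 3) NumPy array of camera translations in metres, loaded however you already load them):

```python
# Minimal sketch: spatial extent of camera translations along x, y, z.
# RobotCar trajectories span a far larger area than 7Scenes rooms.
import numpy as np

def translation_extent(poses):
    return poses.max(axis=0) - poses.min(axis=0)

# example with placeholder data:
# print(translation_extent(np.random.randn(1000, 3)))
```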
