
Training time cost #10

Open
OrangeSodahub opened this issue Sep 4, 2023 · 7 comments

Comments

@OrangeSodahub

I was wondering about the time cost for training with the specific configurations. Is the NeRF branch time-consuming?

@chenfengxu714

The training is very efficient; it takes only around 7-10 hours on 4 V100 GPUs.

@OrangeSodahub

OrangeSodahub commented Sep 4, 2023

@chenfengxu714 That's amazing. Let me double-check: for the training process, you trained on the full ScanNet train split (~1000 scenes), one scene per iteration, and each scene uses 20 view images as input (in both the NeRF branch and the detection branch), costing about 16 GB of memory.

I also wonder whether, at inference time, the model can take in enough images (even all of them, regardless of memory) to represent the entire scene well in the NeRF branch.

@chenfengxu714

Yes, you can. But I suggest using our advanced config, which is inspired by our previous work SOLOFusion, i.e., using more frames but lower-resolution images. This is more effective and efficient. But if your memory is sufficient and you do not care about latency, just use as many frames and as high a resolution as possible.
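To make the trade-off concrete, here is a rough sketch of the two settings; the key names (`num_frames`, `image_size`) are placeholders and may differ from the actual config files in this repo:

```python
# Hypothetical sketch of the frames-vs-resolution trade-off described above.
# The actual config keys in this repo may be named differently.

# Few high-resolution frames: better per-view detail, higher memory/latency.
data_cfg_high_res = dict(
    num_frames=20,          # frames sampled per scene
    image_size=(480, 640),  # (H, W) fed to the network
)

# SOLOFusion-style: many low-resolution frames for better scene coverage.
data_cfg_many_frames = dict(
    num_frames=100,
    image_size=(240, 320),
)
```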

@OrangeSodahub

OrangeSodahub commented Sep 6, 2023

@chenfengxu714 I saw that your configs randomly choose 50 images per scene and then randomly choose 10 of them as target views (meaning the other 40 are source views). In 'volume' mode you build the scene volume from the 40 source views and render the 10 target views. You also calculate, along each ray from a target view, how many sample points are seen by the source views and use that to mask some of them. I wonder: is it reasonable to render a ray even when not all points along it are seen by the source views?
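For reference, a minimal sketch of the visibility-counting idea described above (not the repo's actual code; the camera conventions and function names are assumptions): sample points along a target-view ray are projected into each source view, the number of source views that see each point is counted, and points seen by no source view are masked out.

```python
import torch

def visibility_mask(points, src_extrinsics, src_intrinsics, img_hw):
    """points: (N, 3) world-space samples along a target-view ray.
    src_extrinsics: (V, 4, 4) world-to-camera matrices of the source views.
    src_intrinsics: (V, 3, 3) pinhole intrinsics.
    img_hw: (H, W) of the source images.
    Returns a boolean mask (N,), True for points seen by at least one source view."""
    H, W = img_hw
    N = points.shape[0]
    ones = torch.ones(N, 1, device=points.device, dtype=points.dtype)
    pts_h = torch.cat([points, ones], dim=-1)                            # (N, 4)
    cam = torch.einsum('vij,nj->vni', src_extrinsics, pts_h)[..., :3]    # (V, N, 3)
    z = cam[..., 2]                                                      # depth in each source view
    pix = torch.einsum('vij,vnj->vni', src_intrinsics, cam)              # (V, N, 3)
    u = pix[..., 0] / pix[..., 2].clamp(min=1e-6)
    v = pix[..., 1] / pix[..., 2].clamp(min=1e-6)
    # A point is "seen" if it lies in front of the camera and inside the image.
    seen = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)             # (V, N)
    return seen.sum(dim=0) > 0                                           # seen by >= 1 source view
```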

And in testing, how can you make sure that the 10 target views are fully covered by the 40 source views?

@chenfengxu714

Good question. We do not deal with this issue, and it is indeed a difficult one. We did find that if the views are sparse, the NVS is much worse, i.e., in many cases the target points do not project onto any source view. The model then reduces to a vanilla NeRF, which can be represented as "density, h = mlp(pos_enc) -> rgb = mlp(h, view_dir)". This is also why our SOLOFusion tricks help, since we reduce the resolution but largely increase the number of views.
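For reference, a minimal sketch of the vanilla-NeRF fallback form written above ("density, h = mlp(pos_enc) -> rgb = mlp(h, view_dir)"); the layer widths and encoding dimensions are illustrative assumptions, not the repo's values:

```python
import torch
import torch.nn as nn

class VanillaNeRFHead(nn.Module):
    def __init__(self, pos_dim=63, dir_dim=27, hidden=256):
        super().__init__()
        # Position branch: positional encoding -> feature h (and density).
        self.pos_mlp = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        # Color branch: feature h + encoded view direction -> rgb.
        self.rgb_mlp = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, pos_enc, view_dir_enc):
        h = self.pos_mlp(pos_enc)                                   # h = mlp(pos_enc)
        density = torch.relu(self.density_head(h))                  # density from position only
        rgb = self.rgb_mlp(torch.cat([h, view_dir_enc], dim=-1))    # rgb = mlp(h, view_dir)
        return density, rgb
```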

@OrangeSodahub

@chenfengxu714 I wonder if you could share the mmdet training logs, e.g., the loss values.

@chenfengxu714

Sorry, I wanted to, but I can't currently access the workstation I used. I will re-clean and polish the code and experiments on my own machines after CVPR.
