Using different num of views in a tuple #34

Open
Twilight89 opened this issue Feb 27, 2023 · 5 comments

Twilight89 commented Feb 27, 2023

Hi, thanks for your wonderful work and readily available code!

In my case, I want to use an arbitrary number of views in a tuple, anywhere from 2 to 8. I have noticed that in your original paper you ran an experiment to test the influence of the number of views.

So do I have to retrain a model for one particular number of views per tuple, or do I just need one model like HERO_MODEL to test different values of num_images_in_tuple?

I tried directly changing model_num_views in options.py (and of course I generated the data_split file with num_images_in_tuple: 2), but when I run, the terminal shows "Number of source views: 7" and I get a shape error like the one below.

########################## Using FeatureVolumeManager ##########################
Number of source views: 7
Using all metadata.
Number of channels: [202, 128, 128, 1]
################################################################################

0%| | 0/37 [00:27<?, ?it/s]
0%| | 0/1 [00:27<?, ?it/s]
Traceback (most recent call last):
File "/root/simplerecon/test.py", line 473, in
main(opts)
File "/root/simplerecon/test.py", line 270, in main
outputs = model(
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/simplerecon/experiment_modules/depth_model.py", line 361, in forward
cost_volume, lowest_cost, _, overall_mask_bhw = self.cost_volume(
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/simplerecon/modules/cost_volume.py", line 360, in forward
self.build_cost_volume(
File "/root/simplerecon/modules/cost_volume.py", line 727, in build_cost_volume
feature_b1hw = self.mlp(
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/simplerecon/modules/networks.py", line 147, in forward
return self.net(x)
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/root/.pyenv/versions/simplerecon/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (393216x46 and 202x128)

So I assume that for different view num, I have to train a particular model to suit the size, is that right?

Hope to hear from you! THX:)

mohammed-amr (Collaborator) commented Feb 27, 2023

The "hero model" with metadata that we provide only supports 7 source views (8 views total). To use the repo with another number of source views you'll have to either

  1. train other metadata models with the number of source views you want to have,
  2. use the dot product model, as the correlation is agnostic to the number of views,
  3. or use the hero model but duplicate the remaining source view indices using what you have in the tuple.

Here's a longer explanation for 3. Say you have generated a tuple of size 4, but your model uses 8. Duplicate source views using the three you already have. Concretely:

Tuple of four:
frame_99, frame_80, frame_76, frame_70

For your tuple of eight, just randomly duplicate source views from the tuple of four:
frame_99, frame_80, frame_76, frame_70, frame_70, frame_76, frame_76, frame_80

The scripts for tuple generation do some version of this duplication for test tuples when the number of frames in the list isn't enough. You can insert a little flag in there to force this behavior. Change n_measurement_frames at this line to three for three source views (four views in total). The block at this line will perform the random sample repeat for you.
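
For illustration, here is a minimal sketch of that padding step in Python (the function name and the random sampling are illustrative, not the repo's actual tuple-generation code):

```python
import random

def pad_tuple_with_duplicates(frame_ids, target_num_views, seed=None):
    """Pad a frame tuple up to target_num_views by randomly duplicating
    existing source views (everything after the reference frame)."""
    if len(frame_ids) >= target_num_views:
        return frame_ids[:target_num_views]
    rng = random.Random(seed)
    reference, source_views = frame_ids[0], frame_ids[1:]
    padding = [rng.choice(source_views)
               for _ in range(target_num_views - len(frame_ids))]
    return [reference] + source_views + padding

# A tuple of four padded up to eight views, as in the example above.
print(pad_tuple_with_duplicates(
    ["frame_99", "frame_80", "frame_76", "frame_70"], 8, seed=0))
```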

Twilight89 (Author) commented:

Thanks for your detailed and quick reply! I have tried using the dot product model and successfully generated pred_depth with 2 images in a tuple. So the model contains two parts, the metadata part and the 2D CNN, right? But if I want to train a new HERO_MODEL, I also have to train both parts since it is end-to-end?

mohammed-amr (Collaborator) commented:

Great!

For the hero model, yes. You'd need to retrain the entire model end to end.

mohammed-amr (Collaborator) commented Mar 1, 2023

To elaborate on that: the feature volume's MLP does collapse features down to a single value, and from all my experiments, the MLP does end up learning some form of higher-is-better score.

Maybe you can get away with training a 4-view volume with a frozen 8-view decoder network? You'd need to try it. Do let us know if you do! It would be cool to learn from that.
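
For anyone attempting this, a rough sketch of the freeze-the-decoder idea in PyTorch (the module structure below is a toy stand-in, not SimpleRecon's actual classes, attribute names, or channel counts):

```python
import torch
import torch.nn as nn

class ToyDepthModel(nn.Module):
    """Toy stand-in for illustration only."""
    def __init__(self, mlp_in_channels):
        super().__init__()
        # Cost-volume MLP whose input width depends on the number of
        # source views; this is the part that would be retrained.
        self.cost_volume_mlp = nn.Sequential(
            nn.Linear(mlp_in_channels, 128), nn.ReLU(), nn.Linear(128, 1))
        # Decoder that consumes the collapsed, single-channel volume and
        # so is agnostic to how many views produced it; kept frozen.
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(), nn.Conv2d(32, 1, 1))

# Toy width; fewer views mean fewer metadata channels for the MLP input.
model = ToyDepthModel(mlp_in_channels=46)

# Freeze the pretrained decoder weights.
for param in model.decoder.parameters():
    param.requires_grad = False

# Optimise only the newly initialised cost-volume MLP.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```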

Twilight89 (Author) commented:

@mohammed-amr Hi, thanks again for your elaboration. I'll try to do that!
I also want to try whether simplerecon can run on a mobile device (as you say in the paper, it potentially enables use in embedded and resource-constrained environments). What I want to do is lower the memory usage and inference time, which is why I want to try 2 views in a tuple. There are two questions for me now:

  1. According to your experiments, do you think it can run on a mobile device in real time with minimal cost (e.g. using just two views)?
  2. And what about generalization to different input image sizes, for example 360×640 (versus 512×384 in your implementation)? I have tried running with images captured by my iPhone (360×640). The images are resized to 512×384 before being sent to the model (a rough sketch of that step follows below). The depth seems quite OK, but the fused mesh is bad.

Could you give me some suggestions about these? I will try my best!
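
For context, here is roughly the kind of resize step I mean, as a minimal sketch assuming OpenCV and a 3×3 pinhole intrinsics matrix K (rescaling the intrinsics alongside the image is only my assumption of what the model would need, not something taken from the repo):

```python
import cv2
import numpy as np

def resize_frame(image, K, target_wh=(512, 384)):
    """Resize an image to target_wh and scale the 3x3 pinhole intrinsics K
    to match. A minimal sketch; the repo's own dataloader may differ."""
    src_h, src_w = image.shape[:2]
    target_w, target_h = target_wh
    resized = cv2.resize(image, (target_w, target_h),
                         interpolation=cv2.INTER_AREA)
    K_scaled = K.astype(np.float64).copy()
    K_scaled[0, :] *= target_w / src_w  # scales fx, skew, cx
    K_scaled[1, :] *= target_h / src_h  # scales fy, cy
    return resized, K_scaled
```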
