Hi, I'm trying to use the model to predict given 2 context images.
But it always predicts 3 different Gaussian worlds, i.e., a means tensor of shape 3xNx3.
I see that in the training step you call it "intermediate output" or "supervise_intermediate_depth"
What's the purpose of that? Why do you get 3 different depth maps for each context image?
Thank you
Hi, I guess you are running the base model, which predicts 3 depth maps at different resolutions. Since we predict residual depths at the higher scales, we supervise all the intermediate depth predictions so that the depths at earlier stages do not become arbitrary. This is a common strategy in coarse-to-fine architectures, and it empirically improves performance.
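To make the idea concrete, here is a minimal pure-Python sketch of coarse-to-fine residual depth with a loss on every stage. It is only an illustration under assumptions: the function names (`coarse_to_fine`, `multiscale_loss`), the 2x nearest-neighbour upsampling, and the direct L1 depth loss are all hypothetical stand-ins, not this repository's code or its actual loss.

```python
# Illustrative sketch only: every name here is hypothetical, and the plain
# L1 depth loss is a stand-in for whatever loss the real model uses.

def upsample2x(depth):
    """Nearest-neighbour 2x upsampling of a 2D list-of-lists depth map."""
    out = []
    for row in depth:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample_to(depth, size):
    """Average-pool a square depth map to size x size (size must divide len(depth))."""
    k = len(depth) // size
    return [
        [
            sum(depth[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(size)
        ]
        for i in range(size)
    ]

def coarse_to_fine(coarse, residuals):
    """Return the depth at every stage: the coarse map, then upsampled depth + residual."""
    stages = [coarse]
    for res in residuals:
        up = upsample2x(stages[-1])
        stages.append([[u + r for u, r in zip(urow, rrow)] for urow, rrow in zip(up, res)])
    return stages

def l1(a, b):
    """Mean absolute difference between two equally sized depth maps."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def multiscale_loss(stages, target, supervise_intermediate=True):
    """Loss on the final stage, plus every earlier stage when requested."""
    used = stages if supervise_intermediate else stages[-1:]
    return sum(l1(s, downsample_to(target, len(s))) for s in used)

# Three stages (1x1, 2x2, 4x4). The middle stage is off by 0.5 and the last
# residual cancels that error, so the final depth still matches the target.
stages = coarse_to_fine(
    [[2.0]],
    [[[0.5] * 2 for _ in range(2)], [[-0.5] * 4 for _ in range(4)]],
)
target = [[2.0] * 4 for _ in range(4)]
print(multiscale_loss(stages, target, supervise_intermediate=False))  # 0.0
print(multiscale_loss(stages, target, supervise_intermediate=True))   # 0.5
```

With only the final stage supervised, the loss is zero even though the middle depth map is wrong, because the last residual compensates for it. Supervising every stage penalizes that drift, which is the motivation described above.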