
I don't get the logic of the intermediate output #39

Open
Golbstein opened this issue Dec 9, 2024 · 1 comment

Comments

@Golbstein

Golbstein commented Dec 9, 2024

Hi, I'm trying to use the model to make predictions given 2 context images.
But it always predicts 3 different Gaussian worlds, i.e., a means vector of shape 3xNx3.
I see that in the training step you call this the "intermediate output" or "supervise_intermediate_depth".

What's the purpose of that? Why do you get 3 different depth maps for each context image?
Thank you

@haofeixu
Member

haofeixu commented Jan 5, 2025

Hi, I guess you are running the base model, which predicts 3 depth maps at different resolutions. Since we predict residual depths at the higher-resolution scales, we add supervision to all the intermediate depth predictions to prevent the depths at earlier stages from becoming arbitrary. This is a common strategy in coarse-to-fine architectures, and it empirically improves performance.
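To illustrate why supervising only the final output leaves the earlier stages unconstrained, here is a minimal NumPy sketch of a generic 3-stage coarse-to-fine depth pipeline with a multi-scale loss. It assumes nearest-neighbour 2x upsampling between stages, an L1 loss, and ground-truth depth available at the finest resolution (average-pooled for the coarser stages). The function names are hypothetical; this is not the repository's actual implementation.

```python
import numpy as np


def upsample2x(depth):
    """Nearest-neighbour 2x upsampling of an (H, W) depth map (assumed scheme)."""
    return depth.repeat(2, axis=0).repeat(2, axis=1)


def coarse_to_fine_depths(coarse_depth, residuals):
    """Refine a coarse depth map with one residual per finer scale.

    Each finer stage = upsampled previous stage + predicted residual.
    Returns every intermediate prediction, coarsest first.
    """
    preds = [coarse_depth]
    for residual in residuals:
        preds.append(upsample2x(preds[-1]) + residual)
    return preds


def downsample_to(gt, shape):
    """Average-pool a ground-truth map down to `shape` (integer factor assumed)."""
    fh, fw = gt.shape[0] // shape[0], gt.shape[1] // shape[1]
    return gt.reshape(shape[0], fh, shape[1], fw).mean(axis=(1, 3))


def multiscale_l1_loss(preds, gt, weights=None):
    """Sum of weighted L1 losses over all given predictions."""
    if weights is None:
        weights = [1.0] * len(preds)
    total = 0.0
    for w, pred in zip(weights, preds):
        target = downsample_to(gt, pred.shape)
        total += w * float(np.abs(pred - target).mean())
    return total


if __name__ == "__main__":
    gt = np.full((8, 8), 5.0)  # toy ground truth at the finest scale

    # Two decompositions with the SAME final (8x8) depth map:
    # "good": every stage is already close to the true depth.
    good = coarse_to_fine_depths(
        np.full((2, 2), 4.0), [np.full((4, 4), 0.5), np.full((8, 8), 0.5)]
    )
    # "bad": coarse stages are meaningless; the last residual does all the work.
    bad = coarse_to_fine_depths(
        np.zeros((2, 2)), [np.zeros((4, 4)), np.full((8, 8), 5.0)]
    )

    for name, preds in [("good", good), ("bad", bad)]:
        final_only = multiscale_l1_loss(preds[-1:], gt)
        all_stages = multiscale_l1_loss(preds, gt)
        print(f"{name}: final-only={final_only:.1f} multiscale={all_stages:.1f}")
    # good: final-only=0.0 multiscale=1.5
    # bad: final-only=0.0 multiscale=10.0
```

With final-only supervision both decompositions get zero loss, so nothing stops the coarse stages from drifting; the multi-scale loss penalizes the "bad" one. In this sketch the last element of the returned list is the finest prediction, which is what one would normally use at inference time.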
