I don't get the logic of the intermediate output #39

Golbstein · 2024-12-09T09:56:19Z

Hi, I'm trying to run the model on 2 context images.
But it always predicts 3 different sets of Gaussians, i.e., a means tensor of shape 3xNx3.
I see that in the training step you call this "intermediate output" or "supervise_intermediate_depth".

What's the purpose of that? Why do you get 3 different depth maps for each context image?
Thank you

haofeixu · 2025-01-05T16:50:12Z

Hi, I guess you are running the base model, which predicts 3 depth maps at different resolutions. Since we predict residual depths at the higher scales, we supervise all the intermediate depth predictions so that the depths at earlier stages do not become arbitrary. This is a common strategy in coarse-to-fine architectures, and it empirically improves performance.
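To make the reasoning above concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: the helper names (`upsample2x`, `coarse_to_fine`, `multi_scale_loss`), the nearest-neighbour upsampling, and the L1 loss against a toy target are all assumptions chosen for illustration, and the actual model may supervise its intermediate outputs differently. The sketch only shows why a loss on the final depth alone leaves the coarse stage unconstrained when later stages predict residuals.

```python
def upsample2x(depth):
    """Nearest-neighbour 2x upsampling of a depth map stored as a list of rows."""
    out = []
    for row in depth:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample_to(target, h, w):
    """Average-pool `target` down to (h, w); assumes the sizes divide evenly."""
    fh, fw = len(target) // h, len(target[0]) // w
    return [
        [
            sum(target[i * fh + a][j * fw + b] for a in range(fh) for b in range(fw))
            / (fh * fw)
            for j in range(w)
        ]
        for i in range(h)
    ]


def coarse_to_fine(coarse, residuals):
    """Return the depth at every stage: the coarse map, then upsample + residual."""
    depths = [coarse]
    for res in residuals:
        up = upsample2x(depths[-1])
        depths.append(
            [[u + r for u, r in zip(up_row, res_row)] for up_row, res_row in zip(up, res)]
        )
    return depths


def l1(a, b):
    """Mean absolute error between two equally sized depth maps."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def multi_scale_loss(depths, target, supervise_intermediate=True):
    """Sum of per-stage L1 losses; optionally only the final stage is supervised."""
    stages = depths if supervise_intermediate else depths[-1:]
    return sum(l1(d, downsample_to(target, len(d), len(d[0]))) for d in stages)


# Toy target: a 4x4 depth map that is 2.0 everywhere (hypothetical data).
target = [[2.0] * 4 for _ in range(4)]

# A coarse prediction that is badly wrong, with residuals that happen to fix it.
coarse = [[5.0]]
residuals = [
    [[-1.0] * 2 for _ in range(2)],  # stage 2: 5.0 - 1.0 = 4.0
    [[-2.0] * 4 for _ in range(4)],  # stage 3: 4.0 - 2.0 = 2.0
]
depths = coarse_to_fine(coarse, residuals)

final_only = multi_scale_loss(depths, target, supervise_intermediate=False)
all_stages = multi_scale_loss(depths, target, supervise_intermediate=True)
print(len(depths), final_only, all_stages)  # 3 0.0 5.0
```

With supervision on the final stage only, the loss is zero even though the coarse depth is off by 3.0, so nothing pushes the early stage toward a meaningful value. Supervising all three stages penalizes that drift, which is the behaviour the reply describes.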
