Many detailed questions #13

Open
shuwei666 opened this issue Jun 8, 2023 · 1 comment
Comments

shuwei666 commented Jun 8, 2023

Thanks for your great work! It has indeed sparked a lot of inspiration for me. However, there are several aspects I would like to discuss further:

The paper mentioned: "To allow the network to reason about the set of additional input images in a way that is insensitive to their ordering, we adopt the permutation invariant pooling approach of Aittala et al."

1. Could you elaborate on why insensitivity to ordering is crucial? Specifically, I'm curious whether a sufficiently large training dataset would inherently cover all potential orderings.

Regarding the number of additional unlabeled images (m), it appears that they were used in both the training and testing stages. From the ablation study, it seems that various values of m were only tested on the test camera, as illustrated in Table 4. I have a question about this:

2. During the training process, did you experiment with varying quantities for 'm', or was there a consistent fixed number applied throughout, for example, 8?

When m equals 1, I understand that this means only the query image is used during testing. If so, my question is:

3. Could you clarify whether m=1 only signifies the zero-shot condition, i.e., just inferring, or does it mean that the single query image is used for self-calibration, followed by parameter fixation, and then inference?

4. From the results shown in Table 4, it doesn't seem that the results improve as m increases (e.g., error(m=13) > error(m=7)). Could you provide some insights into this?

5. Have you considered using additional labeled images for fine-tuning? If so, would this lead to better results than the current method?

Thank you for taking the time to answer these questions. Your responses will be greatly beneficial to my understanding.

@mahmoudnafifi (Owner)

Hi, thanks for your questions. Here are my responses.

Could you elaborate on why insensitivity to ordering is crucial? Specifically, I'm curious whether a sufficiently large training dataset would inherently cover all potential orderings.

It may happen that the network ignores one or more of the additional inputs and relies on the others during training. To prevent that, we use permutation-invariant pooling.
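
To make the order-insensitivity concrete, here is a minimal PyTorch sketch (illustrative only, not the repository's actual code): max-pooling across the image axis yields the same fused features for any ordering of the inputs.

```python
import torch

# Minimal sketch (illustrative only): max-pooling over the image axis of the
# per-image feature maps makes the fused result independent of input order.
def permutation_invariant_pool(features):
    # features: (m, C, H, W), one feature map per input image
    pooled, _ = features.max(dim=0)  # (C, H, W)
    return pooled

feats = torch.randn(7, 64, 16, 16)
shuffled = feats[torch.randperm(7)]
assert torch.equal(permutation_invariant_pool(feats),
                   permutation_invariant_pool(shuffled))
```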

During the training process, did you experiment with varying quantities for 'm', or was there a consistent fixed number applied throughout, for example, 8?

The value of 'm' affects our network architecture, since we have 'm' encoders. So when we say, for example, m=7, that means we have 7 encoders, and of course 6 additional images are used for training and testing.
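
As a rough illustration of how m is baked into the architecture, the sketch below builds the network for a fixed m at construction time. Weight sharing across the m branches is an assumption made here so that the max-pooled fusion stays order-insensitive; the paper's actual architecture may differ in its details.

```python
import torch
import torch.nn as nn

# Rough sketch (assumptions: a toy conv encoder, shared weights across the m
# branches). The point is that m is fixed when the network is constructed, so
# training and testing must use the same m.
class MultiInputNet(nn.Module):
    def __init__(self, m, channels=64):
        super().__init__()
        self.m = m
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, images):
        # images: (B, m, 3, H, W) -- the query image plus m - 1 additional ones
        assert images.shape[1] == self.m
        feats = torch.stack(
            [self.encoder(images[:, i]) for i in range(self.m)], dim=1
        )                                # (B, m, C, H, W)
        fused, _ = feats.max(dim=1)      # (B, C, H, W), order-insensitive
        return fused
```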

Could you clarify whether m=1 only signifies the zero-shot condition, i.e., just inferring, or does it mean that the single query image is used for self-calibration, followed by parameter fixation, and then inference?

m = 1 means that only the query image is used as input, with no additional images.

From the results shown in Table 4, it doesn't seem that the results improve as m increases (e.g., error(m=13) > error(m=7)). Could you provide some insights into this?

This needs more investigation, but probably one of the reasons is that m=13, for example, requires 13 encoders, which increases our model capacity and thus leads to some overfitting.

Have you considered using additional labeled images for fine-tuning? If so, would this lead to better results than the current method?

Fine-tuning on the testing set would definitely help and may lead to better results. But the goal of our paper is to avoid any further training/tuning, kind of taking on a challenge. In practice, though, I would argue that fine-tuning on a small set is still feasible and may lead to better results.
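
For anyone who does want to try the fine-tuning route, a minimal sketch might look like the following. `model`, `loader`, and the angular-error loss are placeholders assumed here for illustration, not part of the paper's method.

```python
import torch
import torch.nn.functional as F

# Hypothetical fine-tuning loop on a small labeled set from the target camera.
# `model` and `loader` are placeholders; the loss is the angular error commonly
# used in color constancy, not necessarily the paper's training loss.
def angular_error(pred, target):
    cos = F.cosine_similarity(pred, target, dim=-1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def fine_tune(model, loader, epochs=5, lr=1e-5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, illuminants in loader:
            optimizer.zero_grad()
            loss = angular_error(model(images), illuminants)
            loss.backward()
            optimizer.step()
    return model
```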
