
Understanding question - what value to take of the estimator while evaluating? #18

sdahan12 opened this issue Jan 6, 2023 · 1 comment


sdahan12 commented Jan 6, 2023

First, thanks for this great work and implementation - I want to use it in my own work.
I have a basic question about the implementation:
assume I have fixed embeddings (size 512) with many samples (about 2 million).

I saw in the examples that the MI estimate changes throughout the optimization, and moreover the values have high variance but extremely good MSE.

As I understand it, I will use all 2 million samples to train the CLUB estimator. When is the best time to evaluate the MI? Is it best to monitor the loss until it stops changing, or is there another measure? What do you suggest? And then, what portion of the 2 million examples would you use to evaluate the true MI? All of them, and then take the MSE over all of them?

Second question, regarding the architecture of the hidden layer and the network: any suggestions for the case where I have two variables of 512 dimensions each?

The last question is about the robustness of the optimizer. Let's assume I change the two vectors over time, optimizing them for a different task, and I want to measure the MI again after changing them. Would you re-initialize the estimator (and its optimizer) for the modified vectors, or keep using the last one that was trained?
thanks!

Linear95 (Owner) commented:
Thanks, Sdahan!

For your first question, training the CLUB estimator is actually a log-likelihood maximization problem for the conditional distribution Q(Y|X). In my understanding, your question is similar to "when should I stop training a model with a negative log-likelihood learning loss?" I think this is a complicated question that depends on the data distribution, the model architecture, and other training setups. If we evaluate too early, the estimator may not be well learned; if we evaluate too late, the estimator may be over-fitted. So the evaluation time is kind of "tricky" and depends on many factors. It might be helpful to monitor the variance of the estimation and stop when it converges. Taking all 2 million samples to evaluate the MI can be very expensive, since the full estimator considers all O(n^2) negative sample pairs. You can check our CLUB sample estimator, which reduces the computational complexity to O(n) without losing much estimation performance.
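
For intuition, here is a minimal PyTorch sketch of the difference between the full estimate and the sampled one, assuming a Gaussian variational network q(y|x) that outputs a mean and log-variance; the function and variable names are illustrative, not the exact classes in this repo:

```python
import torch

def club_estimate_full(mu, logvar, y):
    """CLUB-style MI estimate using all n^2 negative pairs (O(n^2) compute/memory).
    mu, logvar: [n, d] outputs of a variational net q(y|x); y: [n, d] samples."""
    # positive term: log q(y_i | x_i), up to an additive constant
    positive = (-(y - mu) ** 2 / (2 * logvar.exp()) - logvar / 2).sum(-1)
    # negative term: log q(y_j | x_i) for every pair (i, j)
    diff = y.unsqueeze(0) - mu.unsqueeze(1)                      # [n, n, d]
    negative = (-(diff ** 2) / (2 * logvar.exp().unsqueeze(1))
                - logvar.unsqueeze(1) / 2).sum(-1)               # [n, n]
    return positive.mean() - negative.mean()

def club_estimate_sampled(mu, logvar, y):
    """CLUB sample estimator: one shuffled negative per positive sample (O(n))."""
    idx = torch.randperm(y.size(0))
    positive = (-(y - mu) ** 2 / (2 * logvar.exp()) - logvar / 2).sum(-1)
    negative = (-(y[idx] - mu) ** 2 / (2 * logvar.exp()) - logvar / 2).sum(-1)
    return (positive - negative).mean()
```

The sampled version is what makes evaluation on millions of samples feasible, since it never materializes the n-by-n pairwise term.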

For the second question, I think it also depends on what the variables look like. For example, you could use a CNN encoder-decoder if the two variables are both images.
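
For flat 512-dim embeddings, a plain MLP that outputs the mean and log-variance of q(y|x) is a reasonable starting point. This is only a sketch: the hidden width (1024 here) and the Tanh bound on the log-variance are assumptions you should tune.

```python
import torch.nn as nn

class GaussianQNet(nn.Module):
    """Variational network for q(y|x) = N(mu(x), diag(exp(logvar(x)))).
    Sized for 512-dim x and y; the hidden width is a tunable guess."""
    def __init__(self, x_dim=512, y_dim=512, hidden=1024):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, y_dim))
        self.logvar_net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, y_dim), nn.Tanh())  # Tanh keeps the log-variance bounded

    def forward(self, x):
        return self.mu_net(x), self.logvar_net(x)
```

If your embeddings come from structured data (images, sequences), a matching encoder as mentioned above usually makes more sense than a plain MLP.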

For the last question, it depends on how much your two vectors change after the optimization. If you only update them for a few steps, their distributions might not change a lot, so you can keep using the learned estimator and refresh it with a few update steps before estimating (similar to the examples in mi_minimization.ipynb). If the vectors have changed a lot, there might be little benefit in starting from the previous estimator.
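
In code, that "refresh" could look like the sketch below, assuming a Gaussian q(y|x) network like the one above; the helper name, step count, and re-initialization scheme are illustrative assumptions rather than part of this repo:

```python
import torch

def refresh_estimator(qnet, optimizer, x_new, y_new, steps=50, reinit=False):
    """Warm-start (reinit=False) or restart (reinit=True) the q(y|x) network
    before estimating MI on updated embeddings. Names here are illustrative."""
    if reinit:  # embeddings changed a lot: start from scratch
        for m in qnet.modules():
            if isinstance(m, torch.nn.Linear):
                m.reset_parameters()
    for _ in range(steps):  # a few log-likelihood maximization steps on fresh samples
        mu, logvar = qnet(x_new)  # in practice, iterate over mini-batches
        nll = ((y_new - mu) ** 2 / (2 * logvar.exp()) + logvar / 2).sum(-1).mean()
        optimizer.zero_grad()
        nll.backward()
        optimizer.step()
    return qnet
```

Whether a few dozen steps is enough depends on how far the embeddings have moved, so monitoring the log-likelihood on held-out samples is still worthwhile.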

Hope this helps :)
