First, thanks for this great work and implementation - I want to use it in my own work.
I have a basic question about the implementation:
Assume I have fixed embeddings (size 512) with many samples (about 2 million).
I saw in the examples that the MI values change throughout the optimization, and moreover the values have high variance but extremely good MSE.
As I understand it, I would use all 2 million samples to train the CLUB estimator - so when is the best time to evaluate the MI? Is it best to monitor the loss until it stops changing, or is there some other measure? What is your suggestion? And what portion of the 2 million examples would you use to evaluate the true MI - all of them? And then take the MSE over all the examples?
Second question, regarding the architecture of the hidden layer and the network: any suggestion for the case where I have two variables of 512 dimensions each?
The last question is about the robustness of the optimizer. Suppose I change the two vectors over time, optimizing them for a different task, and then want to measure the MI again after changing them - would you reinitialize the optimizer for measuring the MI of the modified vectors, or reuse the last optimizer that was trained?
Thanks!
For your first question, training the CLUB estimator is actually a log-likelihood maximization problem for the conditional distribution Q(Y|X). In my understanding, your question is similar to "when should I stop training a model with a negative log-likelihood learning loss?", and I think that is a complicated question which depends on the data distribution, the model architecture, and other training setups. If we evaluate too early, the estimator may not be well learned; if we evaluate too late, the estimator can be overfitted. So the evaluation time is "tricky" with respect to many factors. It might help to monitor the variance of the estimation and stop when it has converged. Also, evaluating the MI on all 2 million samples can be computationally expensive, since the full estimator considers all O(n^2) negative sample pairs. You can check our CLUB sample estimator, which reduces the computational complexity to O(n) without losing much estimation performance.
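Here is a minimal sketch of the sampled idea: each x_i is contrasted with a single randomly permuted y_j instead of all n-1 shuffled pairs, which is what brings the cost down from O(n^2) to O(n). It assumes a diagonal-Gaussian Q(Y|X) with MLP mean and log-variance heads, roughly following the parameterization in mi_estimators.py; the class name and hidden size below are placeholders:

```python
import torch
import torch.nn as nn

class CLUBSampleSketch(nn.Module):
    """Sampled CLUB sketch: one random negative pair per sample (O(n) cost)."""
    def __init__(self, x_dim, y_dim, hidden_size):
        super().__init__()
        # Q(Y|X) is modeled as a diagonal Gaussian with MLP mean / log-variance heads
        self.p_mu = nn.Sequential(
            nn.Linear(x_dim, hidden_size), nn.ReLU(), nn.Linear(hidden_size, y_dim))
        self.p_logvar = nn.Sequential(
            nn.Linear(x_dim, hidden_size), nn.ReLU(), nn.Linear(hidden_size, y_dim), nn.Tanh())

    def loglikeli(self, x, y):
        # proportional to log Q(y|x) (constants dropped); maximizing this trains the estimator
        mu, logvar = self.p_mu(x), self.p_logvar(x)
        return (-(mu - y) ** 2 / logvar.exp() - logvar).sum(dim=1).mean()

    def learning_loss(self, x, y):
        # negative log-likelihood, to be minimized during estimator training
        return -self.loglikeli(x, y)

    def forward(self, x, y):
        # MI upper-bound estimate: each positive pair minus ONE sampled negative pair
        mu, logvar = self.p_mu(x), self.p_logvar(x)
        neg_idx = torch.randperm(x.shape[0])
        positive = -(mu - y) ** 2 / logvar.exp()
        negative = -(mu - y[neg_idx]) ** 2 / logvar.exp()
        return (positive.sum(-1) - negative.sum(-1)).mean() / 2.0
```

Training minimizes learning_loss (the negative log-likelihood), while forward gives the MI upper-bound estimate.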
For the second question, I think it also depends on what the variables look like. For example, you could use a CNN encoder-decoder if the two variables are both images.
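For two flat 512-dim embeddings, a plain MLP variational network like the sketch above is usually a reasonable starting point. The hidden size and learning rate below are guesses to tune, not recommendations from the repo, and the random tensors just stand in for your embedding batches:

```python
import torch

# Hypothetical hyperparameters for two 512-dim embeddings; tune for your data.
estimator = CLUBSampleSketch(x_dim=512, y_dim=512, hidden_size=1024)
optimizer = torch.optim.Adam(estimator.parameters(), lr=1e-4)

x = torch.randn(2048, 512)  # stand-in for a mini-batch of the first embedding
y = torch.randn(2048, 512)  # stand-in for a mini-batch of the second embedding

for step in range(100):
    optimizer.zero_grad()
    estimator.learning_loss(x, y).backward()  # maximize log Q(Y|X)
    optimizer.step()

with torch.no_grad():
    print("MI upper-bound estimate:", estimator(x, y).item())
```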
For the last question, it depends on how much your two vectors changed during the optimization. If you only updated them for a few steps, the vector distributions may not change much, so you can keep the learned estimator and update it for a few steps before estimating (similar to the examples in mi_minimization.ipynb). If the vectors have changed a lot, there may be little benefit to starting from the previous estimator.
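As a sketch of the warm-start option (all names below are hypothetical): keep the trained estimator and its optimizer, take a few refresh steps on samples of the updated vectors, and only then read off the MI estimate. If the refreshed loss still looks far from converged, reinitializing is probably the safer choice:

```python
# Hypothetical helper: warm-start the existing estimator on the updated
# embeddings for a few steps before re-reading the MI upper bound.
def refresh_and_estimate(estimator, optimizer, x_new, y_new, n_refresh_steps=5):
    for _ in range(n_refresh_steps):
        optimizer.zero_grad()
        estimator.learning_loss(x_new, y_new).backward()
        optimizer.step()
    with torch.no_grad():
        return estimator(x_new, y_new).item()
```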