impact of the relaxation on theorem 1 #8
Hi! Thanks for the questions.

Regarding "implicit encoder": this was written for some experiments we did with stochastic encoders. However, all our attempts to train WAE with stochastic encoders eventually ended up with deterministic encoders, i.e. the encoders preferred to reduce their variance to zero. This was partially reported in one of our follow-up papers (Paul Rubenstein, Ilya Tolstikhin, On the latent space of Wasserstein autoencoders). How to train stochastic encoders with WAE while preserving the noise structure is an interesting open problem.

Regarding your first question: indeed, we lose any sort of guarantees by relaxing the equality constraint with the penalty. There is a paper called "Sinkhorn autoencoders" which shows that, roughly, using a simple triangle inequality you can prove that the WAE with a Wasserstein divergence in the latent space provides an upper bound on the original transport distance in the input space. I don't know of any other results on that topic. It would be interesting to find out more!

Best wishes,
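The triangle-inequality argument behind that upper bound can be sketched roughly as follows (a hedged reconstruction of the idea, not the exact statement from the Sinkhorn autoencoders paper). Writing $G$ for the decoder, assumed $L_G$-Lipschitz, and $G_\# Q$ for the pushforward of a latent distribution $Q$ through $G$:

```latex
W_p(P_X, G_\# P_Z)
  \;\le\; W_p(P_X, G_\# Q_Z) \;+\; W_p(G_\# Q_Z, G_\# P_Z)
  \;\le\; \underbrace{W_p(P_X, G_\# Q_Z)}_{\text{reconstruction term}}
        \;+\; L_G \, \underbrace{W_p(Q_Z, P_Z)}_{\text{latent penalty}}
```

So minimizing reconstruction plus a Wasserstein divergence in the latent space upper-bounds the transport distance between the data distribution and the model, even without the exact constraint $Q_Z = P_Z$.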
Thank you for the very fast reply, Ilya! For the second part, thanks for the pointer. It doesn't describe exactly what's happening, but this triangle inequality is a very good start! And thank you (and your co-authors) for writing such a beautiful paper, and those smaller follow-ups that really make me feel smarter after I read them. :)
Indeed, what we observed in Paul Rubenstein's paper is that even though the stochastic encoders decide to drop the variance (i.e. converge to deterministic ones), these resulting deterministic encoders are different from those you would obtain by training plain deterministic ones. I think this is an interesting topic to look into. Thank you very much for your kind words! And good luck with your research as well!
Hello,
Correct me if I'm wrong, but my understanding is that the simplified expression of the Wasserstein distance obtained in Theorem 1 relies heavily on the hypothesis that the distribution of the latent codes exactly matches the prior.
But with the necessary relaxation of this constraint, the hypothesis no longer holds. Do you have any sense of what happens when the constraint is "violated too much" (e.g. when lambda is too small)?
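For concreteness, the relaxed objective being discussed has the shape "reconstruction + lambda × divergence between aggregate posterior and prior". Below is a minimal NumPy sketch of the WAE-MMD variant (an illustration with an RBF kernel and squared-error cost; names like `wae_objective` are mine, not from the authors' code):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # pairwise RBF kernel values between rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(qz, pz, sigma=1.0):
    # (biased) MMD^2 estimate between encoded codes qz ~ Q_Z
    # and samples pz ~ P_Z from the prior
    return (rbf_kernel(qz, qz, sigma).mean()
            - 2.0 * rbf_kernel(qz, pz, sigma).mean()
            + rbf_kernel(pz, pz, sigma).mean())

def wae_objective(x, x_recon, qz, pz, lam):
    # reconstruction cost c(x, G(z)) + lambda * penalty on D(Q_Z, P_Z)
    recon = ((x - x_recon) ** 2).sum(axis=1).mean()
    return recon + lam * mmd2(qz, pz)
```

With `lam` close to zero the penalty term vanishes, so nothing pushes `qz` toward `pz` and the hypothesis of Theorem 1 (Q_Z = P_Z) can be arbitrarily violated; this is exactly the regime the question is about.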
I haven't had time to run an empirical study, and I can't wrap my head around what it implies theoretically.
Any insight to share?
Also, in your implementation, I notice there is an "implicit" noise model for the encoder. I understand that the noise is parameterized by a neural network learnt jointly during WAE training, but can you give a bit more insight into it? I can't find any reference to it in the WAE paper or in any of the follow-ups I know. Any pointer?
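My reading of "implicit" here, sketched below as a tiny NumPy toy (the layer sizes and the name `implicit_encode` are hypothetical, not from the repo): rather than predicting explicit Gaussian parameters (mu, sigma) and reparameterizing, the encoder consumes raw noise as an extra input, so the noise distribution it induces on z is shaped implicitly by the network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights of a one-layer implicit encoder:
# (input_dim + noise_dim) -> latent_dim
W = rng.normal(scale=0.1, size=(6, 2))

def implicit_encode(x, noise_dim=2):
    # fresh noise per call; the network decides how (or whether) to use it
    eps = rng.normal(size=(x.shape[0], noise_dim))
    h = np.concatenate([x, eps], axis=1)  # noise enters as an extra input
    return np.tanh(h @ W)                 # z = f_theta(x, eps)
```

Two calls with the same `x` give different codes, so the encoder is stochastic; but since nothing in the objective rewards keeping the noise, training is free to learn weights that ignore `eps`, which would match the variance-collapse behavior discussed above.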
Thanks.