
impact of the relaxation on theorem 1 #8

Open
pierremac opened this issue Nov 1, 2018 · 3 comments

Comments

@pierremac

Hello,

Correct me if I'm wrong, but my understanding is that the simplified expression of the Wasserstein distance obtained in Theorem 1 relies heavily on the hypothesis that the distribution of the latent codes exactly matches the prior.
With the necessary relaxation of this constraint, that hypothesis no longer holds. Do you have any sense of what happens when the constraint is violated "too much" (e.g. when lambda is too small)?
I haven't had time to run an empirical study and can't wrap my head around what it implies theoretically.
Any insight to share?
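
To make the question concrete, here is how I read the constrained identity of Theorem 1 versus the relaxed objective that is actually optimized (a sketch, with notation only roughly following the paper):

```latex
% Theorem 1 (deterministic decoder G): the optimal transport cost equals a
% reconstruction cost minimized over encoders whose aggregate posterior Q_Z
% matches the prior P_Z exactly.
W_c(P_X, P_G) \;=\; \inf_{Q(Z|X)\,:\,Q_Z = P_Z}\;
    \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z|X)}\big[\, c\big(X, G(Z)\big) \,\big]

% Relaxed WAE objective: the hard constraint Q_Z = P_Z is replaced by a
% divergence penalty with weight \lambda, so the identity above is no longer
% guaranteed to hold.
D_{\mathrm{WAE}}(P_X, P_G) \;=\; \inf_{Q(Z|X)}\;
    \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z|X)}\big[\, c\big(X, G(Z)\big) \,\big]
    \;+\; \lambda\, \mathcal{D}_Z(Q_Z, P_Z)
```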

Also, in your implementation, I notice there is an "implicit" noise model for the encoder. I understand that the noise is parameterized by a neural network that is learned jointly while training the WAE, but could you give a bit more insight about it? I can't find any reference to it in the WAE paper or in any of the follow-ups I know. Any pointer?
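
For reference, this is what I imagine such an implicit encoder looks like; purely my guess with hypothetical names, not your actual code:

```python
# Hypothetical sketch of an "implicit" stochastic encoder (my guess, not the
# code from this repository): instead of predicting (mu, sigma) and sampling
# z = mu + sigma * eps, the network takes the input together with a fresh
# noise vector and maps them jointly to z, so the encoder's noise
# distribution is defined only implicitly by the network itself.
import torch
import torch.nn as nn

class ImplicitEncoder(nn.Module):
    def __init__(self, x_dim: int, noise_dim: int, z_dim: int, hidden: int = 256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(x_dim + noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fresh Gaussian noise for every sample; the network is free to learn
        # how much (or how little) of it to use when producing the code z.
        eps = torch.randn(x.shape[0], self.noise_dim, device=x.device)
        return self.net(torch.cat([x, eps], dim=1))

# Usage (illustrative shapes): encode a batch of flattened 28x28 images.
# encoder = ImplicitEncoder(x_dim=784, noise_dim=8, z_dim=8)
# z = encoder(torch.randn(64, 784))
```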

Thanks.

@tolstikhin
Owner

Hi!

Thanks for the questions.

Regarding "implicit encoder" --- this was written for some experiments we did with stochastic encoders. However, eventually all our attempts to train WAE with stochastic encoders ended up with deterministic encoders, i.e. the encoders preferred to reduce their variance to zero. This was partially reported in one of our follow up papers (Paul Rubenstein, Ilya Tolstikhin, On the latent space of Wasserstein autoencoders). The question of how to train stochastic encoders with WAE and preserve the noise structure is an interesting open problem.

Regarding your first question: indeed, we lose any sort of guarantee by relaxing the equality constraint into a penalty. There is a paper called "Sinkhorn autoencoders" which shows, roughly, that using a simple triangle inequality you can prove that the WAE with a Wasserstein divergence in the latent space provides an upper bound on the original transport distance in the input space. I don't know of any other results on that topic. It would be interesting to find out more!
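
Roughly, and from memory (so take the exact form with a grain of salt), the argument goes like this:

```latex
% Triangle inequality for the Wasserstein distance, with G_# denoting the
% pushforward of a latent distribution through the decoder G (so P_G = G_# P_Z):
W_p(P_X, P_G)
  \;\le\; W_p\big(P_X,\, G_\# Q_Z\big) \;+\; W_p\big(G_\# Q_Z,\, G_\# P_Z\big)

% The first term is controlled by the WAE reconstruction cost, and the second
% term is at most \gamma \cdot W_p(Q_Z, P_Z) when the decoder G is
% \gamma-Lipschitz, so the WAE objective with a Wasserstein penalty in the
% latent space upper-bounds the transport distance in the input space.
```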

Best wishes,
Ilya

@pierremac
Author

Thank you for the very fast reply, Ilya!
Looks like I had missed the bit about the implicit encoder in the "On the latent space of WAE" paper, my bad. Thanks for the pointer!
I guess it makes sense that the stochastic encoders would converge to deterministic ones. I'm also curious how this plays out over the course of training. Does it change the training dynamics? Does it perhaps lead to faster or more stable training?

For the second part, thanks for the pointer. It doesn't describe what's happening very precisely, but that triangle inequality is a very good start!

And thank you (and your co-authors) for writing such a beautiful paper and those smaller follow-ups that really make me feel smarter after reading them. :)

@tolstikhin
Owner

Indeed, what we observed in Paul Rubenstein's paper is that even though the stochastic encoders decide to drop the variance (i.e. converge to deterministic ones), the resulting deterministic encoders are different from those you would obtain by training plain deterministic encoders from the start. I think this is an interesting topic to look into.

Thank you very much for your kind words! And good luck with your research as well!
