
[feature] Finite Scalar Quantization: VQ-VAE Made Simple #72

Closed
kashif opened this issue Sep 28, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@kashif
Contributor

kashif commented Sep 28, 2023

Abstract:

We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations. For example, autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.

See paper https://arxiv.org/abs/2309.15505
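To summarize the mechanism from the abstract: each latent dimension is bounded (e.g. with tanh) and rounded to one of a few fixed values, and the implicit codebook is the Cartesian product of these per-dimension sets. A minimal pure-Python sketch, not the paper's reference code — it assumes odd level counts for a symmetric integer grid (the paper also handles even levels via a half-shift) and omits the straight-through gradient needed for training:

```python
import math

def fsq_quantize(z, levels):
    """FSQ sketch: bound each channel with tanh, then round to one of
    `levels[i]` fixed integer values in [-(L-1)/2, (L-1)/2] (odd L assumed)."""
    assert len(z) == len(levels)
    codes = []
    for zi, L in zip(z, levels):
        half = (L - 1) // 2              # integer half-width for odd L
        bounded = math.tanh(zi) * half   # value now lies in (-half, half)
        codes.append(round(bounded))     # nearest allowed integer code
    return codes

def code_index(codes, levels):
    """Map per-dimension codes to a single index in the implicit codebook
    of size prod(levels), via mixed-radix positional encoding."""
    idx, mult = 0, 1
    for c, L in zip(codes, levels):
        idx += (c + (L - 1) // 2) * mult  # shift code range to 0..L-1
        mult *= L
    return idx

levels = [7, 5, 5, 5]  # implicit codebook size 7 * 5 * 5 * 5 = 875
codes = fsq_quantize([1.2, -0.7, 0.0, 3.0], levels)
index = code_index(codes, levels)
```

As the abstract notes, choosing the number of dimensions and the levels per dimension sets the codebook size directly, with no learned codebook and hence no codebook collapse to guard against.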

I'll read over it and see how to add it.

@lucidrains
Owner

@kashif nice! you are really on top of the vector quantization literature 😄

yea, i can probably add this as a separate file

@lucidrains
Owner

@kashif i'm guessing because this is a deepmind work, no code will be available

@kashif
Contributor Author

kashif commented Sep 28, 2023

oh i just saw the appendix.. the JAX code is there... nice!

@lucidrains
Owner

@kashif woohoo! see if chatgpt can translate it to pytorch 😆

@lucidrains
Owner

if not, that's ok, i'll just do it tomorrow morning, seems very straightforward

@lucidrains lucidrains added the enhancement New feature or request label Sep 28, 2023
@kashif
Contributor Author

kashif commented Sep 29, 2023

Fixed by #74.

@kashif kashif closed this as completed Sep 29, 2023