How to generate background_latents? #19

Open
mujc21 opened this issue Sep 1, 2024 · 1 comment

mujc21 commented Sep 1, 2024

Thank you very much for your excellent work. I am currently trying to apply the SyncMVD method with the StableDiffusionUpscalePipeline. However, the latents in that pipeline are not obtained through vae.encode, so the background-compositing step in SyncMVD needs to be modified. After reviewing the code of StableDiffusionUpscalePipeline, I tried several ways to initialize the background_latents, but none were successful.

# Broadcast the per-color constants (held in `color_images`) into
# full-resolution solid-color images
color_images = torch.ones(
    (1, 1, latent_size * 8, latent_size * 8),
    device=self._execution_device,
    dtype=self.text_encoder.dtype
) * color_images
color_images *= ((0.5 * color_images) + 0.5)
# Encode the solid-color images into latents with the VAE
color_latents = encode_latents(self.vae, color_images)

# Composite the rendered views over the background latents at timestep t
background_latents = [self.color_latents[color] for color in background_colors]
composited_tensor = composite_rendered_view(self.scheduler, background_latents, latents, masks, t)
latents = composited_tensor.type(latents.dtype)

Is it possible to avoid compositing the latents with the background? If it is necessary, how should the background_latents be generated in the StableDiffusionUpscalePipeline?

LIU-Yuxin (Owner) commented Sep 19, 2024

To my understanding, the pipeline still uses latents encoded with the VAE; the difference is that they are concatenated with a low-resolution image. In that case, the noisy latents should be composited in the same way as in the current SyncMVD, while the low-resolution input should use noiseless latents encoded from the same background color.
color_images *= ((0.5 * color_images) + 0.5)
Also, please change the *= in this line to =; it was a typo.
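
A minimal sketch of that approach (not code from the repository: the function name composite_upscale_inputs and the image_masks/low_res_image arguments are hypothetical, while composite_rendered_view, color_latents, and color_constants are assumed from the SyncMVD codebase) could look like this. Since, as far as I can tell, the x4 upscaler concatenates a pixel-space low-resolution image to the latents rather than a second VAE latent, the low-resolution branch below is composited in image space with the same solid color, without scheduler noise:

import torch

def composite_upscale_inputs(self, latents, low_res_image, latent_masks,
                             image_masks, t, background_colors):
    # Noisy latent branch: composited exactly as in the current SyncMVD code,
    # using the background latents encoded from solid-color images.
    background_latents = [self.color_latents[color] for color in background_colors]
    composited = composite_rendered_view(
        self.scheduler, background_latents, latents, latent_masks, t
    )
    latents = composited.type(latents.dtype)

    # Noiseless low-resolution branch: composite the rendered low-res views
    # over the same solid color directly in image space. `color_constants`
    # is assumed to map a color name to its value in [-1, 1], matching the
    # pipeline's preprocessing range.
    color = torch.tensor(
        color_constants[background_colors[0]],
        device=low_res_image.device, dtype=low_res_image.dtype,
    ).view(1, -1, 1, 1)
    background_image = color.expand_as(low_res_image)
    low_res_image = image_masks * low_res_image + (1 - image_masks) * background_image

    return latents, low_res_image

Note that image_masks here would be the object masks resized to the low-resolution image size, while latent_masks stay at the latent resolution as in the existing code.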
