[Feature Request] Optimize the VAE module #667
Benchmark of no-compile, onediff, and stable-fast compiler.
Because of this, to hit my goal of 200 images per second with 1-step sd-turbo, I compile the UNet with onediff and the VAE with stable-fast. Doing this, plus my own perf optimizations, I can average 5 milliseconds per image using batchsize=12 on my 4090.
While my 200 images/sec is purely a tech demo, the TinyVAE is needed for real-time video generation. Using 4-step LCM and TinyVAE, I can generate single-frame images at 512x512, with no batching, in about 37 ms. That gets me to 27 fps, which is over the 24 fps minimum standard for relatively smooth video. NOTE: I've been busy with this other exploration and have yet to try your video optimizations on my camera -> LCM sd1.5 -> video demo. It'll be interesting to see if I can get to 30 fps with your optimizations to the UNet.
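For anyone checking the numbers, here is a rough sketch of how the latency figures quoted above convert to throughput. It is only arithmetic on those figures; the helper names are made up for illustration and are not part of onediff or stable-fast.

```python
def images_per_second(ms_per_image: float) -> float:
    """Convert an average per-image latency in milliseconds to throughput."""
    return 1000.0 / ms_per_image


def batch_latency_ms(ms_per_image: float, batch_size: int) -> float:
    """Wall-clock time for one batch, given the average per-image latency."""
    return ms_per_image * batch_size


# sd-turbo, 1 step, batch size 12: about 5 ms per image on average.
turbo_ips = images_per_second(5.0)
turbo_batch_ms = batch_latency_ms(5.0, 12)

# 4-step LCM + TinyVAE, single 512x512 frame: about 37 ms per frame.
lcm_fps = images_per_second(37.0)

# Per-frame time budget that a 30 fps target would allow.
frame_budget_30fps_ms = 1000.0 / 30.0

print(f"sd-turbo: {turbo_ips:.0f} images/s, {turbo_batch_ms:.0f} ms per batch of 12")
print(f"LCM: {lcm_fps:.1f} fps; 30 fps needs <= {frame_budget_30fps_ms:.1f} ms per frame")
```

By this arithmetic, going from 27 fps to 30 fps means shaving roughly 3.7 ms off each 37 ms frame.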
Tiny VAE optimization has been completed. Dependency:
With the following test code on A100-PCIE-40GB and set the arg
The test example: