Input Requirements for Optimal Results in Image, Video, and Audio Generation #10

Open
WeizhenEricFang opened this issue Jan 20, 2025 · 1 comment

@WeizhenEricFang

Hello,

I’ve tried uploading several images and videos, but the results haven’t been satisfactory. Could you please clarify the input requirements for generating good results? Specifically:

What are the ideal conditions for image, video, and audio inputs?
Are there any specific recommendations regarding aspect ratio for inputs to ensure high-quality outputs?
I would appreciate any guidelines or best practices that could help improve the results.

Thank you!

@digital-avatar
Collaborator

@WeizhenEricFang
Could you describe the specific cases where the results were poor?

Generally speaking, the most important requirement is that the mouth in the source image is closed. The audio requirements are less strict: the driving effect is better when the pronunciation is clear and the audio contains no sound other than speech.
