-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add quantization support #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add quantization support #163
Conversation
|
This PR addresses the CUDA Out of Memory issue discussed in #152 Following the maintainers' feedback that the project needs code-based solutions rather than documentation, I've implemented optional quantization support to directly solve the VRAM limitation problem. |
| return | ||
|
|
||
| full_script = scripts.replace("’", "'").replace('“', '"').replace('”', '"') | ||
| full_script = scripts.replace("'", "'").replace('"', '"').replace('"', '"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pay attention to your code agent. DO NOT introduce bugs like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will pay attention to this, and I have corrected it.
What are your thoughts on the quantization approach? Is it going in right direction or Should I change something?
Implementation of Quantization Support
Following the maintainers' guidance to contribute code rather than documentation, this PR implements the quantization feature discussed previously.
Problem: VibeVoice requires 20GB VRAM, blocking 90% of users
Solution: Optional quantization via
--quantizationparameter (8bit/4bit)Result: Reduces VRAM to 12GB (8-bit) or 7GB (4-bit) with minimal quality loss
Key Features:
Ready for testing and feedback. Happy to make adjustments as needed!