Releases: openvpi/vocoders
PC-NSF-HiFiGAN with ability of pitch shifting and extremely wide pitch range
This is a major release of the DiffSinger Community Vocoder Project, with the first public model weights for a brand-new vocoder architecture: PC-NSF-HiFiGAN. Main improvements:
- The HN-NSF module is replaced by the super lightweight MiniNSF, which is much faster to compute and better suited to GPU acceleration.
- By applying a special training paradigm, PC-NSF-HiFiGAN gains the ability to shift pitch while preserving formants (like the WORLD vocoder), while still achieving the same level of audio quality as the standard NSF-HiFiGAN.
- An effective yet universal augmentation workflow is used to expand the pitch range, pushing the typical upper limit to D#7 (2489.0 Hz).
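The note names quoted in these release notes map to frequencies as in the sketch below. It assumes twelve-tone equal temperament, A4 = 440 Hz, and scientific pitch notation (C4 = MIDI note 60); the release does not state a tuning reference, although the quoted figures are consistent with this one. The helper names `note_to_midi` and `note_to_hz` are illustrative and not part of any DiffSinger API.

```python
import re

# Semitone offsets of the natural notes within an octave (C = 0).
_NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_ACCIDENTALS = {"": 0, "#": 1, "b": -1}


def note_to_midi(name: str) -> int:
    """Convert a note name such as 'D#7' to a MIDI note number (C4 = 60)."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = _NOTE_OFFSETS[letter] + _ACCIDENTALS[accidental]
    return 12 * (int(octave) + 1) + semitone


def note_to_hz(name: str, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency of a note (assumption: A4 = 440 Hz)."""
    return a4_hz * 2.0 ** ((note_to_midi(name) - 69) / 12)


if __name__ == "__main__":
    print(round(note_to_hz("D#7"), 1))  # 2489.0
    print(round(note_to_hz("C6"), 1))   # 1046.5
```

Under these assumptions, D#7 is MIDI note 99, which is 30 semitones (2.5 octaves) above A4.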
This release is distributed as follows:
- A pretrained model for inference in the DiffSinger repository
- A pretrained model for fine-tuning in the SingingVocoders repository (coming soon)
- A packaged OpenUTAU dependency that can be directly installed into OpenUTAU (rename the suffix to .zip and unzip it to get the ONNX model)
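For readers who want the ONNX model without doing the rename by hand, here is a minimal sketch. It assumes the package is an ordinary zip archive (as the rename instruction implies) and that the model files inside end in `.onnx`. The function name `extract_onnx` and the paths in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_onnx(package_path: str, out_dir: str) -> list[str]:
    """Extract every .onnx member of a zip-format package into out_dir.

    zipfile detects the format from the file contents, so the package
    does not need to be renamed to .zip first.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(package_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".onnx"):
                # extract() returns the path of the file it wrote.
                extracted.append(archive.extract(member, path=out))
    return extracted
```

A call such as `extract_onnx("downloaded_package.oudep", "vocoder_onnx")` returns the paths of the extracted model files; both names here are placeholders for whatever file was actually downloaded.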
Please note: the file and package names of this released model are different from those of the former release in February 2024. You may have to edit your configuration files to switch from the old model to the new model.
Overview
Architecture: PC-NSF-HiFiGAN
Training data: ~79h carefully selected singing voice
Training steps: 40k+108k for fine-tuning
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
Pitch shifting ability: -12 ~ +12 semitones
Pitch range (output): E2 ~ D#7 typical, may shrink with pitch shifting
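The numbers above fix the frame timing that upstream feature extraction has to match, and the semitone range translates into a simple frequency ratio. The sketch below only illustrates that arithmetic. The `VocoderSpec` class and `shift_f0` helper are hypothetical names, marking unvoiced frames with 0 Hz is an assumption, and how the shift is actually passed to the model is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocoderSpec:
    """Parameters copied from the overview list (illustrative container)."""
    sample_rate: int = 44100
    num_mels: int = 128
    hop_size: int = 512
    win_size: int = 2048
    fmin: float = 40.0
    fmax: float = 16000.0

    @property
    def frames_per_second(self) -> float:
        return self.sample_rate / self.hop_size

    @property
    def hop_duration_ms(self) -> float:
        return 1000.0 * self.hop_size / self.sample_rate


def shift_f0(f0_hz: list[float], semitones: float) -> list[float]:
    """Scale an F0 curve by a number of semitones.

    Assumption: unvoiced frames are marked with 0 Hz and are left untouched.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


if __name__ == "__main__":
    spec = VocoderSpec()
    print(spec.frames_per_second)          # 86.1328125
    print(round(spec.hop_duration_ms, 2))  # 11.61
    print(shift_f0([440.0, 0.0], 12))      # [880.0, 0.0]
```

A shift of +12 semitones doubles every voiced F0 value, which is why the usable output range can shrink once a shift is applied.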
Notice
Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.
Special Statements
We regret to publish a verified Registry of Hostile Conduct (shown below). This registry documents individuals/entities who have engaged in long-term destructive activities against the development team.
We solemnly declare:
- We strongly recommend that all users review this registry before downloading and using this vocoder
- No technical or legal restrictions are currently imposed on listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0
- We reserve the right to apply further restrictions in case of persistent malicious acts
Registry of Hostile Conduct
Name | Identifiers | Reason |
---|---|---|
旋转_turning_point | QQ: 2673587414; Bilibili UID: 285801087; Discord username: colstone233 | Engaging in long-term hostile and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development process of the vocoder and other projects in the community |
Finetuned NSF-HiFiGAN with expanded pitch range and improved audio quality
This release is an update to the first release of the DiffSinger Community Vocoder Project.
This release contains model weights with an expanded pitch range up to C6 (1046.5 Hz) and significantly improved audio quality. It is distributed as follows:
- A pretrained model for inference in the DiffSinger repository
- A pretrained model for fine-tuning in the SingingVocoders repository (see release)
- A packaged OpenUTAU dependency that can be directly installed into OpenUTAU (rename the suffix to .zip and unzip it to get the ONNX model)
Please note: the file and package names of this released model are different from those of the former release in December 2022. You may have to edit your configuration files to switch from the old model to the new model.
Overview
Architecture: NSF-HiFiGAN
Training data: ~72h carefully selected singing voice
Training steps: 110k for fine-tuning
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
Notice
Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.
Update (2024.08.03)
We significantly optimized the NSF efficiency in the ONNX model and uploaded a new attachment (nsf_hifigan_44.1k_hop512_128bin_2024.02_logE.oudep). Please also note that the new model accepts natural-log (base e) mel-spectrograms, instead of the log10 mel-spectrograms that the old models accept.
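If an existing pipeline produces log10 mel-spectrograms, switching to the new attachment only requires a change of logarithm base, since ln(x) = log10(x) · ln(10). The sketch below shows that conversion on a plain nested list of frames. The function name `log10_mel_to_ln` and the list-of-lists layout are assumptions for illustration; a real pipeline would more likely apply the same scalar multiplication to an array or tensor.

```python
import math

# ln(x) = log10(x) * ln(10), so one scalar factor converts between bases.
LN10 = math.log(10.0)


def log10_mel_to_ln(mel_log10: list[list[float]]) -> list[list[float]]:
    """Convert a log10 mel-spectrogram (frames x bins) to natural log."""
    return [[value * LN10 for value in frame] for frame in mel_log10]


if __name__ == "__main__":
    # A log10 value of 1.0 corresponds to a linear magnitude of 10.
    converted = log10_mel_to_ln([[1.0, 0.0, -2.0]])
    print(converted[0][0])  # approximately 2.302585 (= ln 10)
```

Feeding log10 values to a model that expects natural-log input scales every bin by about 0.434, so this base mismatch is worth ruling out first if the output sounds wrong after switching models.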
NSF-HiFiGAN with 44.1 kHz sampling rate
This is the first formal public release of the DiffSinger Community Vocoder Project, which includes:
- A pretrained model for inference
- A pretrained model for fine-tuning
- An ONNX model for lightweight and portable deployment
Overview
Architecture: NSF-HiFiGAN
Training data: ~93h singing voice
Training steps: over 1M
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
Notice
Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.