
Releases: openvpi/vocoders

PC-NSF-HiFiGAN with ability of pitch shifting and extremely wide pitch range

27 Feb 18:05

This is a major release of the DiffSinger Community Vocoder Project, with the first public model weights for a brand-new vocoder architecture: PC-NSF-HiFiGAN. Main improvements:

  • The HN-NSF module is replaced by the much lighter MiniNSF, which is considerably faster to compute and to accelerate on GPU.
  • By applying a special training paradigm, PC-NSF-HiFiGAN gains the ability to shift pitch while preserving formants (like the WORLD vocoder), while still achieving the same level of audio quality as the regular NSF-HiFiGAN.
  • An effective yet universal augmentation workflow is used to expand the pitch range, pushing the typical upper limit to D#7 (2489.0 Hz).

This release is distributed as follows:

  • A pretrained model for inference in the DiffSinger repository
  • A pretrained model for fine-tuning in the SingingVocoders repository (coming soon)
  • A packaged OpenUTAU dependency that can be directly installed into OpenUTAU (rename the suffix to .zip and unzip it to get the ONNX model)
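
The renaming step in the last item suggests that the OpenUTAU package is an ordinary ZIP archive with a different suffix. As a minimal sketch under that assumption, Python's standard `zipfile` module can read the archive without renaming it. The function name `extract_onnx` and the example paths are hypothetical and not part of the release.

```python
import zipfile
from pathlib import Path


def extract_onnx(package_path, out_dir):
    """Extract every .onnx member of a ZIP-format package into out_dir.

    Assumes the package is a plain ZIP archive, as the rename-to-.zip
    instruction implies. Returns the paths of the extracted files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(package_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".onnx"):
                # ZipFile.extract returns the path of the file it wrote.
                extracted.append(Path(archive.extract(member, out_dir)))
    return extracted


# Hypothetical usage (replace the path with the file you downloaded):
# models = extract_onnx("downloaded_vocoder.oudep", "vocoder_model")
# print(models)
```

Because `zipfile` inspects the file contents rather than the suffix, the manual rename is only needed when unpacking with a desktop archive tool.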

Please note: the file and package names of this released model are different from the former release in February, 2024. You may have to edit your configuration files to switch from the old model to the new model.

Overview

Architecture: PC-NSF-HiFiGAN
Training data: ~79h carefully selected singing voice
Training steps: 40k+108k for fine-tuning
Sampling rate: 44100
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
Pitch shifting ability: -12 ~ +12 semitones
Pitch range (output): E2 ~ D#7 typical, may shrink with pitch shifting
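
The note names and frequencies quoted in these release notes can be related with the usual equal-temperament formula. The sketch below assumes scientific pitch notation with A4 = 440 Hz (an assumption, although it reproduces the 2489.0 Hz quoted for D#7); the helper names `note_to_hz` and `shift_ratio` are illustrative only. It does not model how the output range shrinks under pitch shifting, which the release does not specify.

```python
# Semitone offsets of each note name within an octave (sharps only).
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_hz(note, a4_hz=440.0):
    """Convert a note such as 'D#7' to Hz (single-digit octaves only)."""
    name, octave = note[:-1], int(note[-1])
    midi = 12 * (octave + 1) + NOTE_OFFSETS[name]  # MIDI number, A4 = 69
    return a4_hz * 2.0 ** ((midi - 69) / 12)


def shift_ratio(semitones):
    """Frequency ratio corresponding to a pitch shift in semitones."""
    return 2.0 ** (semitones / 12)


print(round(note_to_hz("E2"), 1))   # 82.4
print(round(note_to_hz("D#7"), 1))  # 2489.0
print(shift_ratio(12), shift_ratio(-12))  # 2.0 0.5
```

A shift of +12 or -12 semitones therefore corresponds to doubling or halving the fundamental frequency.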

Notice

Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.

Special Statements

We regret having to publish a verified Registry of Hostile Conduct (shown below). This registry documents individuals/entities who have engaged in long-term destructive activities against the development team.

We solemnly declare:

  1. We strongly recommend that all users review this registry before downloading and using this vocoder.
  2. No technical or legal restrictions are currently imposed on listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0.
  3. We reserve the right to apply further restrictions in case of persistent malicious acts.

Registry of Hostile Conduct

Name Identifiers Reason
旋转_turning_point QQ: 2673587414;
Bilibili UID: 285801087;
Discord username: colstone233
Engaging in long-term hostile and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development process of the vocoder and other projects in the community

Finetuned NSF-HiFiGAN with expanded pitch range and improved audio quality

19 Feb 12:48

This release is an update to the first release of the DiffSinger Community Vocoder Project.

This release contains model weights with a pitch range expanded up to C6 (1046.5 Hz) and significantly improved audio quality. It is distributed as follows:

  • A pretrained model for inference in the DiffSinger repository
  • A pretrained model for fine-tuning in the SingingVocoders repository (see release)
  • A packaged OpenUTAU dependency that can be directly installed into OpenUTAU (rename the suffix to .zip and unzip it to get the ONNX model)

Please note: the file and package names of this released model are different from the former release in December, 2022. You may have to edit your configuration files to switch from the old model to the new model.

Overview

Architecture: NSF-HiFiGAN
Training data: ~72h carefully selected singing voice
Training steps: 110k for fine-tuning
Sampling rate: 44100
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz

Notice

Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.

Update (2024.08.03)

We significantly optimized the NSF efficiency in the ONNX model and uploaded a new attachment (nsf_hifigan_44.1k_hop512_128bin_2024.02_logE.oudep). Please also note that the new model accepts natural-log (base e) mel-spectrograms, instead of the log10 mel-spectrograms that the old models accept.
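
Switching between the two log bases is a constant rescaling, since ln(x) = log10(x) · ln(10). The sketch below shows the conversion on a mel-spectrogram stored as a plain list of frames; the function names are hypothetical, and a real pipeline would more likely apply the same multiplication to an array or tensor.

```python
import math

LN10 = math.log(10.0)  # ln(10), about 2.302585


def log10_mel_to_ln(mel_log10):
    """Rescale a log10 mel-spectrogram (list of frames) to natural log."""
    return [[value * LN10 for value in frame] for frame in mel_log10]


def ln_mel_to_log10(mel_ln):
    """Inverse conversion: natural-log mel-spectrogram back to log10."""
    return [[value / LN10 for value in frame] for frame in mel_ln]


# One frame with two bins, for illustration only.
linear = [[0.5, 2.0]]
mel_log10 = [[math.log10(v) for v in frame] for frame in linear]
mel_ln = log10_mel_to_ln(mel_log10)
print(mel_ln)  # values match math.log(0.5) and math.log(2.0) up to rounding
```

Feeding a log10 mel-spectrogram to a model that expects natural-log input (or the reverse) would scale every bin by a factor of about 2.3, so the base should match the model being used.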

NSF-HiFiGAN with 44.1 kHz sampling rate

11 Dec 06:34

This release contains the first formal public release of the DiffSinger Community Vocoder Project, which includes:

  • A pretrained model for inference
  • A pretrained model for fine-tuning
  • An ONNX model for lightweight and portable deployment

Overview

Architecture: NSF-HiFiGAN
Training data: ~93h singing voice
Training steps: over 1M
Sampling rate: 44100
Number of mel bins: 128
Hop size: 512
Window size: 2048
Mel frequency (input): 40-16000 Hz
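
The sampling rate, hop size and window size listed above fix the time resolution of the mel-spectrogram input. The sketch below derives the frame rate and durations from those three values; the constant and function names are illustrative, and `samples_for_frames` assumes the vocoder emits exactly one hop of audio per mel frame, which is typical for HiFiGAN-style models but is not stated in the release.

```python
SAMPLE_RATE = 44100  # Hz, from the overview
HOP_SIZE = 512       # samples between consecutive mel frames
WIN_SIZE = 2048      # samples per analysis window


def frames_per_second(sample_rate=SAMPLE_RATE, hop=HOP_SIZE):
    """Number of mel frames per second of audio."""
    return sample_rate / hop


def duration_ms(n_samples, sample_rate=SAMPLE_RATE):
    """Duration of n_samples in milliseconds."""
    return 1000.0 * n_samples / sample_rate


def samples_for_frames(n_frames, hop=HOP_SIZE):
    """Assumed output length in samples for n_frames mel frames."""
    return n_frames * hop


print(frames_per_second())             # 86.1328125
print(round(duration_ms(HOP_SIZE), 2))  # 11.61
print(round(duration_ms(WIN_SIZE), 2))  # 46.44
```

Each mel frame thus advances by roughly 11.6 ms and is computed from a window of roughly 46.4 ms.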

Notice

Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.