Releases: openvpi/DiffSinger
ONNX version of the duration predictor with FastSpeech2MIDI
This release contains ONNX models for phoneme duration prediction.
These models can serve as temporary tools for generating phoneme durations for MIDI-less acoustic models that cannot predict durations themselves.
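Duration predictors in this family typically emit per-phoneme lengths measured in mel frames, which downstream tools convert to seconds using the hop size and sample rate. A minimal sketch of that conversion (the 512-sample hop and 44100 Hz rate are assumptions consistent with the 44.1 kHz models below, not values read from the ONNX graphs):

```python
def frames_to_seconds(ph_dur_frames, hop_size=512, sample_rate=44100):
    """Convert per-phoneme durations from mel frames to seconds."""
    frame_sec = hop_size / sample_rate  # duration of one frame in seconds
    return [round(n * frame_sec, 6) for n in ph_dur_frames]

# Example: three phonemes lasting 10, 25 and 40 frames.
durations = frames_to_seconds([10, 25, 40])
```

With these assumed settings one frame is roughly 11.6 ms, so frame-level rounding error is small compared to typical phoneme lengths.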
Pretrained model for MIDI-less mode
1215_opencpop_ds1000_fix_label_nomidi
MIDI-less mode, strict pinyin dictionary, 44.1 kHz sampling rate, with some phoneme label errors fixed; trained for 320k steps.
Note: both ph_dur and f0_seq should be given to run inference.
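As a sketch of what supplying both parameters could look like, here is a hypothetical parameter dict in the spirit of DiffSinger's .ds inputs (the field names and the f0_timestep value are illustrative assumptions, not the exact schema), with a quick sanity check that the f0 curve covers the summed phoneme durations:

```python
params = {
    "ph_seq": "SP sh ir SP",           # phoneme sequence
    "ph_dur": [0.2, 0.15, 0.45, 0.2],  # per-phoneme durations in seconds
    "f0_timestep": 0.005,              # spacing of f0 samples in seconds
    "f0_seq": [440.0] * 210,           # one pitch value per timestep
}

total_dur = sum(params["ph_dur"])
f0_span = len(params["f0_seq"]) * params["f0_timestep"]
# The f0 curve must span at least the total phoneme duration,
# otherwise the tail of the utterance has no pitch to follow.
assert f0_span >= total_dur, (f0_span, total_dur)
```

Running a check like this before inference catches mismatched ph_dur/f0_seq pairs early instead of producing truncated or glitchy audio.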
Pretrained models for new 44.1 kHz vocoder
High-quality, high-performance pretrained acoustic models with 44.1 kHz full-band synthesis support.
To run inference with these models, a vocoder from the DiffSinger Community Vocoder Project is required.
1117_opencpop_ds1000_strict_pinyin
MIDI-A mode, with 1k diffusion steps and 512x20 WaveNet, using the new strict pinyin dictionary.
1122_opencpop_ds1000_strict_pinyin_384x30
MIDI-A mode, same as above but with a 384x30 WaveNet.
[Experimental] pretrained models
These are pretrained models from the OpenVPI team.
Note: these models are experimental. They are currently consistent with the original repository but may not remain compatible in the future. Using these models with main.py is suggested; see the code for more details.
0814_opencpop_ds_rhythm_fix
MIDI-B mode, fixes rhythm errors described in this issue.
0823_opencpop_ds_enhancement
MIDI-B mode, improves performance in the high pitch range by pitch-shifting the training data with the WORLD vocoder, though this may degrade sound quality.
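The augmentation described above rests on a simple relation: shifting pitch by n semitones multiplies F0 by 2^(n/12). A minimal sketch of applying such a shift to an F0 curve (the WORLD analysis/synthesis step itself is omitted; the helper name is illustrative):

```python
def shift_f0(f0_seq, semitones):
    """Scale an F0 curve by a semitone offset, keeping unvoiced frames (0.0) at zero."""
    ratio = 2.0 ** (semitones / 12.0)  # equal-temperament frequency ratio
    return [f0 * ratio if f0 > 0 else 0.0 for f0 in f0_seq]

# Shift an A4 (440 Hz) curve up one octave (12 semitones).
shifted = shift_f0([440.0, 0.0, 440.0], 12)
```

In the WORLD pipeline the scaled F0 is fed back into synthesis together with the original spectral envelope and aperiodicity, which is why extreme shifts can hurt sound quality.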
0831_opencpop_ds1000
MIDI-B mode, trained with 1k diffusion steps for better sound quality, with PNDM and DPM-Solver acceleration.
0909_opencpop_ds100_pitchcontrol
MIDI-A mode, supports manual pitch editing. Specifying the pitch is highly recommended, because the automatically predicted pitch is currently poor and is expected to be improved in future updates.
0920_opencpop_ds1000
MIDI-A mode, trained with 1k diffusion steps and more training epochs.