Time stretching, velocity control and optimized joint data augmentation
Overview
In this release:
- We introduced time stretching augmentation, which lets you control the frame-level velocity of any part of the singing (similar to, but much more flexible than, the VEL parameter in VOCALOID). The velocity parameter is a new curve parameter that, to our knowledge, has not appeared before in modern singing voice synthesis architectures or products. It gives you fine control over the texture of consonants and over the transitions within vowels.
- We implemented a scaling algorithm for the case where multiple types of augmentation are enabled together. See the dataset making pipeline for more details.
- A custom learning rate decay ratio (gamma) is now supported, giving you more control over the lr schedule when adapting to more complex datasets.
Random time stretching
This augmentation randomly changes the speed of your training data. It is likely to improve stability on long utterances (especially for speaking data), and it enables the new velocity parameter described above. It can be enabled together with either random or fixed pitch shifting augmentation.
To enable random time stretching augmentation for an existing dataset, add the following configuration to the config file:
augmentation_args:
  random_time_stretching:
    range: [0.5, 2.0]
    scale: 2.0
use_speed_embed: true
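To give a feel for what a range of [0.5, 2.0] means, here is a minimal sketch of how a per-utterance speed factor could be drawn. The log-uniform distribution and the helper name `sample_speed` are assumptions made for illustration, not a description of the actual implementation; log-uniform sampling is shown because it treats 0.5x and 2.0x symmetrically around 1.0x.

```python
import random


def sample_speed(rng, low=0.5, high=2.0):
    """Draw a speed factor from [low, high), log-uniformly (assumed distribution)."""
    # rng.random() is in [0, 1), so the result lies in [low, high).
    return low * (high / low) ** rng.random()


rng = random.Random(0)  # fixed seed so the sketch is repeatable
speeds = [sample_speed(rng) for _ in range(10000)]

# With log-uniform sampling, roughly half of the draws fall below 1.0x.
slower = sum(s < 1.0 for s in speeds) / len(speeds)
print(f"min={min(speeds):.3f} max={max(speeds):.3f} share below 1.0x={slower:.2f}")
```

A plain uniform draw over the same range would instead put about two thirds of the samples above 1.0x, which is why the choice of distribution matters for a range like this.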
Control velocity curve in *.ds files
{
"velocity_timestep": "0.005", // timestep in seconds, like f0_timestep
"velocity": "0.5 0.6 0.7 ... 1.8 1.9 2.0", // sequence of float values, like f0_seq
... // other attributes
}
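The two velocity fields above are plain strings, so they can be generated with a short script. The sketch below builds a linear ramp from 0.5 to 2.0 over one second and formats it in the same shape as the example; the helper names `linear_ramp` and `make_velocity_fields` and the three-decimal formatting are illustrative assumptions, and merging the result into your own *.ds segments is left to you.

```python
import json


def linear_ramp(start, end, duration, timestep=0.005):
    """Return evenly spaced velocity values covering `duration` seconds."""
    n = max(2, round(duration / timestep) + 1)  # number of frames, endpoints included
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def make_velocity_fields(values, timestep=0.005):
    """Format a velocity curve as the two string fields used in the example above."""
    return {
        "velocity_timestep": str(timestep),
        "velocity": " ".join(f"{v:.3f}" for v in values),
    }


curve = linear_ramp(0.5, 2.0, duration=1.0)
fields = make_velocity_fields(curve)

# The output is valid JSON (no comments), ready to merge into a segment.
print(json.dumps(fields)[:80] + " ...")
```

Because the curve has its own timestep, it does not need to share the frame rate of the f0 sequence.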
Export to ONNX format
python onnx/export/export_acoustic.py --exp YOUR_EXP_NAME --expose_velocity
Configure lr decay ratio
Add the following configuration in the config file:
gamma: 0.5 # This is the default value. You may use any positive value that is less than 1.
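The effect of gamma is easiest to see numerically. The sketch below assumes a step-wise schedule in which the learning rate is multiplied by gamma once every fixed number of training steps; the base learning rate (0.0004) and decay interval (50,000 steps) are hypothetical values chosen for illustration, not settings taken from this release.

```python
def stepped_lr(base_lr, gamma, step_size, step):
    """Learning rate after `step` updates under step-wise decay (assumed schedule)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be positive and less than 1")
    # One multiplication by gamma for every completed interval of step_size steps.
    return base_lr * gamma ** (step // step_size)


# Hypothetical base_lr and step_size, for illustration only.
for step in (0, 50_000, 100_000, 150_000):
    print(step, stepped_lr(0.0004, gamma=0.5, step_size=50_000, step=step))
```

A gamma closer to 1 decays the learning rate more gently, which may suit larger or more varied datasets that need more steps at a higher rate.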
Pretrained models
0218_opencpop_ds1000_velocity
Pretrained model with time stretching augmentation and velocity control.
0223_opencpop_ds1000_joint_aug
Pretrained model with joint augmentation of random pitch shifting and random time stretching.