This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

Commit

Add WORLD pitch estimators and F0 range as hyperparameters (openvpi#149)
* Added WORLD pitch estimators

i also removed hardcoded F0 ranges because what the heck is that 800 Hz max pitch in parselmouth that is way too low

* Update README.md

okay maybe don't add ur own flare in the readme if u actually want to create a pull req

* Apply dtype change

i saw it in the parselmouth thing might as well put it in to make sure

* Update pw.py

oops

* fix dtype mismatch

for some reason pyworld only likes float64?

* add f0 range as a hyperparameter

why isn't it a hyperparameter in the first place

* move pad_frames to pw

i think world is p accurate with the frames stuff but it's just to ensure

* change padding algorithm

it's just to be similar to the parselmouth one.. it makes sense to not center the F0 after all

* remove duplicate line

yeah

* Remove DIO, change default range, add docs

* Add notice for F0 range
UtaUtaUtau authored Nov 20, 2023
1 parent fbff2e8 commit 931df27
Showing 5 changed files with 48 additions and 3 deletions.
2 changes: 2 additions & 0 deletions configs/base.yaml
@@ -76,6 +76,8 @@ train_set_name: 'train'
valid_set_name: 'valid'
pe: 'parselmouth'
pe_ckpt: ''
f0_min: 65
f0_max: 800
vocoder: ''
vocoder_ckpt: ''
num_valid_plots: 10
15 changes: 15 additions & 0 deletions docs/BestPractices.md
@@ -206,6 +206,21 @@ pe: rmvpe
pe_ckpt: checkpoints/rmvpe/model.pt
```

### Harvest

[Harvest](https://github.com/mmorise/World) (Harvest: A high-performance fundamental frequency estimator from speech signals) is the recommended pitch extractor from Masanori Morise's WORLD, free software for high-quality speech analysis, manipulation and synthesis. It is a state-of-the-art algorithmic pitch estimator designed for speech, but it has also been used in singing voice synthesis. It is the slowest of the pitch extractors listed here, but on clean, ordinary recordings it provides very accurate F0 compared to parselmouth.

To use Harvest, simply include the following line in your configuration file:
```yaml
pe: harvest
```

**Note:** It is also recommended to adjust the F0 detection range for Harvest to suit your dataset, because the limits are hard boundaries for this algorithm and the defaults might not suffice for most use cases. To change the F0 detection range, include or edit these lines in the configuration file:
```yaml
f0_min: 65 # Minimum F0 to detect
f0_max: 800 # Maximum F0 to detect
```

## Performance tuning

This section is about accelerating training and utilizing hardware.
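One way to pick `f0_min` and `f0_max` for a dataset is to start from the lowest and highest notes it contains. The following is a minimal sketch, not part of the commit: `midi_to_hz` and `f0_range_for_notes` are hypothetical helper names, the conversion assumes equal temperament with A4 = 440 Hz, and the two-semitone margin is an arbitrary assumption.

```python
def midi_to_hz(midi_note):
    """Equal-temperament frequency, assuming A4 (MIDI note 69) is 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def f0_range_for_notes(lowest_midi, highest_midi, margin_semitones=2):
    """Hypothetical helper: widen the note range by a margin, return (f0_min, f0_max) in Hz."""
    f0_min = midi_to_hz(lowest_midi - margin_semitones)
    f0_max = midi_to_hz(highest_midi + margin_semitones)
    return f0_min, f0_max


# C2 (MIDI 36) sits just above the default f0_min of 65 Hz,
# while A5 (MIDI 81) is already above the default f0_max of 800 Hz.
print(round(midi_to_hz(36), 2), round(midi_to_hz(81), 2))
print(f0_range_for_notes(lowest_midi=48, highest_midi=84))
```

Under these assumptions the default range of 65 to 800 Hz spans roughly C2 to G5, so a dataset with notes at or above A5 would need a larger `f0_max`.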
3 changes: 3 additions & 0 deletions modules/pe/__init__.py
@@ -1,6 +1,7 @@
from utils import hparams

from .pm import ParselmouthPE
from .pw import HarvestPE, DioPE
from .rmvpe import RMVPE


@@ -11,5 +12,7 @@ def initialize_pe():
        return ParselmouthPE()
    elif pe == 'rmvpe':
        return RMVPE(pe_ckpt)
    elif pe == 'harvest':
        return HarvestPE()
    else:
        raise ValueError(f" [x] Unknown f0 extractor: {pe}")
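The dispatch in `initialize_pe` can be exercised in isolation. This is a sketch under stated assumptions rather than the repository's module: the three classes are empty stand-ins, and `hparams` is passed as an argument here, whereas the hunk suggests the real function reads it from `utils`. Note also that the import line in the hunk still names `DioPE`, while `pw.py` as shown in this commit defines only `HarvestPE`, so that import looks like it would fail as written; the sketch imports nothing from the repository.

```python
class ParselmouthPE:
    """Stand-in for the real Parselmouth extractor."""


class RMVPE:
    """Stand-in for the real RMVPE extractor, which takes a checkpoint path."""

    def __init__(self, ckpt):
        self.ckpt = ckpt


class HarvestPE:
    """Stand-in for the Harvest extractor added in this commit."""


def initialize_pe(hparams):
    # Assumption: hparams is a plain dict passed in by the caller.
    pe = hparams['pe']
    pe_ckpt = hparams['pe_ckpt']
    if pe == 'parselmouth':
        return ParselmouthPE()
    elif pe == 'rmvpe':
        return RMVPE(pe_ckpt)
    elif pe == 'harvest':
        return HarvestPE()
    else:
        raise ValueError(f" [x] Unknown f0 extractor: {pe}")


extractor = initialize_pe({'pe': 'harvest', 'pe_ckpt': ''})
print(type(extractor).__name__)
```

Because there is no `dio` branch, a configuration with `pe: dio` falls through to the `ValueError`.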
25 changes: 25 additions & 0 deletions modules/pe/pw.py
@@ -0,0 +1,25 @@
from basics.base_pe import BasePE
import numpy as np
import pyworld as pw
from utils.pitch_utils import interp_f0

class HarvestPE(BasePE):
    def get_pitch(self, waveform, length, hparams, interp_uv=False, speed=1):
        hop_size = int(np.round(hparams['hop_size'] * speed))

        time_step = 1000 * hop_size / hparams['audio_sample_rate']
        f0_floor = hparams['f0_min']
        f0_ceil = hparams['f0_max']

        f0, _ = pw.harvest(waveform.astype(np.float64), hparams['audio_sample_rate'], f0_floor=f0_floor, f0_ceil=f0_ceil, frame_period=time_step)
        f0 = f0.astype(np.float32)

        if f0.size < length:
            f0 = np.pad(f0, (0, length - f0.size))
        f0 = f0[:length]
        uv = f0 == 0

        if interp_uv:
            f0, uv = interp_f0(f0, uv)
        return f0, uv
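
The frame-period and length-fitting arithmetic in `get_pitch` can be checked without pyworld installed. The following is a minimal sketch, not the repository's code: `frame_period_ms` and `fit_length` are hypothetical helper names, and plain Python lists stand in for NumPy arrays.

```python
def frame_period_ms(hop_size, sample_rate, speed=1.0):
    """Frame period in milliseconds for a hop size given in samples."""
    scaled_hop = int(round(hop_size * speed))
    return 1000 * scaled_hop / sample_rate


def fit_length(f0, length):
    """Right-pad with zeros or truncate so that exactly `length` frames remain."""
    if len(f0) < length:
        f0 = f0 + [0.0] * (length - len(f0))
    return f0[:length]


f0 = fit_length([220.0, 0.0, 221.5], 5)
uv = [value == 0 for value in f0]  # unvoiced mask, mirroring `uv = f0 == 0`
print(frame_period_ms(512, 44100))
print(f0, uv)
```

Padding happens only on the right, so frames are not re-centered, and the zero-valued padded frames end up marked as unvoiced.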

6 changes: 3 additions & 3 deletions utils/binarizer_utils.py
@@ -32,13 +32,13 @@ def get_pitch_parselmouth(wav_data, length, hparams, speed=1, interp_uv=False):
     """
     hop_size = int(np.round(hparams['hop_size'] * speed))
     time_step = hop_size / hparams['audio_sample_rate']
-    f0_min = 65
-    f0_max = 800
+    f0_min = hparams['f0_min']
+    f0_max = hparams['f0_max']

     l_pad = int(np.ceil(1.5 / f0_min * hparams['audio_sample_rate']))
     r_pad = hop_size * ((len(wav_data) - 1) // hop_size + 1) - len(wav_data) + l_pad + 1
     wav_data = np.pad(wav_data, (l_pad, r_pad))

     # noinspection PyArgumentList
     s = parselmouth.Sound(wav_data, sampling_frequency=hparams['audio_sample_rate']).to_pitch_ac(
         time_step=time_step, voicing_threshold=0.6,
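
Since `f0_min` now comes from the configuration, it also changes how much the Parselmouth path pads the waveform. The sketch below reproduces only the padding arithmetic from the hunk, using `math.ceil` in place of `np.ceil`; `parselmouth_padding` is a hypothetical name and the sample values are illustrative assumptions.

```python
import math


def parselmouth_padding(num_samples, hop_size, sample_rate, f0_min):
    """Return (l_pad, r_pad) in samples, using the same arithmetic as the hunk."""
    # Left pad: 1.5 periods of the lowest detectable pitch, rounded up.
    l_pad = math.ceil(1.5 / f0_min * sample_rate)
    # Right pad: round the length up to a whole number of hops, then add l_pad + 1.
    num_hops = (num_samples - 1) // hop_size + 1
    r_pad = hop_size * num_hops - num_samples + l_pad + 1
    return l_pad, r_pad


print(parselmouth_padding(1000, 512, 44100, f0_min=65))  # (1018, 1043)
print(parselmouth_padding(1000, 512, 44100, f0_min=40))  # (1654, 1679)
```

Lowering `f0_min` lengthens both pads, because the left pad scales with the period of the lowest pitch to detect.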
