
Try to train Synthesizer #486

Closed
rlutsyshyn opened this issue Aug 12, 2020 · 10 comments
rlutsyshyn commented Aug 12, 2020

I tried to train the synthesizer on the train-clean-100 data, but I ran into the following issue:

╰─ python synthesizer_preprocess_audio.py datasets --datasets_name LibriSpeech --subfolders train-clean-100
Arguments:
    datasets_root:   datasets
    out_dir:         datasets/SV2TTS/synthesizer
    n_processes:     None
    skip_existing:   False
    hparams:         
    no_alignments:   False
    datasets_name:   LibriSpeech
    subfolders:      train-clean-100

Using data from:
    datasets/LibriSpeech/train-clean-100
LibriSpeech: 100%|█████████████████████████████████████████| 251/251 [00:00<00:00, 6260.45speakers/s]
The dataset consists of 0 utterances, 0 mel frames, 0 audio timesteps (0.00 hours).
Traceback (most recent call last):
  File "synthesizer_preprocess_audio.py", line 59, in <module>
    preprocess_dataset(**vars(args))
  File "/home/roma/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 49, in preprocess_dataset
    print("Max input length (text chars): %d" % max(len(m[5]) for m in metadata))
ValueError: max() arg is an empty sequence

Can you help me with this? As a next step I also want to train the vocoder on that data.

ghost commented Aug 12, 2020

Do you have the LibriSpeech alignments? A link is on this page: https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training

If it can't find the alignment text files then it thinks there's nothing to process for LibriSpeech.
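One way to confirm that missing alignments are the cause is a quick scan for the alignment files. This is a minimal sketch, assuming the usual LibriSpeech speaker/chapter folder layout and the `*.alignment.txt` naming this repo expects; `missing_alignments` is just an illustrative helper name:

```python
from pathlib import Path

def missing_alignments(librispeech_root):
    """Return chapter directories that contain no *.alignment.txt file.

    Without an alignment file, the preprocessor has nothing to split
    the chapter audio with, so the chapter contributes 0 utterances —
    which matches the "0 utterances, 0 mel frames" output above.
    """
    root = Path(librispeech_root)
    missing = []
    for chapter_dir in root.glob("*/*"):  # <speaker_id>/<chapter_id>
        if chapter_dir.is_dir() and not list(chapter_dir.glob("*.alignment.txt")):
            missing.append(chapter_dir)
    return missing
```

Running it on `datasets/LibriSpeech/train-clean-100` and printing the result should tell you whether any (or all) chapters lack alignments.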

rlutsyshyn (Author) commented
Yep, I've seen that, but how can I create my own alignments for future fine-tuning on my data?

ghost commented Aug 12, 2020

An alignment file is used to split long utterances into smaller ones. It is unnecessary for datasets like LibriTTS, where you can discard samples that are too long and still have plenty of data remaining. See the violin plot below.

If you are making a custom dataset, just try to make your samples 2 to 7 seconds long for training and don't bother generating alignments. You can split long utterances manually. If you have a very large number of files and must automate it, use something like the Montreal Forced Aligner.
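To spot utterances outside that 2–7 second range, here is a rough sketch using only the stdlib `wave` module. It assumes uncompressed PCM wav files; other formats would need a library such as librosa or soundfile. The function names are illustrative, not part of this repo:

```python
import wave
from pathlib import Path

def duration_seconds(wav_path):
    """Duration of an uncompressed PCM wav file, via the stdlib wave module."""
    with wave.open(str(wav_path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def flag_out_of_range(wav_dir, lo=2.0, hi=7.0):
    """Return wav files whose duration falls outside [lo, hi] seconds."""
    return [p for p in sorted(Path(wav_dir).glob("*.wav"))
            if not lo <= duration_seconds(p) <= hi]
```

Anything the second function returns is a candidate for manual splitting or discarding before preprocessing.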

For fine-tuning on your data, just make your dataset look like: #437 (comment)

From https://arxiv.org/pdf/1904.02882v1.pdf:
[Figure 1: violin plot of utterance duration distributions in LibriTTS]

rlutsyshyn commented Aug 12, 2020

Can you help me with creating a dataset for training? I use the Ukrainian data from https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/.
The data looks like:

uk_UK
    |---by_book
          |----female
          |----male
              |---speaker_name
                        |---wavs
                        |---metadata.csv (each line: <filename.wav> | transcript of that audio)

            ......

Maybe I can do it automatically, or something like that?

ghost commented Aug 12, 2020

@rlutsyshyn I suggest you write a script that does this:

  1. Make a list of every metadata.csv
  2. For each metadata.csv: read each line, and write the transcript out to <filename>.txt next to the matching <filename>.wav

After you do this, you can move files around to make it look like #437 (comment) and the command I provided there should work.
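The steps above could be sketched like this. It assumes a pipe-separated metadata.csv whose first field is the wav filename (with or without the .wav extension) and whose last field is the transcript; check your files and adjust the indices if the columns differ. `explode_metadata` is a hypothetical helper name:

```python
from pathlib import Path

def explode_metadata(speaker_dir):
    """Write one <utterance>.txt per wav, next to the wav file.

    Assumes the layout shown above: speaker_dir contains a wavs/
    folder and a pipe-separated metadata.csv whose first field is
    the wav filename and whose last field is the transcript.
    """
    speaker_dir = Path(speaker_dir)
    wav_dir = speaker_dir / "wavs"
    with open(speaker_dir / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            parts = line.split("|")
            name, text = parts[0], parts[-1]
            stem = Path(name).stem  # tolerate "file.wav" or "file"
            (wav_dir / (stem + ".txt")).write_text(text, encoding="utf-8")
```

Looping this over every speaker directory found in step 1 produces the wav/txt pairing the preprocessor expects.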

rlutsyshyn (Author) commented
Hey, how can I contact you? I have some more questions, but it's not comfortable to ask them here.

ghost commented Aug 13, 2020

I apologize, I am not available to provide consultation outside of the issues board here. For now, my priorities are 1) code development and 2) bug fixes. I answer support questions as time permits but that is not my purpose here.

rlutsyshyn commented Aug 13, 2020

Okay, understood. I have created a dataset for a new training run (just one speaker, for testing). When I start synthesizer_preprocess_audio.py it seems fine at first, but then I get an error like this:

Arguments:
    datasets_root:   datasets
    out_dir:         datasets/SV2TTS/synthesizer
    n_processes:     None
    skip_existing:   False
    hparams:         
    no_alignments:   True
    datasets_name:   Ukrainian
    subfolders:      female

Using data from:
    datasets/Ukrainian/female
Ukrainian:   0%|                                                         | 0/1 [01:03<?, ?speakers/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/roma/miniconda3/envs/work/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/roma/Стільниця/Work/NMT/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 76, in preprocess_speaker
    assert text_fpath.exists()
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "synthesizer_preprocess_audio.py", line 59, in <module>
    preprocess_dataset(**vars(args))
  File "/home/roma/Стільниця/Work/NMT/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 35, in preprocess_dataset
    for speaker_metadata in tqdm(job, datasets_name, len(speaker_dirs), unit="speakers"):
  File "/home/roma/miniconda3/envs/work/lib/python3.7/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/home/roma/miniconda3/envs/work/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError

ghost commented Aug 13, 2020

This is in your traceback: assert text_fpath.exists()

Please check that for every filename.wav in your folder, there is a corresponding filename.txt in the same location.
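A quick way to run that check over the whole dataset is a sketch like the following; `unpaired_wavs` is just an illustrative helper name:

```python
from pathlib import Path

def unpaired_wavs(dataset_root):
    """List wav files that have no same-named .txt transcript beside them.

    Any file this returns would trip the assert text_fpath.exists()
    seen in the traceback above.
    """
    return [p for p in sorted(Path(dataset_root).rglob("*.wav"))
            if not p.with_suffix(".txt").exists()]
```

If it prints an empty list for `datasets/Ukrainian/female`, the pairing is complete and the assertion has some other cause.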

ghost commented Aug 14, 2020

Thank you for reporting the bug with librosa 0.8.0 @rlutsyshyn .
