
Try to train Synthesizer #486

Closed
rlutsyshyn opened this issue Aug 12, 2020 · 10 comments
rlutsyshyn commented Aug 12, 2020

I tried to train the synthesizer on the train-clean-100 data, but I ran into the following issue:

╰─ python synthesizer_preprocess_audio.py datasets --datasets_name LibriSpeech --subfolders train-clean-100
Arguments:
    datasets_root:   datasets
    out_dir:         datasets/SV2TTS/synthesizer
    n_processes:     None
    skip_existing:   False
    hparams:         
    no_alignments:   False
    datasets_name:   LibriSpeech
    subfolders:      train-clean-100

Using data from:
    datasets/LibriSpeech/train-clean-100
LibriSpeech: 100%|█████████████████████████████████████████| 251/251 [00:00<00:00, 6260.45speakers/s]
The dataset consists of 0 utterances, 0 mel frames, 0 audio timesteps (0.00 hours).
Traceback (most recent call last):
  File "synthesizer_preprocess_audio.py", line 59, in <module>
    preprocess_dataset(**vars(args))
  File "/home/roma/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 49, in preprocess_dataset
    print("Max input length (text chars): %d" % max(len(m[5]) for m in metadata))
ValueError: max() arg is an empty sequence

Can you help me with this? As a next step I also want to train the vocoder on that data.

ghost commented Aug 12, 2020

Do you have the LibriSpeech alignments? A link is on this page: https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training

If it can't find the alignment text files then it thinks there's nothing to process for LibriSpeech.
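One way to confirm that missing alignments are the cause is a quick scan for the alignment files. This is a minimal sketch, assuming the usual LibriSpeech speaker/chapter folder layout and the `*.alignment.txt` naming this repo expects; `missing_alignments` is just an illustrative helper name:

```python
from pathlib import Path

def missing_alignments(librispeech_root):
    """Return chapter directories that contain no *.alignment.txt file.

    Without an alignment file, the preprocessor has nothing to split
    the chapter audio with, so the chapter contributes 0 utterances —
    which matches the "0 utterances, 0 mel frames" output above.
    """
    root = Path(librispeech_root)
    missing = []
    for chapter_dir in root.glob("*/*"):  # <speaker_id>/<chapter_id>
        if chapter_dir.is_dir() and not list(chapter_dir.glob("*.alignment.txt")):
            missing.append(chapter_dir)
    return missing
```

Running it on `datasets/LibriSpeech/train-clean-100` and printing the result should tell you whether any (or all) chapters lack alignments.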

rlutsyshyn (Author) commented
Yep, I've seen that, but how can I create my own alignments for future fine-tuning on my data?

ghost commented Aug 12, 2020

An alignment file is used to split long utterances into smaller ones. It is unnecessary for datasets like LibriTTS, where you can discard samples that are too long and still have plenty of data remaining. See the violin plot below.

If you are making a custom dataset, just try to make your samples 2 to 7 seconds long for training and don't bother generating alignments. You can split long utterances manually. If you have a very large number of files and must automate it, use something like the Montreal Forced Aligner.
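To spot utterances outside that 2–7 second range, here is a rough sketch using only the stdlib `wave` module. It assumes uncompressed PCM wav files; other formats would need a library such as librosa or soundfile. The function names are illustrative, not part of this repo:

```python
import wave
from pathlib import Path

def duration_seconds(wav_path):
    """Duration of an uncompressed PCM wav file, via the stdlib wave module."""
    with wave.open(str(wav_path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def flag_out_of_range(wav_dir, lo=2.0, hi=7.0):
    """Return wav files whose duration falls outside [lo, hi] seconds."""
    return [p for p in sorted(Path(wav_dir).glob("*.wav"))
            if not lo <= duration_seconds(p) <= hi]
```

Anything the second function returns is a candidate for manual splitting or discarding before preprocessing.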

For fine-tuning on your data, just make your dataset look like: #437 (comment)

From https://arxiv.org/pdf/1904.02882v1.pdf:
[Figure 1: violin plot of utterance duration distributions in LibriTTS]

rlutsyshyn commented Aug 12, 2020

Can you help me with creating a dataset for training? I use the Ukrainian data from https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/.
The data looks like:

uk_UK
    |---by_book
          |----female
          |----male
              |---speaker_name
                        |---wavs
                        |---metadata.csv (each line: <filename.wav> | transcript of that audio)

            ......

Maybe I can do it automatically, or something like that?

ghost commented Aug 12, 2020

@rlutsyshyn I suggest you write a script that does this:

  1. Make a list of every metadata.csv
  2. For each metadata.csv: read each line, and write the transcript out to <filename>.txt next to the matching <filename>.wav

After you do this, you can move files around to make it look like #437 (comment) and the command I provided there should work.
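The steps above could be sketched like this. It assumes a pipe-separated metadata.csv whose first field is the wav filename (with or without the .wav extension) and whose last field is the transcript; check your files and adjust the indices if the columns differ. `explode_metadata` is a hypothetical helper name:

```python
from pathlib import Path

def explode_metadata(speaker_dir):
    """Write one <utterance>.txt per wav, next to the wav file.

    Assumes the layout shown above: speaker_dir contains a wavs/
    folder and a pipe-separated metadata.csv whose first field is
    the wav filename and whose last field is the transcript.
    """
    speaker_dir = Path(speaker_dir)
    wav_dir = speaker_dir / "wavs"
    with open(speaker_dir / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            parts = line.split("|")
            name, text = parts[0], parts[-1]
            stem = Path(name).stem  # tolerate "file.wav" or "file"
            (wav_dir / (stem + ".txt")).write_text(text, encoding="utf-8")
```

Looping this over every speaker directory found in step 1 produces the wav/txt pairing the preprocessor expects.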

rlutsyshyn (Author) commented
Hey, how can I contact you? I have some more questions, but it's not comfortable to ask them here.

ghost commented Aug 13, 2020

I apologize, I am not available to provide consultation outside of the issues board here. For now, my priorities are 1) code development and 2) bug fixes. I answer support questions as time permits but that is not my purpose here.

rlutsyshyn commented Aug 13, 2020

Okay, understood. I have created a dataset for a new training run (just one speaker, for testing). When I start synthesizer_preprocess_audio.py it seems fine at first, but then I get an error like this:

Arguments:
    datasets_root:   datasets
    out_dir:         datasets/SV2TTS/synthesizer
    n_processes:     None
    skip_existing:   False
    hparams:         
    no_alignments:   True
    datasets_name:   Ukrainian
    subfolders:      female

Using data from:
    datasets/Ukrainian/female
Ukrainian:   0%|                                                         | 0/1 [01:03<?, ?speakers/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/roma/miniconda3/envs/work/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/roma/Стільниця/Work/NMT/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 76, in preprocess_speaker
    assert text_fpath.exists()
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "synthesizer_preprocess_audio.py", line 59, in <module>
    preprocess_dataset(**vars(args))
  File "/home/roma/Стільниця/Work/NMT/Real-Time-Voice-Cloning/synthesizer/preprocess.py", line 35, in preprocess_dataset
    for speaker_metadata in tqdm(job, datasets_name, len(speaker_dirs), unit="speakers"):
  File "/home/roma/miniconda3/envs/work/lib/python3.7/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/home/roma/miniconda3/envs/work/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError

ghost commented Aug 13, 2020

This is in your traceback: assert text_fpath.exists()

Please check that for every filename.wav in your folder, there is a corresponding filename.txt in the same location.
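A quick way to run that check over the whole dataset is a sketch like the following; `unpaired_wavs` is just an illustrative helper name:

```python
from pathlib import Path

def unpaired_wavs(dataset_root):
    """List wav files that have no same-named .txt transcript beside them.

    Any file this returns would trip the assert text_fpath.exists()
    seen in the traceback above.
    """
    return [p for p in sorted(Path(dataset_root).rglob("*.wav"))
            if not p.with_suffix(".txt").exists()]
```

If it prints an empty list for `datasets/Ukrainian/female`, the pairing is complete and the assertion has some other cause.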

ghost commented Aug 14, 2020

Thank you for reporting the bug with librosa 0.8.0 @rlutsyshyn .
