
decoding error in preprocessing synthesizer #439

Closed
amintavakol opened this issue Jul 23, 2020 · 10 comments

amintavakol commented Jul 23, 2020

I get the following error while running synthesizer_preprocess_audio.py.

Arguments:
    datasets_root:   /home/amin/voice_cloning/libri_100
    out_dir:         /home/amin/voice_cloning/libri_100/SV2TTS/synthesizer
    n_processes:     None
    skip_existing:   True
    hparams:         

Using data from:
    /home/amin/voice_cloning/libri_100/LibriSpeech/train-clean-100
LibriSpeech:   0%|                                                                                                                                       | 0/502 [00:00<?, ?speakers/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/amin/voice_cloning/Real-Time-Voice-Cloning-master/synthesizer/preprocess.py", line 62, in preprocess_speaker
    alignments = [line.rstrip().split(" ") for line in alignments_file]
  File "/home/amin/voice_cloning/Real-Time-Voice-Cloning-master/synthesizer/preprocess.py", line 62, in <listcomp>
    alignments = [line.rstrip().split(" ") for line in alignments_file]
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 37: invalid start byte
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "synthesizer_preprocess_audio.py", line 52, in <module>
    preprocess_librispeech(**vars(args))    
  File "/home/amin/voice_cloning/Real-Time-Voice-Cloning-master/synthesizer/preprocess.py", line 36, in preprocess_librispeech
    for speaker_metadata in tqdm(job, "LibriSpeech", len(speaker_dirs), unit="speakers"):
  File "/home/amin/.local/lib/python3.6/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 37: invalid start byte

Can anyone help? It would save me a lot of time.
Thanks.


ghost commented Jul 23, 2020

Can you try it with the 392_single_threaded_preprocess branch of my fork and post the traceback? It will help to know which alignment file it is breaking on.


ghost commented Jul 23, 2020

Try making this modification. Change:

with alignments_fpath.open("r") as alignments_file:

to:

with alignments_fpath.open("r", encoding="ascii") as alignments_file:
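To see why the byte 0xa2 trips the default decoder, here is a small sketch. The alignment line below is hypothetical (the actual file contents are unknown); it only reproduces the failing byte from the traceback and shows how different encoding/error policies behave:

```python
# Hypothetical alignment line containing a stray 0xa2 byte, mirroring the
# UnicodeDecodeError above. The real file contents are unknown.
raw = b'84-121123-0000 ",THE,FISH\xa2" "0.1,0.5,0.9"\n'

# Strict UTF-8 decoding fails, like the default open() on most Linux systems:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)

# latin-1 maps every byte to a codepoint, so it never raises, while
# errors="replace" keeps UTF-8 but substitutes U+FFFD for invalid bytes:
print(raw.decode("latin-1"))
print(raw.decode("utf-8", errors="replace"))
```

Note that 0xa2 is also invalid ASCII (it is above 127), so encoding="ascii" would reject it as well; latin-1 or errors="replace" are the more permissive options if the goal is to read past the bad byte.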


ghost commented Jul 24, 2020

@amintavakol Did you resolve the issue?


amintavakol commented Jul 24, 2020

Yes, that fixes the issue.
Also, changing the try/except block in synthesizer/preprocess.py to this:

try:
    alignments_fpath = next(book_dir.glob("*.alignment.txt"))
    with alignments_fpath.open("r") as alignments_file:
        alignments = [line.rstrip().split(" ") for line in alignments_file]
except:
    # A few alignment files will be missing
    continue

keeps the preprocessing running for the non-problematic files.
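A bare except also hides which file is broken. A possible variant (a sketch, not the repo's code; the helper name and return convention are made up) catches the two expected failures separately and logs the offending path:

```python
from pathlib import Path

def read_alignments(book_dir: Path):
    """Hypothetical helper: parsed alignments, or None for missing/undecodable files."""
    try:
        alignments_fpath = next(book_dir.glob("*.alignment.txt"))
    except StopIteration:
        # A few alignment files will be missing
        return None
    try:
        with alignments_fpath.open("r", encoding="utf-8") as alignments_file:
            return [line.rstrip().split(" ") for line in alignments_file]
    except UnicodeDecodeError as e:
        # Skip the problematic file, but say which one it was
        print(f"Skipping {alignments_fpath}: {e}")
        return None
```

This keeps preprocessing running like the bare except, while still reporting the files that need attention.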

@shoegazerstella

I am having the same issue, but appearing in synthesizer/synthesizer_dataset.py line 13, which I tried to solve like this:

metadata = []
with metadata_fpath.open("r", encoding="ascii") as metadata_file:
    #metadata = [line.split("|") for line in metadata_file]
    try:
        for line in metadata_file:
            metadata.append(line.split("|"))
    except Exception as e:
        ex = e

But now I have this many samples in the training dataset and I am not sure it is correct:
Found 24353 samples
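The low count is expected with that workaround: a decode error raised mid-iteration stops the loop, so every line after the bad byte is silently dropped. A simplified per-line model (real file objects decode in larger chunks, but the effect is the same; the lines are made up):

```python
# Three hypothetical metadata lines; the second contains a non-ASCII byte (0xc3).
raw_lines = [
    b"audio-1.npy|embed-1.npy|hello\n",
    b"audio-2.npy|embed-2.npy|caf\xc3\xa9\n",
    b"audio-3.npy|embed-3.npy|world\n",
]

metadata = []
try:
    for raw in raw_lines:
        metadata.append(raw.decode("ascii").rstrip().split("|"))
except UnicodeDecodeError as e:
    ex = e  # swallowed, as in the workaround above

print(len(metadata))  # 1 -- lines after the bad byte are lost too
```

So the try/except does not just skip bad lines; it truncates the whole dataset at the first failure.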


ghost commented Aug 13, 2020

@shoegazerstella My processed train-clean-100 and train-clean-360 for LibriTTS have 111,521 samples.

Can you print the exception?

ghost reopened this Aug 13, 2020
@shoegazerstella

Starting the training of Tacotron from scratch

Using inputs from:
	/opt/ml/input/data/train/train.txt
	/opt/ml/input/data/train/mels
	/opt/ml/input/data/train/embeds
Traceback (most recent call last):
  File "synthesizer_train.py", line 33, in <module>
    train(**vars(args))
  File "/root/voicecloning/synthesizer/train.py", line 112, in train
    dataset = SynthesizerDataset(metadata_fpath, mel_dir, embed_dir)
  File "/root/voicecloning/synthesizer/synthesizer_dataset.py", line 14, in __init__
    metadata = [line.split("|") for line in metadata_file]
  File "/root/voicecloning/synthesizer/synthesizer_dataset.py", line 14, in <listcomp>
    metadata = [line.split("|") for line in metadata_file]
  File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1481: ordinal not in range(128)


ghost commented Aug 13, 2020

Does it help if you change line 13 synthesizer_dataset.py to:

with metadata_fpath.open("r", encoding="utf-8") as metadata_file:

I think your system locale causes files to be saved as UTF-8 by default, so certain characters are out of range when loading them as ASCII.


shoegazerstella commented Aug 13, 2020

You are right, now I see:
Found 76052 samples

I should also mention this log from the preprocessing:

The dataset consists of 76052 utterances, 21949820 mel frames, 6025708370 audio timesteps (75.91 hours).
Max input length (text chars): 158
Max mel frames length: 500
Max audio timesteps length: 137374

So this seems to be in line with it!


ghost commented Aug 13, 2020

@shoegazerstella When restricting the max mel frames length to 500, I have 76,153 samples (the other number, 111,521, uses the default of 900). So everything seems to be working well now!
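The differing counts are just the length cap at work. A toy sketch with made-up mel lengths (the exact cutoff semantics here are an assumption, not the repo's code):

```python
# Hypothetical per-utterance mel lengths, in frames.
mel_frame_lengths = [320, 480, 510, 760, 905]

def samples_kept(max_mel_frames):
    # Assumption: utterances longer than the cap are dropped during preprocessing.
    return sum(1 for n in mel_frame_lengths if n <= max_mel_frames)

print(samples_kept(500))  # 2
print(samples_kept(900))  # 4
```

A tighter cap keeps fewer utterances, which is why the 500-frame run reports a smaller sample count than the 900-frame default.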

@ghost ghost closed this as completed Aug 13, 2020