Double punctuation breaks phonemization #54
Hi, can I have a complete example of a failing command, with the input text and options, please?
OK, I understand the bug: it occurs when trying to restore punctuation on an empty text. I'll publish a fix soon. Thanks for reporting.
Fixed in ee591ed.
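A rough sketch of one plausible way a split-and-restore scheme for preserved punctuation breaks when the chunk between two marks is empty, as in `Hélas! .`. This is not phonemizer's code: `split_punct`, `restore`, `process`, `process_naive` and the mark set are hypothetical, and `convert` stands in for the phonemizer backend. The sketch keeps the invariant "one more chunk than marks" and passes blank chunks through instead of dropping them; where the library raised `IndexError`, the mismatch here surfaces as a `ValueError`.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical mark set for illustration; not phonemizer's actual default list.
_MARKS = re.compile(r"[;:,.!?]+")


def split_punct(text: str) -> Tuple[List[str], List[str]]:
    """Split text into chunks and the punctuation runs separating them.

    Invariant: len(chunks) == len(marks) + 1. Marks separated only by
    whitespace (as in "! .") leave a blank chunk between them.
    """
    chunks: List[str] = []
    marks: List[str] = []
    pos = 0
    for m in _MARKS.finditer(text):
        chunks.append(text[pos:m.start()])
        marks.append(m.group())
        pos = m.end()
    chunks.append(text[pos:])
    return chunks, marks


def restore(chunks: List[str], marks: List[str]) -> str:
    """Interleave converted chunks with the saved marks."""
    if len(chunks) != len(marks) + 1:
        raise ValueError(
            "expected {} chunks, got {}".format(len(marks) + 1, len(chunks)))
    out = [chunks[0]]
    for mark, chunk in zip(marks, chunks[1:]):
        out.append(mark + chunk)
    return "".join(out)


def _convert_chunk(chunk: str, convert: Callable[[str], str]) -> str:
    """Convert the non-blank core of a chunk, keeping its surrounding spaces."""
    core = chunk.strip()
    if not core:
        return chunk  # blank chunk: keep it so chunks and marks stay aligned
    lead = chunk[:len(chunk) - len(chunk.lstrip())]
    trail = chunk[len(chunk.rstrip()):]
    return lead + convert(core) + trail


def process(text: str, convert: Callable[[str], str]) -> str:
    """Guarded version: blank chunks are passed through, not dropped."""
    chunks, marks = split_punct(text)
    return restore([_convert_chunk(c, convert) for c in chunks], marks)


def process_naive(text: str, convert: Callable[[str], str]) -> str:
    """Buggy version: dropping blank chunks breaks the chunk/mark alignment."""
    chunks, marks = split_punct(text)
    converted = [convert(c.strip()) for c in chunks if c.strip()]
    return restore(converted, marks)  # raises ValueError whenever a chunk was blank
```

With `str.upper` as a stand-in converter, `process("Hélas! . ni l'un", str.upper)` returns `"HÉLAS! . NI L'UN"`, while `process_naive` on the same input raises `ValueError` because the blank chunk between `!` and `.` was discarded.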
I don't know whether this is related, but this input line fails: `000004280: Hélas! . ni l'un ni l'autre ne ressemblait au sien.`
```
Traceback (most recent call last):
  File "/home/muksihs/git/Cherokee-TTS/data/comvoi_ipa/generateTrainingData.py", line 59, in <module>
    use_sampa=False)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/phonemize.py", line 172, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/backend/base.py", line 126, in phonemize
    text = self._punctuator.restore(text, punctuation_marks)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 146, in restore
    return cls._restore_aux(str2list(text), marks, 0)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range
```
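While a fix is pending, one way to keep a long corpus run from aborting on a single bad utterance is to convert line by line and collect failures. A minimal sketch, assuming each input line has the form `ID: text` (as the `000004280:` prefix suggests) and that the converter is passed in as `to_ipa`; `phonemize_corpus` and `fake_to_ipa` are hypothetical names, and only the `IndexError` from the traceback above is caught.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def phonemize_corpus(
    lines: Iterable[str],
    to_ipa: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Convert "ID: text" lines one by one, recording failures instead of aborting.

    Returns (results, failures): results maps ID -> converted text, and
    failures holds (ID, text, error) for each line whose conversion raised.
    """
    results: Dict[str, str] = {}
    failures: List[Tuple[str, str, str]] = []
    for line in lines:
        utt_id, sep, text = line.partition(":")  # assumed "ID: text" layout
        if not sep:
            continue  # skip lines without an ID prefix
        utt_id, text = utt_id.strip(), text.strip()
        try:
            results[utt_id] = to_ipa(text)
        except IndexError as err:  # the error seen in the traceback above
            failures.append((utt_id, text, repr(err)))
    return results, failures


if __name__ == "__main__":
    # Hypothetical stand-in for the real backend so the sketch runs anywhere.
    def fake_to_ipa(text: str) -> str:
        if "! ." in text:
            raise IndexError("list index out of range")
        return text.lower()

    ok, bad = phonemize_corpus(
        ["000004280: Hélas! . ni l'un", "000004281: Bonjour."], fake_to_ipa)
    print(ok)  # {'000004281': 'bonjour.'}
    for utt_id, text, err in bad:
        print(utt_id, text, err)
```

With the real library, `to_ipa` could be `functools.partial(phonemize, backend='espeak', language='fr-fr', preserve_punctuation=True)`, i.e. the same options used in the maintainer's reply; the collected failures then double as a list of inputs to report.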
Hi, indeed you should upgrade your phonemizer version:

```
>>> from phonemizer import phonemize
>>> utt = "Hélas! . ni l'un ni l'autre ne ressemblait au sien."
>>> phonemize(utt, backend='espeak', language='fr-fr', preserve_punctuation=True)
'elas ! . ni lœ̃ ni lotʁ nə ʁəsɑ̃blɛt o sjɛ̃ .'
```

I'm using this version:

```
$ phonemize --version
phonemizer-2.2.2
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
```
I do not have an exhaustive list, but many double-punctuation patterns break phonemization. One example is `!'`.

Phonemizer installed from pip, version 2.2.
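Since an exhaustive list is hard to compile by hand, a short script can collect the double-punctuation runs that actually occur in a corpus. A minimal sketch, assuming the mark set below (a guess, not phonemizer's own list) and treating marks separated only by whitespace, like `! .` in the French example earlier in the thread, as a single run; `MARKS`, `DOUBLE_PUNCT` and `double_punct_patterns` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative mark set (an assumption); extend it to match your corpus.
MARKS = ";:,.!?'\"«»…"

# Two or more marks in a row, optionally separated by whitespace,
# e.g. "!'" or "! .". A lone apostrophe inside "l'un" does not match.
DOUBLE_PUNCT = re.compile(r"[{m}](?:\s*[{m}])+".format(m=re.escape(MARKS)))


def double_punct_patterns(lines: Iterable[str]) -> Counter:
    """Count each distinct punctuation run, with inner whitespace collapsed."""
    counts: Counter = Counter()
    for line in lines:
        for match in DOUBLE_PUNCT.finditer(line):
            counts[re.sub(r"\s+", " ", match.group())] += 1
    return counts
```

Running it over the training text, e.g. `double_punct_patterns(open(path, encoding="utf-8"))` with a hypothetical `path`, gives a frequency table of candidate patterns that can then be fed to the phonemizer one at a time.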