Lots of "... contains unknown char/phoneme...Symbol will be skipped." warnings popping up. #5890

Closed
godspirit00 opened this issue Jan 31, 2023 · 15 comments
Labels: bug (Something isn't working), stale, TTS

Comments

@godspirit00

I tried to train a FastPitch model on the Blizzard 2013 dataset. As soon as training started, many warnings like these kept appearing:

[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And then on her BRIY1FLIY0 IH0KSPREH1SIH0NG HHER1 sorrow for what HHIY1 must have suffered, HHIY1 replied,] contains unknown char/phoneme: [A].Original text: [And then on her briefly expressing her sorrow for what he must have suffered, he replied,]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [How different from what it was the last two dances.] contains unknown char/phoneme: [H].Original text: [How different from what it was the last two dances.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [which was VEH1RIY0 LIH1TAH0L relieved by the long speeches of MIH1STER0 Collins.] contains unknown char/phoneme: [C].Original text: [which was very little relieved by the long speeches of mister Collins.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [Henry is different. He LAH1VZ to be DUW1IH0NG.] contains unknown char/phoneme: [H].Original text: [Henry is different. He loves to be doing.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [Henry is different. He LAH1VZ to be DUW1IH0NG.] contains unknown char/phoneme: [H].Original text: [Henry is different. He loves to be doing.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And if I HHAE1V good luck, your mother SHAE1L have some.] contains unknown char/phoneme: [A].Original text: [And if I have good luck, your mother shall have some.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And if I HHAE1V good luck, your mother SHAE1L have some.] contains unknown char/phoneme: [I].Original text: [And if I have good luck, your mother shall have some.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [and I BIH0LIY1V there is SKEH1RSLIY0 a YAH1NG LEY1DIY0 in the United Kingdoms.] contains unknown char/phoneme: [I].Original text: [and I believe there is scarcely a young lady in the United Kingdoms.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [and I BIH0LIY1V there is SKEH1RSLIY0 a YAH1NG LEY1DIY0 in the United Kingdoms.] contains unknown char/phoneme: [U].Original text: [and I believe there is scarcely a young lady in the United Kingdoms.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [It is NAA1T AE1T AO1L what I LAY1K, HHIY1 continued.] contains unknown char/phoneme: [I].Original text: [It is not at all what I like, he continued.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [It is NAA1T AE1T AO1L what I LAY1K, HHIY1 continued.] contains unknown char/phoneme: [I].Original text: [It is not at all what I like, he continued.]. Symbol will be skipped.

But the original text consists only of ordinary words.
Is this normal?
What should I do about it?
Thanks!

@godspirit00 godspirit00 added the bug label Jan 31, 2023
@XuesongYang XuesongYang self-assigned this Jan 31, 2023
@XuesongYang
Collaborator

Could you please share more details? For example, which FastPitch YAML config and which version of NeMo are you using? Thanks.

We merged mixed-case support for graphemes, which should distinguish uppercase from lowercase if that option is enabled.
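One way to read the warnings above: every reported symbol ([A], [H], [C], [I], [U]) is an uppercase letter in a word that was left as graphemes, which is what you would expect if the grapheme vocabulary only holds lowercase letters. The sketch below illustrates that failure mode with a toy vocabulary. `GRAPHEMES` and `find_unknown_chars` are hypothetical names made up for illustration; they are not NeMo's tokenizer API, and the real vocabulary may differ.

```python
import string

# Hypothetical vocabulary (an assumption, not NeMo's actual symbol set):
# lowercase graphemes, space, and a little punctuation.
GRAPHEMES = set(string.ascii_lowercase) | set(" .,'")


def find_unknown_chars(text, vocab=GRAPHEMES, lowercase=False):
    """Return characters of `text` missing from `vocab`, in order of first appearance."""
    if lowercase:
        text = text.lower()
    unknown = []
    for ch in text:
        if ch not in vocab and ch not in unknown:
            unknown.append(ch)
    return unknown


line = "How different from what it was the last two dances."
print(find_unknown_chars(line))                  # ['H']
print(find_unknown_chars(line, lowercase=True))  # []
```

With a lowercase-only vocabulary, the capital "H" is the only symbol that gets flagged; lowercasing the text first (or adding uppercase letters to the vocabulary, as a mixed-case option would) leaves nothing to skip.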

@godspirit00
Author

@XuesongYang I'm using the latest repo on GitHub, commit hash c3eeae1a8f3d4276c2d4c92b7277202c341eea79, and the config is fastpitch_align_44100.yaml.

@daminnock

I had the same problem. I am now using v1.14.0 and it works fine for me.
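For anyone comparing environments, a small sketch like the following can report which releases are installed. It assumes NeMo was installed under the PyPI distribution name `nemo_toolkit`; `installed_version` is a hypothetical helper, and a source checkout installed with `pip install -e` may report a development version rather than a release number.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions about how the packages were installed.
for name in ("nemo_toolkit", "torch"):
    print(name, installed_version(name) or "not installed")
```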

@godspirit00
Author

@daminnock thanks for letting me know, I'll give it a try.

@XuesongYang
Collaborator

@daminnock @godspirit00 I am closing this issue because the bug has been fixed. Please feel free to re-open it if you run into any problems. Thanks.

@godspirit00
Author

@XuesongYang Thank you for the update!
I pulled the latest code, ran pip install -e .[tts] and ran the training again, but encountered another error:

Traceback (most recent call last):
  File "examples/tts/fastpitch.py", line 18, in <module>
    from nemo.collections.tts.models import FastPitchModel
  File "/root/NeMo/nemo/collections/tts/__init__.py", line 16, in <module>
    import nemo.collections.tts.models
  File "/root/NeMo/nemo/collections/tts/models/__init__.py", line 24, in <module>
    from nemo.collections.tts.models.vits import VitsModel
  File "/root/NeMo/nemo/collections/tts/models/vits.py", line 32, in <module>
    from nemo.collections.tts.torch.data import DistributedBucketSampler
  File "/root/NeMo/nemo/collections/tts/torch/data.py", line 30, in <module>
    from nemo.collections.asr.parts.preprocessing.features import WaveformFeaturizer
  File "/root/NeMo/nemo/collections/asr/__init__.py", line 15, in <module>
    from nemo.collections.asr import data, losses, models, modules
  File "/root/NeMo/nemo/collections/asr/models/__init__.py", line 18, in <module>
    from nemo.collections.asr.models.clustering_diarizer import ClusteringDiarizer
  File "/root/NeMo/nemo/collections/asr/models/clustering_diarizer.py", line 33, in <module>
    from nemo.collections.asr.parts.utils.speaker_utils import (
  File "/root/NeMo/nemo/collections/asr/parts/utils/speaker_utils.py", line 742, in <module>
    def fl2int(x: float, decimals: int = 3) -> int:
  File "/root/miniconda3/lib/python3.8/site-packages/torch/jit/_script.py", line 1310, in script
    fn = torch._C._jit_script_compile(
RuntimeError: 
Arguments for call are not valid.
The following variants are available:
  
  aten::round(Tensor self) -> (Tensor):
  Keyword argument decimals unknown.
  
  aten::round.out(Tensor self, *, Tensor(a!) out) -> (Tensor(a!)):
  Argument out not provided.
  
  aten::round.int(int a) -> (float):
  Keyword argument decimals unknown.
  
  aten::round.float(float a) -> (float):
  Keyword argument decimals unknown.
  
  aten::round.Scalar(Scalar a) -> (Scalar):
  Keyword argument decimals unknown.

The original call is:
  File "/root/NeMo/nemo/collections/asr/parts/utils/speaker_utils.py", line 746
    Convert floating point number to integer.
    """
    return torch.round(torch.tensor([x * (10 ** decimals)]), decimals=0).int().item()
           ~~~~~~~~~~~ <--- HERE

What am I missing?
Thank you!

@XuesongYang XuesongYang reopened this Feb 14, 2023
@XuesongYang
Collaborator

@godspirit00 could you please rebase onto the latest main and run ./reinstall.sh again?

@godspirit00
Author

@XuesongYang I pulled the latest code on main and ran ./reinstall.sh. The installation finished with "All done!".
But when I ran the training again, the error still popped up.

@XuesongYang
Collaborator

@tango4j any thoughts about this error?

@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Mar 18, 2023
@tango4j
Collaborator

tango4j commented Mar 21, 2023

@XuesongYang Sorry, I missed this issue. I will remove all unnecessary @torch.jit.script decorators. The JIT script compiler is very strict about type annotations matching the actual types. I will post here again once I have opened a PR for this.
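For context, the traceback earlier in the thread shows the script compiler rejecting the `decimals=0` keyword because none of the listed `aten::round` overloads accept it. A minimal sketch of the same conversion that avoids both the decorator and the keyword is shown below. It is written in plain Python as an illustration only; it is not the change that went into NeMo, and it computes in double precision rather than on a float32 tensor, so results can differ in edge cases.

```python
def fl2int(x: float, decimals: int = 3) -> int:
    """Convert a floating point number to an integer scaled by 10**decimals."""
    # Built-in round() rounds halves to the nearest even integer,
    # the same tie-breaking rule that torch.round documents.
    return int(round(x * (10 ** decimals)))


print(fl2int(2.0))                # 2000
print(fl2int(0.125))              # 125
print(fl2int(0.5, decimals=0))    # 0 (tie rounds to the even neighbor)
```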

@XuesongYang XuesongYang removed the stale label Mar 21, 2023
@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Apr 21, 2023
@tango4j
Collaborator

tango4j commented Apr 26, 2023

@godspirit00
Sorry for the late reply. We removed all @torch.jit.script decorators for NeMo users who don't want to compile with JIT script. This should eliminate the JIT compiler errors.

@XuesongYang This problem was caused by the JIT compiler decorators in the diarization pipeline. I removed all of them.

@godspirit00
Could you pull the latest main again and run it? Please let us know if the problem persists.

@github-actions github-actions bot removed the stale label Apr 26, 2023
@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label May 26, 2023
@github-actions
Contributor

github-actions bot commented Jun 2, 2023

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Jun 2, 2023

4 participants