Why unique lexicon is needed in Chinese ASR, but not in English ASR? #1639

Open
yzchen563 opened this issue May 31, 2024 · 0 comments

To prepare a phone-based lang directory, generate_unique_lexicon.py is used in almost every Chinese ASR recipe (e.g. aishell-*), but not in the English ASR recipes (e.g. gigaspeech, librispeech). What is the reason?

I want to use k2.ctc_loss to handle transcripts containing words with multiple pronunciations in Chinese ASR, the same way the English recipes do, where no special processing is applied to make the lexicon unique. Is that more accurate than using a unique lexicon?
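
To make the question concrete, here is a minimal sketch of what I mean by the two kinds of lexicon. It assumes that a "unique" lexicon keeps only one pronunciation per word (here, the first one listed). It is not the code of generate_unique_lexicon.py itself. The function names and the sample entries are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, List[str]]  # (word, phone sequence)


def parse_lexicon(lines: Iterable[str]) -> List[Entry]:
    """Parse 'word phone1 phone2 ...' lines, skipping blank lines."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore empty lines or words without phones
        entries.append((fields[0], fields[1:]))
    return entries


def keep_first_pronunciation(lexicon: List[Entry]) -> List[Entry]:
    """Return a lexicon with exactly one pronunciation per word.

    Assumption: the first pronunciation listed for a word is the one kept.
    """
    unique: Dict[str, List[str]] = {}
    for word, phones in lexicon:
        if word not in unique:
            unique[word] = phones
    return list(unique.items())


# Hypothetical sample: one character with two readings.
sample = [
    "行 x ing2",
    "行 h ang2",
    "好 h ao3",
]

full_lexicon = parse_lexicon(sample)
uniq_lexicon = keep_first_pronunciation(full_lexicon)

for word, phones in uniq_lexicon:
    print(word, " ".join(phones))
```

With the full lexicon, a transcript containing 行 can map to either phone sequence. With the unique lexicon, it always maps to the first one, even when the audio uses the other reading.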
