
Poor performance - Russian obscene STT #123

Open
snakers4 opened this issue Mar 19, 2019 · 2 comments

@snakers4

Hi!

Many thanks for your amazing, easy-to-use STT product!
I have yet to learn how to use your text models, but the STT seems to work really well out of the box.

My language is Russian, and as you may know, it features a great many obscene words that people commonly use in some contexts.

In our use case we have to recognize these words as well as ordinary ones.
It looks like the language model on top of your acoustic model does not know them.
We could add our own language model, but in that case we would need the raw acoustic model outputs.

Is this somehow possible with the current API?
It looks like pywit is just a requests wrapper and 99% of the work is done server-side.
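For concreteness, a minimal sketch of what decoding with our own language model could look like, assuming the raw per-frame acoustic logits were exposed; the decoder (pyctcdecode), the label set, and the file names here are only illustrative and not part of the Wit API:

```python
# Hypothetical sketch: decode raw per-frame acoustic logits with a custom
# Russian language model. `labels`, "ru_lm.arpa" and "clip_logits.npy" are
# placeholders; the Wit API does not currently expose these outputs.
import numpy as np
from pyctcdecode import build_ctcdecoder  # external CTC beam-search decoder

labels = ["", "а", "б", "в"]  # acoustic model vocabulary (blank first), truncated here
decoder = build_ctcdecoder(labels, kenlm_model_path="ru_lm.arpa")

logits = np.load("clip_logits.npy")   # shape: (time_steps, len(labels))
transcript = decoder.decode(logits)   # a custom LM could now score obscene words too
print(transcript)
```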

@patapizza
Member

Hi @snakers4,

Thank you for the kind words.

Indeed, pywit is just a thin wrapper around our HTTP API. For tracking service-related questions and issues, we use https://github.com/wit-ai/wit/issues.
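For reference, roughly what the wrapper does under the hood (a minimal sketch; the token, audio file, and version string below are placeholders):

```python
# Rough sketch of the underlying HTTP call pywit makes for speech requests.
# YOUR_SERVER_ACCESS_TOKEN and sample.wav are placeholders.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"

with open("sample.wav", "rb") as audio:
    resp = requests.post(
        "https://api.wit.ai/speech",
        params={"v": "20190319"},  # API version pin
        headers={
            "Authorization": f"Bearer {WIT_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=audio,
    )
resp.raise_for_status()
print(resp.json())  # final transcript and entities; no raw acoustic scores
```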

Personalized language models are something we want to support down the road. I'll share your input with the team. In the meantime, you can use the voice inbox to correct the transcripts.

@snakers4
Author

> I'll share your input with the team

Many thanks!

Turns out there are much simpler ways to check data at scale:

  • Check by calculating WER against another source of annotation;
  • Check the number of words / number of letters against the duration of the clips; there should be a direct correlation, and if there is none, then STT quality is low;
  • Truncate clips that have fewer than 2 words or 10 symbols;
  • Truncate clips that contain special symbols, Latin symbols, etc.

A combination of these basically makes it possible to build fast heuristics that keep only the most relevant texts; a rough sketch of such a filter is below.
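This is a minimal sketch, assuming transcripts and clip durations are already at hand; the thresholds, the characters-per-second band, and the helper names are assumptions rather than anything from pywit or Wit:

```python
import re

# Illustrative thresholds; tune on your own data.
MIN_WORDS = 2
MIN_CHARS = 10
CHARS_PER_SEC = (5.0, 25.0)  # rough plausibility band for speech
SUSPICIOUS = re.compile(r"[A-Za-z]|[^\w\s.,!?\-]")  # Latin or special symbols

def keep_clip(transcript: str, duration_sec: float) -> bool:
    """Fast heuristics for keeping only plausible STT transcripts."""
    words = transcript.split()
    # Drop clips that are too short to be informative.
    if len(words) < MIN_WORDS or len(transcript) < MIN_CHARS:
        return False
    # Drop clips with Latin letters or unexpected special symbols.
    if SUSPICIOUS.search(transcript):
        return False
    # Text length should roughly track audio duration; a transcript far
    # outside this band suggests the STT output is of low quality.
    rate = len(transcript) / max(duration_sec, 1e-6)
    return CHARS_PER_SEC[0] <= rate <= CHARS_PER_SEC[1]

# For the WER check against an independent annotation source, a package
# such as `jiwer` can be used: jiwer.wer(reference_text, hypothesis_text).
```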
