Plugin vocabulary / Multi-Language Support #134
Comments
Not terribly worried about the performance penalty; at this small scale, I don't think it will be a huge issue (but I could be wrong). For starters, an even less ambitious goal would be to provide an easier way to configure PocketSphinx and g2p to work with other languages (this will need to be done regardless) and then let users just write their plugins in the other language. A few notes:
I think this is probably a good idea regardless of the multi-lingual business, though. |
I would like to help with this issue because I need to translate the software into my language, but I want to code something every translator can contribute to. After reading #280 I understand that your vision for the 2.0 milestone goes well beyond just enabling multi-language support. I hope I can help speed up the evolution of this software while reaching my own goal. |
Initial multilanguage support is in PR #383 (work-in-progress), although you can already test it by adding this to your configuration:
    language: 'de-DE'  # default is 'en-US'
    stt_engine: google  # the only STT engine supporting German at the moment
    tts_engine: google-tts  # ivona-tts will work too
The only plugin that currently has German translations is the
That PR is still word-based; I'll possibly add the phrase-based parsing in a different PR. |
I assume now the project only supports English? |
German works too when using the |
Hi, I just tested with the
But it does not work. I get the following error:
And with
The Hope you can help |
Hi, I realized I forgot something. |
To be able to run the compile_translations.sh script, you need to install gettext. You'll probably also get a 403 error from Google Translate; to fix that, install the latest version of gTTS. After that everything should work fine :-) |
How about multi-language support? Language could be made configurable in profile.yml or by using the locale module. But how to translate the plugin vocabulary? I suppose that something like gettext can be applied to module.WORDS, but unfortunately, the grammar is hardcoded in modules, too.
A possible solution
Step 1: Using phrases instead of words
We could use a list of possible phrases instead of a list of words in each module. With this approach, whole phrases will be translated and thus the grammar will still be correct:
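A minimal sketch of what such a phrase list could look like. The names PHRASES, GERMAN and translate_phrases are hypothetical, and the plain dictionary only stands in for a real translation catalogue such as one managed with gettext:

```python
# Hypothetical plugin vocabulary: whole phrases instead of single words.
PHRASES = [
    "WHAT TIME IS IT",
    "WHAT IS THE TIME",
]

# Hypothetical German catalogue (assumption: a real setup would use gettext).
GERMAN = {
    "WHAT TIME IS IT": "WIE VIEL UHR IST ES",
    "WHAT IS THE TIME": "WIE SPAET IST ES",
}


def translate_phrases(phrases, catalogue):
    """Translate each phrase as a whole; fall back to the original if missing."""
    return [catalogue.get(phrase, phrase) for phrase in phrases]


print(translate_phrases(PHRASES, GERMAN))
```

Because each phrase is translated as one unit, the translator controls word order and grammar in the target language.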
Step 2: Use variables in phrases
But what if I want to do something like:
'CHANGE MY BEDROOM LIGHTS COLOR TO BLUE'
The current (word-based) approach
With the current system, I would do something like this:
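As a rough illustration only (not the author's code), word-based handling of that sentence might look like the sketch below. WORDS mirrors the module.WORDS list mentioned earlier; ROOMS, COLORS and parse_words are hypothetical names introduced for this example:

```python
import re

# Single-word vocabulary, as in the word-based approach.
WORDS = ["CHANGE", "MY", "BEDROOM", "KITCHEN", "LIGHTS", "COLOR", "TO", "BLUE", "RED"]

# Hypothetical lookup sets; every room and colour has to be listed by hand.
ROOMS = {"BEDROOM", "KITCHEN"}
COLORS = {"BLUE", "RED"}


def parse_words(text):
    """Pick room and colour out of the input by scanning individual words."""
    tokens = re.findall(r"[A-Z]+", text.upper())
    room = next((t for t in tokens if t in ROOMS), None)
    color = next((t for t in tokens if t in COLORS), None)
    if "LIGHTS" in tokens and room and color:
        return room, color
    return None


print(parse_words("CHANGE MY BEDROOM LIGHTS COLOR TO BLUE"))  # ('BEDROOM', 'BLUE')
```

The word order is ignored entirely, so the parsing logic carries the grammar implicitly and would have to be rewritten for each language.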
But unfortunately, this is not translatable and a pain to parse.
The phrase-based approach
But how to do that with phrases? Probably with str.format() placeholders:
Sample output
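A small sketch of the placeholder idea and the kind of output it produces. BASE_PHRASE, ROOMS, COLORS and expand are hypothetical names chosen for illustration:

```python
# Hypothetical base phrase using str.format() placeholders.
BASE_PHRASE = "CHANGE MY {room} LIGHTS COLOR TO {color}"

# Hypothetical value lists used to fill the placeholders.
ROOMS = ["BEDROOM", "KITCHEN"]
COLORS = ["BLUE", "RED"]


def expand(phrase, rooms, colors):
    """Generate every concrete phrase the template can produce."""
    return [phrase.format(room=r, color=c) for r in rooms for c in colors]


for line in expand(BASE_PHRASE, ROOMS, COLORS):
    print(line)
```

A translator would only need to translate the template itself and can move the placeholders wherever the target language requires them.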
Step 3: How to parse?
First we need to transform the base phrases into something that can be matched against another string. Unfortunately, format strings are not matchable out of the box (at least I think so), but we can achieve that by using regexes.
Converting base phrases to regexes
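One possible way to do the conversion, sketched under the assumption that placeholders are always named (such as {room}). The helper name phrase_to_regex is hypothetical; string.Formatter.parse and re.escape are standard-library calls:

```python
import re
from string import Formatter


def phrase_to_regex(phrase):
    """Turn a str.format() template into a compiled regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(phrase):
        # Literal text is escaped so it only matches itself.
        parts.append(re.escape(literal))
        if field:  # a named placeholder such as {room}
            parts.append(r"(?P<%s>.+)" % field)
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


regex = phrase_to_regex("CHANGE MY {room} LIGHTS COLOR TO {color}")
print(regex.match("CHANGE MY BEDROOM LIGHTS COLOR TO BLUE").groupdict())
```

Each placeholder becomes a named capture group, so the extracted values keep the same names the template uses.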
Matching input phrases against regex phrases
Now we can match our phrase against the regex phrases and even extract the interesting values from them:
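A sketch of the matching step, assuming base phrases are converted with a helper like the hypothetical phrase_to_regex below. BASE_PHRASES follows the name used in this proposal; REGEX_PHRASES and match_phrase are assumptions made for the example:

```python
import re
from string import Formatter

BASE_PHRASES = [
    "WHAT TIME IS IT",
    "CHANGE MY {room} LIGHTS COLOR TO {color}",
]


def phrase_to_regex(phrase):
    """Hypothetical helper: format template -> regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(phrase):
        parts.append(re.escape(literal))
        if field:
            parts.append(r"(?P<%s>.+)" % field)
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


REGEX_PHRASES = [phrase_to_regex(p) for p in BASE_PHRASES]


def match_phrase(text):
    """Return (regex, extracted values) for the first matching phrase, else None."""
    for regex in REGEX_PHRASES:
        match = regex.match(text.strip())
        if match:
            return regex, match.groupdict()
    return None


print(match_phrase("CHANGE MY BEDROOM LIGHTS COLOR TO BLUE")[1])
```

Phrases without placeholders simply yield an empty dictionary of values.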
Step 4: Getting back from regex to base phrase
This is fairly easy: just match the regex on the base phrases.
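This works because a placeholder group such as (?P<room>.+) also matches the literal text "{room}" in the template. A minimal sketch, with phrase_to_regex and base_phrase_for as hypothetical helper names:

```python
import re
from string import Formatter

BASE_PHRASES = [
    "WHAT TIME IS IT",
    "CHANGE MY {room} LIGHTS COLOR TO {color}",
]


def phrase_to_regex(phrase):
    """Hypothetical helper: format template -> regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(phrase):
        parts.append(re.escape(literal))
        if field:
            parts.append(r"(?P<%s>.+)" % field)
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def base_phrase_for(regex, base_phrases):
    """Find the base phrase a regex came from by matching it on the templates."""
    for phrase in base_phrases:
        if regex.match(phrase):
            return phrase
    return None


regex = phrase_to_regex(BASE_PHRASES[1])
print(base_phrase_for(regex, BASE_PHRASES))
```

Keeping a dictionary from compiled regex to base phrase would avoid the second round of matching; the lookup above just follows the step as described.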
Step 5: Connecting actions to matched phrases
We just replace the list BASE_PHRASES with a list ACTIONS that contains tuples (base_phrase, action), where action is actually a callable object (function, etc.). Of course, the above methods need to be changed accordingly.
Step 6: A working example
I provided a proof-of-concept implementation here.
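Separately from that proof of concept, the (base_phrase, action) idea from Step 5 might be sketched as follows. The handlers tell_time and set_lights, the COMPILED list and the dispatch function are all hypothetical; only ACTIONS follows the naming used in this proposal:

```python
import re
from string import Formatter


def phrase_to_regex(phrase):
    """Hypothetical helper: format template -> regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(phrase):
        parts.append(re.escape(literal))
        if field:
            parts.append(r"(?P<%s>.+)" % field)
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def tell_time():
    return "telling the time"


def set_lights(room, color):
    return "setting %s lights to %s" % (room, color)


# Tuples of (base_phrase, action), where action is any callable.
ACTIONS = [
    ("WHAT TIME IS IT", tell_time),
    ("CHANGE MY {room} LIGHTS COLOR TO {color}", set_lights),
]

# Compile once up front so matching does not rebuild the regexes each time.
COMPILED = [(phrase_to_regex(phrase), action) for phrase, action in ACTIONS]


def dispatch(text):
    """Call the action of the first matching phrase with the extracted values."""
    for regex, action in COMPILED:
        match = regex.match(text.strip())
        if match:
            return action(**match.groupdict())
    return None


print(dispatch("CHANGE MY BEDROOM LIGHTS COLOR TO BLUE"))
```

Passing the named groups as keyword arguments means a handler's parameter names must match the placeholder names in its phrase.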
Conclusion
In my opinion, this would not only give plugin developers an easy way to parse input, but also offer the chance to translate phrases and implement support for different languages. It also makes it possible to parse the base phrases so that we can generate a grammar-based language model (I'm not an expert, but I think so).
The big con is the performance penalty because of the regex stuff, but I think it's worth it.
What do you think?