Kirby

Kirby is a voice recognition mashup created for compiling commands on MacOS. It utilizes SoX for audio input, Google's Speech API for speech-to-text, and a mashup of other engines from there to process the request. Finally any audio output as a result is completed using the classic "say" command.

Prerequisites

A computer running MacOS
You'll need to install SoX with flax support, which can easily be done in Homebrew with the following command:

brew install sox --with-flac

You'll need a machine that can support cURL
[Google Speech Only] An API key for Google Speech, saved in a file called GSPEECH_KEY as the only contents. This is referenced by kirby.sh for making Google Speech Calls.

[api.ai Only] An access token, saved in a file called APIAI_ACCESS_TOKEN as the only contents. A subscription key for the agent, saved in a file called APIAI_SUBSCRIPTION_KEY as the only contents. These are both referenced by kirby.sh for making api.ai calls.

References

SoX Record command format reference

rec [outputfile.extension] \
  rate [sample rate] \
  silence \
    [above-period:0|1] [time] [threshold %] \
    [below-period:-1|1] [time] [threshold %]

For example, the following command records after silence is broken and ends 3 seconds after silence has started again:

rec recording.wav rate 8k silence 1 0.1 3% 1 3.0 3%

This roughly translates to "Record audio to the file recording.wav at a sample rate of 8kHz. Start recording 0.1sec after the audio is 3% above true silence, and continue recording until 3 seconds of audio that is 3% above true silence.

More info here: http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/

Google Speech Response reference

For some reason or another Google Speech API includes two json responses in the HTTP response. I imagine it's for when they have 100% confidence, they can put that in the first one, hence the "alternative" tag in the second ;)

Please note that each response is on a single line, and that the second response here is manually formatted by me for readability.

{"result":[]}
{"result":[
  {"alternative": [
    {"transcript":"hello this is only a test","confidence":0.98590767},
    {"transcript":"yellow this is only a test"},
    {"transcript":"El oh this is only a test"},
    {"transcript":"hello this is only a text"},
    {"transcript":"well oh this is only a test"}
  ],
  "final":true}
],
"result_index":0}

api.ai API Documentation

Can be found here: https://docs.api.ai/docs/reference

Acknowledgements

Raspberry Pi voice recognition by Oscar Liang was a huge help: http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/

This wonderful reference from DigitalCardboard on SoX and silence trimming was also tremendously helpful: http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/

This introduction to getting the api key for the Speech API was critical: http://www.chromium.org/developers/how-tos/api-keys

"Reverse Engineering" (author's name, not mine) of Google Speech V2 was an interesting read: https://aminesehili.wordpress.com/2015/02/08/on-the-use-of-googles-speech-recognition-api-version-2/

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
webtools		webtools
README.md		README.md
kirby.sh		kirby.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kirby

Prerequisites

References

SoX Record command format reference

Google Speech Response reference

api.ai API Documentation

Acknowledgements

About

Releases

Packages

Languages

mr-z-ro/kirby

Folders and files

Latest commit

History

Repository files navigation

Kirby

Prerequisites

References

SoX Record command format reference

Google Speech Response reference

api.ai API Documentation

Acknowledgements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages