S.A.R.A.H. is an open-source client/server framework to control the Internet of Things using voice, gesture, face, and QR code recognition. It is heavily bound to the Kinect v1 and v2 SDKs.
This project contains the C# client for SARAH, which communicates with the NodeJS server for SARAH.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2012 S.A.R.A.H. <sarah.project@encausse.net>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
This program is free software. It comes without any warranty, to
the extent permitted by applicable law. You can redistribute it
and/or modify it under the terms of the Do What The Fuck You Want
To Public License, Version 2, as published by S.A.R.A.H. See
http://www.wtfpl.net/ for more details.
Be warned: some dependencies may require fees for commercial use. Kinect for Xbox 360 can ONLY be used for development purposes.
SARAH v4.0 Client is built on an AddOn architecture sharing common properties. Each AddOn:
- communicates with the other AddOns using AddOnManager
- MUST implement IAddOn or AbstractAddOn
- MUST be declared in properties and can be prevented from starting with the `{name}.enable` property
- uses Tasks to work with streams asynchronously and tune CPU usage
The client's AddOns are stored in the AddOns folder and have no relation to SARAH plugins. The idea is to sandbox features and libraries into subprojects.
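As a rough illustration of the `{name}.enable` switch, the following Python sketch filters a set of declared AddOns. It assumes, hypothetically, that properties live in an INI file with one section per AddOn and an `enable` key, and that a declared AddOn without an `enable` key starts by default. The section names and the helper `enabled_addons` are invented for this example; the real client is written in C# and may resolve properties differently.

```python
import configparser

# Hypothetical properties: one section per AddOn, optional "enable" key.
SAMPLE = """
[microphone]
enable=true

[kinect1]
enable=false

[speaker]
device=-1
"""


def enabled_addons(ini_text, default=True):
    """Return the declared AddOn names that are not disabled."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [
        name
        for name in parser.sections()
        # Assumption: a missing "enable" key means the AddOn starts.
        if parser.getboolean(name, "enable", fallback=default)
    ]


print(enabled_addons(SAMPLE))
```

With this sample, `kinect1` is skipped while `microphone` and `speaker` would be started.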
This page describes the client's Core/AddOns specification. SARAH developers should see:
UNDER CONSTRUCTION
This AddOn uses EmguCV to detect and draw blobs in the camera frame according to color, shape, features, ...
SARAH provides NBody: an abstract representation of a Body / Skeleton implemented by Kinect 1, Kinect 2, and other computer vision tools.
This AddOn draws body parts, joints and relevant areas.
- HTTP & XML `gesture` parameter to start/stop recognition
This AddOn implements logging features to output relevant data about SARAH:
- All logs in a log file
- Audio (wav) of matching commands
- Audio (wav) of SARAH voice speech
; Path to log file
log-file=AddOns/debug/${shortdate}.log
; Path to recognized command
dump=AddOns/debug/dump/${date}.wav
; Path to audio response
voice=AddOns/debug/voice/${date}.wav
This AddOn uses EmguCV to perform face detection and face recognition. In a nutshell:
- Provide a GUI to train new faces
- Draw detected faces' rectangles
- Update NBody data with face rectangle
- Update NBody data with face name
- Forward recognition to ProfileManagers
- Store data in the AddOn's directory
- HTTP & XML `face` parameter to start/stop recognition
Choose face detection and recognition algorithms. See also the CodeProject article.
; Cascade for Face detection
HaarCascade=AddOns/face/haarcascades/haarcascade_frontalface_default.xml
; Algorithm confidence
Eigen_Threshold=0
Fisher_Threshold=0
LBP_Threshold=30
This AddOn uses the Google Speech API to perform Speech2Text on speech recognitions.
- Process grammar with dictation attribute
- Process rejected dynamic grammar
Since 2014, Google has limited its API v2 to 50 requests per day. An API key MUST be provided to the AddOn (see the G+ tutorial and explanations).
This AddOn handles HTTP communication with the SARAH Server. It gives other AddOns the ability to send GET/POST/FILE requests with a custom token.
- Send request for SpeechRecognition
- Send request for MotionDetection
- All requests are filled with contextual data (ClientId, Profile, ...)
- BODY response is forwarded with the token to other AddOns
[http.remote]
; Remote address of NodeJS server
server=http://127.0.0.1:8080
[http.local]
; Local HTTP server port
port=8888
temp=AddOns/http/temp/
- Start HTTP Server on 8888
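To make the two sections above more concrete, here is a small Python sketch that reads them and builds a URL for the client's local HTTP server. The section and key names come from the snippet above; everything else is an assumption for illustration only: the helpers `load_http_settings` and `local_url` are hypothetical, and so is the idea that parameters such as `tts` are passed as a query string on the root path.

```python
import configparser
from urllib.parse import urlencode

# Same keys as the properties shown above.
SAMPLE = """
[http.remote]
; Remote address of NodeJS server
server=http://127.0.0.1:8080
[http.local]
; Local HTTP server port
port=8888
temp=AddOns/http/temp/
"""


def load_http_settings(ini_text):
    """Parse the [http.remote] and [http.local] sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        "remote": parser.get("http.remote", "server"),
        "local_port": parser.getint("http.local", "port"),
        "temp": parser.get("http.local", "temp"),
    }


def local_url(settings, host="127.0.0.1", **params):
    """Build a URL for the client's local HTTP server.

    Assumption: parameters travel as a query string on the root path.
    """
    base = f"http://{host}:{settings['local_port']}/"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


settings = load_http_settings(SAMPLE)
print(settings["remote"])
print(local_url(settings, tts="Hello world"))
```

The sketch only builds the URL; sending it (for example with `urllib.request.urlopen`) is left out so the example does not depend on a running client.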
UNDER CONSTRUCTION
This AddOn uses EmguCV or nVLC to process frames from an IP camera
- but EmguCV hangs forever...
- but nVLC crashes...
This AddOn uses Kinect SDK v1.8 to start all connected Kinect v1 devices (Xbox 360 or Windows).
- Start a SpeechEngine with Kinect audio stream with a given confidence
- Start a color stream of 1280 x 960 at 12 fps
- Start a skeleton stream for other AddOns
- Detect motion using depth stream
- Provide a sub-plugin framework
Motion detection stops as many streams as possible when there is no motion. Properties allow control of the streams: audio stream only, or a custom fps.
- Handle Gesture
- Handle Face
This AddOn uses Kinect SDK v2.0 to start one connected Kinect v2 for Windows.
- Start a SpeechEngine with Kinect audio stream with a given confidence
- Start a color stream of 1920 x 1080 at 15/30 fps
- Start a skeleton stream for other AddOns
- Detect motion using depth stream
- Provide a sub-plugin framework
Motion detection stops as many streams as possible when there is no motion. Properties allow running the audio stream only.
- Handle Gesture
- Handle Face
This AddOn uses a microphone to start a SpeechEngine with a given confidence.
; Confidence of recognition (from 0 to 1)
confidence=0.7
; Define the audio device index (0 is the default device, see logs)
device=0
- Handle multiple device numbers
This AddOn handles all OS-specific actions.
- Watch a specific folder to perform speech recognition
- HTTP & XML `run` and `runp` parameters to run a process
- HTTP & XML `activate` parameter to bring a process to the foreground
- HTTP & XML `keyText` parameter to simulate key press
- HTTP & XML `keyUp` parameter to simulate key up
- HTTP & XML `keyDown` parameter to simulate key down
- HTTP & XML `keyPress` parameter to simulate key press
- HTTP & XML `recognize` parameter to recognize a path
This AddOn uses the speech audio stream to compute voice pitch and update ProfileManagers.
This AddOn manages profiles for people playing with SARAH.
- A profile is cross device
- Face is a major profile key
- Handle engagement with SARAH
- Handle people engagement
- Add profile timeout on properties
- Store profile to disk
- Forward profile to HTTP Request
- Display profile in GUI
This AddOn computes the color frame of each device to draw the most prominent color.
- Do nothing if the device window is not displayed OR forward color to ProfileManager (is it relevant?)
UNDER CONSTRUCTION
This AddOn starts an RTP client on a given thread to listen to an inbound stream.
The stream can be sent from a Raspberry Pi using ffmpeg or avconv:
ffmpeg -ac 1 -f alsa -i hw:1,0 -ar 16000 -acodec pcm_s16le -f rtp rtp://192.168.0.8:7887
avconv -f alsa -ac 1 -i hw:0,0 -acodec mp2 -b 64k -f rtp rtp://{IP of your laptop}:1234
Or using a Kinect on Windows:
ffmpeg -f dshow -i audio="Réseau de microphones (Kinect U" -ar 16000 -acodec pcm_s16le -f rtp rtp://127.0.0.1:7887
; Confidence of recognition (from 0 to 1)
confidence=0.6
; Define the RTP Port
port=7887
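The sender command has to target the same port as the `port` property above. The Python sketch below assembles the first ffmpeg command shown earlier from a host and a port, using only the flags that appear in that command. The helper name `ffmpeg_rtp_command` and its default values are assumptions made for illustration.

```python
import shlex


def ffmpeg_rtp_command(host, port=7887, alsa_device="hw:1,0", sample_rate=16000):
    """Build the ffmpeg argument list that streams ALSA audio over RTP.

    The port must match the "port" property of the RTP AddOn.
    """
    return [
        "ffmpeg",
        "-ac", "1",                # mono capture
        "-f", "alsa",              # ALSA input (Linux / Raspberry Pi)
        "-i", alsa_device,
        "-ar", str(sample_rate),   # 16 kHz, as in the command above
        "-acodec", "pcm_s16le",
        "-f", "rtp",
        f"rtp://{host}:{port}",
    ]


cmd = ffmpeg_rtp_command("192.168.0.8")
print(shlex.join(cmd))
```

Returning a list rather than a string lets the command be passed directly to `subprocess.run` without shell quoting issues.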
This AddOn manages speakers to play audio streams:
- List available speakers
- Handle voice stream
- Play stream asynchronously or not
- HTTP `speaking` parameter writes "speaking" if the client is currently speaking
- HTTP & XML `notts` parameter stops speaking
- HTTP & XML `stop` parameter stops playing audio
- HTTP & XML `play` parameter plays the given audio
; The device index to play on (use -1 for default device)
device=-1
; Volume level of device
volume=50
; Timeout in seconds to stop the player
timeout=480
- Play other streams (music, song, ...)
- Allow custom stream timeout
This AddOn builds a Speech Engine for each provided audio stream (Kinect, Microphone, ...).
- Grammar Manager maintains a cache of the XML of plugins
- Grammar can be in multiple languages
- SARAH name is hot replaced according to properties
- Context Manager runs/stops grammars according to context
- Context Manager handles dynamic grammar
- The `bot.name` must match with an improved confidence
- `bot.name` is optional when the user is engaged
- A good match triggers engagement until a given timeout
- HTTP & XML `listen` parameter to start/stop listening
- HTTP & XML `context` parameter to set the context
- HTTP `grammar` parameter to set the XML of a grammar
- HTTP `sentences/tags` parameter for AskMe
- HTTP & XML `asknext` parameter to call tts on a rule
- Context should apply to a given device instead of all. But it's really complicated.
- Improve behavior for optional "SARAH" if confidence is strong and for a given timeout.
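The engagement rules listed for this AddOn can be read as a small state machine. The Python sketch below is one possible interpretation, not the client's actual logic: it assumes that a sentence spoken while nobody is engaged must contain the bot name and reach a stricter threshold, that an engaged user may omit the name and only needs the base threshold, and that every accepted sentence extends the engagement window. The class `EngagementGate` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EngagementGate:
    bot_name: str = "SARAH"
    confidence: float = 0.7        # base threshold (hypothetical value)
    name_confidence: float = 0.8   # stricter threshold when not engaged (assumption)
    timeout: float = 30.0          # engagement duration in seconds (assumption)
    engaged_until: float = float("-inf")

    def accept(self, text: str, score: float, now: float) -> bool:
        """Return True if the recognized sentence should be handled."""
        engaged = now < self.engaged_until
        has_name = self.bot_name.lower() in text.lower().split()
        if engaged:
            # The bot name is optional while the user is engaged.
            ok = score >= self.confidence
        else:
            # Otherwise the name is required, with a higher confidence.
            ok = has_name and score >= self.name_confidence
        if ok:
            # A good match (re)starts the engagement window.
            self.engaged_until = now + self.timeout
        return ok


gate = EngagementGate()
print(gate.accept("turn on the light", 0.9, now=0.0))         # False: no name, not engaged
print(gate.accept("SARAH turn on the light", 0.85, now=1.0))  # True: name + strong match
print(gate.accept("turn off the light", 0.75, now=10.0))      # True: still engaged
print(gate.accept("turn off the light", 0.75, now=100.0))     # False: engagement expired
```

Passing the current time in as `now` keeps the sketch deterministic; a real implementation would more likely read a monotonic clock.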
This AddOn performs text-to-speech according to given actions:
- List available voices
- Text2Speech HTTP Body with token "speech"
- Text2Speech HTTP & XML `tts` parameter
- Select custom voice at a given time
UNDER CONSTRUCTION
This AddOn provides a window (and a menu item) to display other AddOns' data.
- Windows are sized according to the color frame (ratio)
- Windows have a sidebar where AddOns put new controls
- A dedicated footer can be used by ProfileManager