NAO Speech pipe
1. How to run
- Make sure Naoqi (the official package, which can be installed from here) is set up and installed on your PC.
- Configure the .bash_profile file present in your home directory.
- Make sure your NAO robot is connected to the same network as your PC.
- Then run the speech_convert.sh script in the speech_pipe directory, passing the NAO’s IP address, which you can obtain by pressing the button on its chest.
- Your NAO robot is now ready to listen to you whenever touched on its head.
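
Before launching the script, it can help to confirm that the IP address is well formed and that the robot answers on the network. The sketch below is not part of the repository: the function names are hypothetical, and it assumes the robot listens on NAOqi's default port 9559.

```python
import ipaddress
import socket

NAOQI_PORT = 9559  # assumed default NAOqi port; adjust if your setup differs


def is_valid_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def nao_is_reachable(ip, port=NAOQI_PORT, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    if not is_valid_ip(ip):
        return False  # skip the network call for malformed addresses
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `nao_is_reachable` with the address the robot reports gives a quick yes/no answer before the rest of the pipeline starts.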
2. General Details
- Naoqi: This Python library, provided by Aldebaran, offers functions to control the robot’s sensors and actuators.
- Python version mismatch: Naoqi is configured for Python 2.7, but all the latest speech models are compatible with Python 3. This was a major hurdle. It is solved by doing all the computation in Python 3 files and defining every function that interacts directly with the robot in separate Python 2.7 files.
  - Python 2.7 files: recording.py, touchreact_record.py and speak_nao.py
  - Python 3 files: Speech_pipe.py
- Latency:
The current latency is ~4 seconds. It depends on internet speed, the speech model, and the user’s local processing capability. Of these three, internet speed is the major factor: it can increase or decrease the latency of the whole system by a factor of 2. Several speech models were tested, and Google’s Gemini and OpenAI’s ChatGPT 4 showed the lowest latency.
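
The Python 2.7 / Python 3 split described above means one process has to hand work to a script running under another interpreter. A minimal sketch of that hand-off, with timing added so the latency of each step can be observed, might look like the following. The helper name `run_in_interpreter` is hypothetical, and the repository itself may connect the two sides differently (for example through its shell scripts).

```python
import subprocess
import time


def run_in_interpreter(interpreter, script, args=()):
    """Run script under the given interpreter; return (stdout, seconds)."""
    command = [interpreter, script] + [str(a) for a in args]
    start = time.perf_counter()
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    elapsed = time.perf_counter() - start
    return result.stdout, elapsed
```

A Python 3 caller could then use something like `run_in_interpreter("python2.7", "speak_nao.py", [nao_ip])`. Whether speak_nao.py takes the IP address as a command-line argument is an assumption here, not something the description confirms.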
3. Installing Naoqi (for Unix-based systems):
- The Naoqi package can be installed from here.
- After installing the Naoqi package, rename the folder to “pynaoqi” and add the following commands to the .bash_profile file in your home directory:

    export PYTHONPATH=${PYTHONPATH}:path/to/pynaoqi/lib/python2.7/site-packages
    export DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}:path/to/pynaoqi/lib
    export QI_SDK_PREFIX=path/to/pynaoqi

- Now restart the terminal and run “source .bash_profile” from your home directory.
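
To confirm that the three variables were picked up by the new shell, a small check along these lines can be run. It is only a sketch: the function name is hypothetical, and it assumes the folder was renamed to “pynaoqi” as described above, so each variable should mention that name.

```python
import os

REQUIRED_VARS = ("PYTHONPATH", "DYLD_LIBRARY_PATH", "QI_SDK_PREFIX")


def missing_naoqi_settings(environ=None, folder_name="pynaoqi"):
    """Return the required variables that are unset or omit folder_name."""
    if environ is None:
        environ = os.environ
    return [
        name
        for name in REQUIRED_VARS
        if folder_name not in environ.get(name, "")
    ]
```

An empty list means all three variables point at the pynaoqi folder. Any names returned are the ones to recheck in .bash_profile.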
4. Description of code files inside the speech_pipe directory
- speech_convert.sh: This script launches touchreact_record.py and monitor_nao.sh, and watches the uploads folder for changes. The uploads folder stores the user’s voice input, which is later passed to ChatGPT.
- monitor_nao.sh: This script is responsible for separately calling the Speech_pipe.py and speak_nao.py files.
- touchreact_record.py: This file checks whether the robot is touched on its head. As soon as the user touches the robot’s head, this file calls the Recording_thread function defined in the Recording.py file.
- Recording.py: This file defines the Recording_thread function, which stores the user’s recording in the robot’s local home directory and then transfers the .wav audio file to the uploads folder over an ssh connection.
- Speech_pipe.py:
  - Speech_2_txt function: This converts the audio file into a text file using the audio transcription provided by OpenAI, and moves the audio file from the uploads folder to the old_file folder.
  - model function: This function takes the transcribed text file as input and stores ChatGPT’s response in the ans.txt file.
- speak_nao.py: This file calls the ALTextToSpeech functionality provided by Naoqi and makes the NAO robot speak the response stored in ans.txt.
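
The flow described above (watch the uploads folder, process each new recording, then move it to the old_file folder) can be sketched in Python as a simple polling loop. This is an illustration only: the repository does the watching in shell scripts, the function names here are hypothetical, and `handle` stands in for whatever transcribes the audio and writes ans.txt.

```python
import os
import shutil
import time


def process_new_recordings(uploads_dir, archive_dir, handle):
    """Pass each .wav file in uploads_dir to handle(), then archive it.

    Returns the names of the files that were processed, in sorted order.
    """
    os.makedirs(archive_dir, exist_ok=True)
    processed = []
    for name in sorted(os.listdir(uploads_dir)):
        if not name.lower().endswith(".wav"):
            continue  # ignore anything that is not a recording
        source = os.path.join(uploads_dir, name)
        handle(source)  # placeholder for transcription and model query
        shutil.move(source, os.path.join(archive_dir, name))
        processed.append(name)
    return processed


def watch(uploads_dir, archive_dir, handle, interval=0.5, max_polls=None):
    """Poll uploads_dir every interval seconds; stop after max_polls if set."""
    polls = 0
    while max_polls is None or polls < max_polls:
        process_new_recordings(uploads_dir, archive_dir, handle)
        polls += 1
        time.sleep(interval)
```

Moving each file out of the uploads folder after it is handled keeps the loop from processing the same recording twice, which matches the role the old_file folder plays in Speech_2_txt.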