Replace ad-hoc subprocess with python-sounddevice stream callbacks in TTS #33

PeterBowman · 2023-07-24T21:38:39Z

In #32, I reworked the microphone data acquisition pipeline for our ASR app by using a simple raw input stream callback managed by PortAudio (through the Python sounddevice package, see sounddevice.RawInputStream):

speech/programs/speechRecognition/speechRecognition.py

Lines 263 to 283 in 349a097

    
    with sd.RawInputStream(blocksize=int(2880 / 2), # FIXME: hardcoded for pocketpshinx, vosk used to have 8000 here
                           device=args.device,
                           dtype='int16',
                           channels=1,
                           callback=lambda indata, frames, time, status: q.put(bytes(indata))) as stream:
        responder = responder_factory.create(stream)
        responder.yarp().attachAsServer(configPort)
        while True:
            frame = q.get()
            isPartial, transcription = responder.transcribe(frame)
            if transcription:
                if not isPartial:
                    print('result: %s' % transcription)
                    b = asrPort.prepare()
                    b.clear()
                    b.addString(transcription)
                    asrPort.write()
                else:
                    print('partial: %s' % transcription)

I presume a similar data flow could be implemented in the TTS app with sounddevice.RawOutputStream, thus replacing the ad-hoc subprocess and temporary file hacks currently used:

speech/programs/speechSynthesis/speechSynthesis.py

Line 19 in 349a097

PLAY_PROGRAMS = ['paplay', 'play -q', 'aplay -q']

speech/programs/speechSynthesis/speechSynthesis.py

Lines 102 to 122 in 349a097

    
    with tempfile.NamedTemporaryFile(mode='wb+', suffix='.wav') as wav_file:
        wav_file.write(wav_bytes)
        wav_file.seek(0)
        for play_program in reversed(PLAY_PROGRAMS):
            play_cmd = shlex.split(play_program)
            if not shutil.which(play_cmd[0]):
                continue
            play_cmd.append(wav_file.name)
            self.is_playing = True
            with subprocess.Popen(play_cmd) as self.p:
                try:
                    self.p.wait()
                except: # e.g. on keyboard interrupt
                    self.p.kill()
            self.is_playing = False
            break

This would also pave the way for the migration from Mimic 3 to Piper, see #30 (comment).
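
For illustration, a minimal sketch of what the proposed sounddevice.RawOutputStream data flow might look like. It is not the code that ended up in the repository. The names `make_output_callback` and `play_pcm` are hypothetical, and the sketch assumes the synthesizer yields raw 16-bit mono PCM (no WAV header) at a known sample rate. The callback drains a queue of byte chunks, pads with silence on underrun, and raises the stop exception once an end marker (`None`) has been consumed.

```python
import queue
import threading


def make_output_callback(q, stop_exception):
    """Build a callback for sounddevice.RawOutputStream (hypothetical helper).

    q holds bytes chunks of raw PCM followed by None as an end marker.
    stop_exception is raised after the last chunk has been written
    (sd.CallbackStop in real use, so pending buffers are still played).
    """
    pending = bytearray()
    state = {'eof': False}

    def callback(outdata, frames, time, status):
        n = len(outdata)  # size of the output buffer in bytes
        while len(pending) < n and not state['eof']:
            try:
                chunk = q.get_nowait()
            except queue.Empty:
                break  # underrun: the remainder is padded with silence
            if chunk is None:
                state['eof'] = True
            else:
                pending.extend(chunk)
        data = bytes(pending[:n])
        del pending[:n]
        outdata[:len(data)] = data
        outdata[len(data):] = b'\x00' * (n - len(data))
        if state['eof'] and not pending:
            raise stop_exception

    return callback


def play_pcm(pcm_bytes, samplerate, device=None, chunk_size=4096):
    """Play raw int16 mono PCM and block until playback has finished."""
    import sounddevice as sd  # imported lazily: requires PortAudio

    q = queue.Queue()
    for i in range(0, len(pcm_bytes), chunk_size):
        q.put(pcm_bytes[i:i + chunk_size])
    q.put(None)  # end marker

    finished = threading.Event()
    with sd.RawOutputStream(samplerate=samplerate,
                            device=device,
                            dtype='int16',
                            channels=1,
                            callback=make_output_callback(q, sd.CallbackStop),
                            finished_callback=finished.set):
        finished.wait()
```

With this shape, neither a temporary file nor an external player is needed, and playback could be interrupted by closing the stream instead of killing a child process.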

PeterBowman · 2023-12-12T20:59:31Z

Done at c74f814 (I seized the opportunity to migrate to Piper).

PeterBowman self-assigned this Dec 12, 2023

PeterBowman closed this as completed Dec 12, 2023

This was referenced Dec 12, 2023

Implement Mimic 3 engine (TTS) #30

Closed

Implement a voice engine selector roboticslab-uc3m/teo-self-presentation#13

Closed
