Add input device for external STT via ACTION_RECOGNIZE_SPEECH #294

Open
wants to merge 5 commits into base: master

Stypox
Owner

@Stypox Stypox commented Feb 26, 2025

I hate the ActivityForResultManager but I don't know if there is any other way to start an activity for result from an object that does not strictly belong to an activity.
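For readers unfamiliar with the pattern, here is a minimal sketch of the general idea: a registry that hands out request codes and routes each result back to the callback that asked for it. This is not the PR's `ActivityForResultManager`; the names `ActivityLauncher` and `ResultCallbackRegistry` are hypothetical, and the actual Android launch call is abstracted behind an interface so the sketch stays plain Kotlin.

```kotlin
// Hypothetical stand-in for whatever actually starts the activity on Android.
fun interface ActivityLauncher {
    fun launch(requestCode: Int)
}

// Hypothetical registry: maps request codes to the callbacks awaiting a result.
class ResultCallbackRegistry(private val launcher: ActivityLauncher) {
    private var nextRequestCode = 1
    private val pending = mutableMapOf<Int, (resultCode: Int) -> Unit>()

    // Stores the callback under a fresh request code, then triggers the launch.
    fun launchForResult(onResult: (resultCode: Int) -> Unit): Int {
        val requestCode = nextRequestCode++
        pending[requestCode] = onResult
        launcher.launch(requestCode)
        return requestCode
    }

    // Meant to be called by the host activity when a result arrives.
    // Returns false if nobody is waiting for this request code.
    fun dispatchResult(requestCode: Int, resultCode: Int): Boolean {
        val callback = pending.remove(requestCode) ?: return false
        callback(resultCode)
        return true
    }
}
```

In this sketch the object that wants the result only needs a reference to the registry, not to an activity; the activity's sole job is to forward results to `dispatchResult`.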

Fixes part of #197 (the other part would be implementing yet another InputDevice with RecognitionService). This PR is quite experimental anyway, as I hadn't planned for the possibility of starting another app's popup to obtain input, but it seems to work.

References: #169 #293 woheller69/whisperIME#53 (comment). Note: with whisperIME, the recognized speech is always just "you" regardless of what I say. I don't fully understand how the whisperIME popup works, though, and I am not sure my code is correct in the first place.

Please test this Dicio APK (app-debug.apk) along with this whisperIME APK: woheller69/whisperIME#53 (comment). Thank you @woheller69!

TODO:

  • Add an explanation in the settings, along with suggestions on apps to use

Edit: I tested this with FUTO Voice input and the code in this PR seems to be behaving correctly:

Screen_recording_20250227_005009.webm

@Inhishonor
Contributor

Inhishonor commented Feb 26, 2025

I downloaded the APK; with whisperIME, it only outputs "you" every time. I also tested it with FUTO Voice Input, and it worked well.
A little extra feedback:

  • "System Popup" seems to me a little confusing as the name for an option; maybe something like "System STT" or "External STT" would be clearer.
  • If you cancel the STT after clicking the STT button but before selecting a provider, an error message with Error Code 0 pops up. Maybe it should instead say something like what FUTO Voice Input outputs: "intent was canceled".
  • Dicio appears as an option, and if you click on it, nothing happens. Is there a way to exclude Dicio as an option? I saw something in the pull request about that.

Thanks for implementing this!

@woheller69

Tested it. If you press and hold the button in the whisper input while speaking, it works.

@woheller69

In an updated version I moved the button to the bottom, like in the input method. That makes it easier to use.

woheller69/whisperIME#53

@woheller69

When using an "external" voice input you should maybe not play the sound and display "listening". Otherwise this might interfere with the behavior of the external input. E.g. my whisper input does not use voice activity detection. In order to start it you have to press and hold the button, but your display already says "listening" before the user presses my mic button.
Instead of a sound, I use haptic feedback.

@Stypox
Owner Author

Stypox commented Feb 28, 2025

Thank you for the feedback! New testing APK: https://github.com/Stypox/testing-apks/releases/download/14/app-debug.apk

> "System Popup" seems to me a little confusing as the name for an option; maybe something like "System STT" or "External STT" would be clearer.

I wanted to distinguish it from "System service", which is the other way to access other apps' STT and would feel seamless because it runs in the background. So one would be called "popup" and the other "service". But I will add an explanation to the settings anyway to make everything clearer.

> If you cancel the STT after clicking the STT button but before selecting a provider, an error message with Error Code 0 pops up. Maybe it should instead say something like what FUTO Voice Input outputs: "intent was canceled".

Fixed, now it just switches back to the loaded-not-listening state (which is what would happen with Vosk too).
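As an illustration of that behavior, here is a minimal sketch of mapping an activity result to an input state. On Android, `Activity.RESULT_CANCELED` is 0 and `Activity.RESULT_OK` is -1, which would presumably explain the "Error Code 0" seen on cancel. The state names and the function `stateForActivityResult` are hypothetical and not taken from the PR; the constants are mirrored locally so the sketch is plain Kotlin.

```kotlin
// Values mirror android.app.Activity.RESULT_OK and RESULT_CANCELED.
const val RESULT_OK = -1
const val RESULT_CANCELED = 0

// Hypothetical state names; the PR's actual state classes may differ.
sealed class SttState {
    object NotListening : SttState()                      // loaded, not listening
    data class Result(val utterances: List<String>) : SttState()
    data class Error(val code: Int) : SttState()
}

// utterances stands for the list an external recognizer returns
// (on Android, under RecognizerIntent.EXTRA_RESULTS); it may be null.
fun stateForActivityResult(resultCode: Int, utterances: List<String>?): SttState {
    // A canceled request is not an error: go back to idle.
    if (resultCode == RESULT_CANCELED) return SttState.NotListening
    // Any other non-OK code is reported as an error (an assumption of this sketch).
    if (resultCode != RESULT_OK) return SttState.Error(resultCode)
    // OK but nothing recognized is treated like "no input" here.
    if (utterances == null || utterances.isEmpty()) return SttState.NotListening
    return SttState.Result(utterances)
}
```

Treating an empty result like a cancel is a design choice of this sketch only; the PR may handle it differently.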

> Dicio appears as an option, and if you click on it, nothing happens. Is there a way to exclude Dicio as an option? I saw something in the pull request about that.

It turns out EXTRA_EXCLUDE_COMPONENTS only works on chooser intents, so unfortunately I don't know of a way to keep Dicio from being offered as an option for itself.

// Unfortunately the user could choose Dicio itself (starting SttPopupActivity), but there
// is no way to avoid this unless we use an Intent.createChooser() with
// `EXTRA_EXCLUDE_COMPONENTS`. A chooser, however, wouldn't allow the user to press
// "Always"/"Just once", and also wouldn't make it possible to check for availability like
// with .resolveIntent() (since the intent always resolves to the chooser).

> When using an "external" voice input you should maybe not play the sound and display "listening". Otherwise this might interfere with the behavior of the external input. E.g. my whisper input does not use voice activity detection. In order to start it you have to press and hold the button, but your display already says "listening" before the user presses my mic button.

Done, now it shows "Waiting...". It still plays the "no input" sound if the request is canceled though, which I think is ok.

@woheller69

woheller69 commented Feb 28, 2025

> Done, now it shows "Waiting...". It still plays the "no input" sound if the request is canceled though, which I think is ok.

Much better

@Stypox
Owner Author

Stypox commented Feb 28, 2025

Improved settings descriptions:

image

@Inhishonor
Contributor

Looks great! Thanks again!

@paolo-caroni

I have tested it; it seems to be functional, and I found no bugs.

@woheller69

putExtra(RecognizerIntent.EXTRA_LANGUAGE, locale) 

should probably be changed to

putExtra(RecognizerIntent.EXTRA_LANGUAGE, locale.toString())
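Some background on why the string form matters: `java.util.Locale` is `Serializable`, so passing the `Locale` object itself compiles but stores a serialized object rather than a string, which a receiving app reading a string extra would likely not see. Note also that `toString()` and `toLanguageTag()` differ: the former joins language and country with an underscore, while the latter produces a BCP 47 tag with a hyphen. The Android documentation describes `EXTRA_LANGUAGE` as an IETF language tag such as "en-US", so `toLanguageTag()` may be worth considering; whether a given recognizer app also accepts the underscore form is not established in this thread. A minimal sketch (the helper name `recognizerLanguageExtra` is hypothetical, not from the PR):

```kotlin
import java.util.Locale

// Hypothetical helper: the string one might pass as RecognizerIntent.EXTRA_LANGUAGE.
// Uses the BCP 47 form ("en-US") rather than Locale.toString() ("en_US").
fun recognizerLanguageExtra(locale: Locale): String = locale.toLanguageTag()
```

For `Locale.US`, `toString()` gives `en_US` while `recognizerLanguageExtra` gives `en-US`.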

4 participants