window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome #23084

Open
guest271314 opened this issue Apr 18, 2020 · 5 comments

Comments

@guest271314
Contributor

guest271314 commented Apr 18, 2020

There is no way (without non-standard workarounds) to programmatically test if window.speechSynthesis.speak() outputs audio at all at Chromium or Chrome browsers.

It is possible to programmatically test if window.speechSynthesis.speak() outputs audio at Firefox and Nightly.

AFAICT there is currently no manual WPT test for audio output for window.speechSynthesis.speak().

The purpose of this issue is, at minimum, to add a manual test for window.speechSynthesis.speak(), and further to track efforts to find a means to programmatically test the audio output of window.speechSynthesis.speak(), or to conclusively determine that such a test is not possible in Chromium and Chrome.

guest271314 changed the title from "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome [type: untestable][Missing coverage]" to "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome [type:untestable][Missing coverage]" Apr 18, 2020
@guest271314
Contributor Author

cc @foolip @sjdallst

guest271314 changed the title from "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome [type:untestable][Missing coverage]" to "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome [type:untestable][type:missing-coverage]" Apr 18, 2020
@stephenmcgruer
Contributor

It is possible to programmatically test if window.speechSynthesis.speak() outputs audio at Firefox and Nightly.

Can you explain, or give a short code snippet showing, how one can programmatically test this in Firefox?

guest271314 changed the title from "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome [type:untestable][type:missing-coverage]" to "window.speechSynthesis.speak() audio output is untestable at Chromium or Chrome" Apr 19, 2020
@guest271314
Contributor Author

@stephenmcgruer

In Firefox or Nightly it is possible to select a Monitor of <device> source at the navigator.mediaDevices.getUserMedia() permission prompt:

https://wiki.archlinux.org/index.php/PulseAudio/Examples

ALSA monitor source

To be able to record from a monitor source (a.k.a. "What-U-Hear", "Stereo Mix")

<!DOCTYPE html>

<html>

  <head>
  </head>

  <body>
    <script>
      let track,
        stream,
        recorder,
        u,
        voices,
        voice,
        // create 1 speechSynthesis object reference
        // instead of using multiple literal window.speechSynthesis.speak()
        // which could result in unexpected behaviour
        synth = window.speechSynthesis;
      navigator.mediaDevices
        .getUserMedia({
          audio: true
        })
        .then(stream => {
          // make sure to clear the queue
          synth.cancel();
          [track] = stream.getAudioTracks();
          track.contentHint = 'speech';
          recorder = new MediaRecorder(stream);
          recorder.ondataavailable = e => {
            // use FileReader here to avoid Firefox Blob URL bug
            // https://bugzilla.mozilla.org/show_bug.cgi?id=1628906#c7
            const fr = new FileReader();
            fr.onload = _ => {
              // analyze audio output here
              var audio = new Audio(fr.result);
              document.body.appendChild(audio);
              audio.controls = true;
              track.stop();
            };
            fr.readAsDataURL(e.data);
          };
          // start here to avoid clipping initial audio output
          recorder.start();
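          voices = synth.getVoices();
          // getVoices() always returns an array, which stays empty until
          // the voices have loaded, so test its length rather than its truthiness
          if (voices.length === 0) {
            synth.onvoiceschanged = e => {
              voices = synth.getVoices();
            };
          }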
          voice = voices.find(({
            lang,
            name
          }) => /^English_(America)/.test(name));
          u = new SpeechSynthesisUtterance();
          u.text = 'test '.repeat(50);
          u.voice = voice;
          console.log(u);
          // can clip initial audio output,
          // which could be ongoing for 
          // fraction of 1 second when start() is called
          // u.onstart = _ => recorder.start();
          u.onend = _ => recorder.stop();
          synth.speak(u);
        })
        .catch(console.trace);

    </script>
  </body>

</html>
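The "analyze audio output here" step above is left open. One possible way to fill it in, as a minimal sketch rather than the approach used in this thread: decode the recording and check whether its RMS level rises above a silence threshold. The names rms, hasAudibleSignal and blobHasAudio are hypothetical, the 0.01 threshold is an assumption that would need tuning against the noise floor of the monitor source, and it is assumed that decodeAudioData() can decode whatever container MediaRecorder produced in the same browser.

```javascript
// Hypothetical helper (not from the thread): root-mean-square level of
// PCM samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (const s of samples) sumOfSquares += s * s;
  return Math.sqrt(sumOfSquares / samples.length);
}

// Assumed threshold: anything quieter than this is treated as silence.
function hasAudibleSignal(samples, threshold = 0.01) {
  return rms(samples) > threshold;
}

// Browser-only wiring: decode the recorded Blob (e.data in the snippet
// above) and inspect the first channel.
async function blobHasAudio(blob, threshold = 0.01) {
  const ctx = new AudioContext();
  try {
    const buffer = await ctx.decodeAudioData(await blob.arrayBuffer());
    return hasAudibleSignal(buffer.getChannelData(0), threshold);
  } finally {
    await ctx.close();
  }
}
```

In recorder.ondataavailable this could be called as blobHasAudio(e.data).then(console.log). Note that it only distinguishes "some signal" from "silence"; it does not verify that the signal is the synthesized speech.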

Chrome and Chromium do not offer Monitor of <device> at the getUserMedia() prompt, and there is no way to select that device with constraints.

Technically, though, it is possible to select Monitor of <device> for Chromium on *nix in the native OS (PulseAudio) Volume Control, under Recording, while audio is being output; see guest271314/SpeechSynthesisRecorder#14. On the first run the input text "test" will generally not leave enough time to select the option, so use 'test'.repeat(50) for the first run and select the option in the local volume control dialog. Afterwards, the MediaStreamTrack from subsequent getUserMedia() calls for audio will stream from that device. This allows capturing only the audio output, not the microphone, even when headphones are plugged in and the user has background noise playing during capture.
[Screenshots: Screenshot_2020-04-19_14-09-57, Screenshot_2020-04-19_14-21-31]
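For browsers that do expose the monitor source, a test could try to pick it without a manual prompt selection. The following is only a sketch under assumptions: that the device appears in enumerateDevices() as an audioinput whose label starts with "Monitor of " (the label text used in this thread), and that labels are populated once microphone permission has been granted. findMonitorDevice and getMonitorStream are hypothetical names. In Chrome and Chromium the lookup is expected to return null, consistent with the observation above.

```javascript
// Hypothetical helper: find a PulseAudio monitor source among the
// MediaDeviceInfo-like objects returned by enumerateDevices().
function findMonitorDevice(devices) {
  return devices.find(
    ({ kind, label }) => kind === 'audioinput' && /^Monitor of /.test(label)
  ) || null;
}

// Browser-only wiring. A first getUserMedia() call is made so that device
// labels are available, then the monitor device is requested by deviceId.
async function getMonitorStream() {
  const probe = await navigator.mediaDevices.getUserMedia({ audio: true });
  try {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const monitor = findMonitorDevice(devices);
    if (!monitor) {
      throw new Error('No "Monitor of <device>" input is exposed by this browser');
    }
    return await navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: monitor.deviceId } }
    });
  } finally {
    // release the probe stream once the device list has been read
    probe.getTracks().forEach(track => track.stop());
  }
}
```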

@guest271314
Contributor Author

@stephenmcgruer For completeness: voices are loaded asynchronously over a socket connection to speech-dispatcher, so the call to getVoices() in the above code should probably look more like

    const synth = window.speechSynthesis;
    synth.cancel();
    let voices, voice;
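    const handleVoicesChanged = _ => {
      // re-read the list here: when this runs as the voiceschanged
      // handler, the earlier getVoices() result is still empty
      voices = synth.getVoices();
      if (_) {
        alert(_.type);
      }
      alert(voices.length);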
      voice = voices.find(({lang, name}) => /* filter voice */);
      const utterance = new SpeechSynthesisUtterance();
      utterance.text = text;
      utterance.voice = voice;
      synth.speak(utterance);
    };
    onload = e => {
      document.getElementById("test").onclick = _ => {
        voices = synth.getVoices();
        if (voices.length === 0) {
          synth.onvoiceschanged = handleVoicesChanged;
        } else {
          handleVoicesChanged();
        }
      };
    };

https://bugs.chromium.org/p/chromium/issues/detail?id=959362#c4
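The same asynchronous loading can also be wrapped in a Promise, which may be easier to use from a test. This is a sketch, not the code from the linked bug: loadVoices and its timeoutMs parameter are hypothetical, and it assumes the speechSynthesis object supports addEventListener('voiceschanged', ...) in the browser under test.

```javascript
// Hypothetical helper: resolve with the voice list as soon as it is
// non-empty, or reject if no voices arrive within timeoutMs.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve, reject) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    const timer = setTimeout(() => {
      synth.removeEventListener('voiceschanged', onChange);
      reject(new Error('No voices loaded within ' + timeoutMs + ' ms'));
    }, timeoutMs);
    function onChange() {
      const loaded = synth.getVoices();
      if (loaded.length === 0) return; // keep waiting for a non-empty list
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', onChange);
      resolve(loaded);
    }
    synth.addEventListener('voiceschanged', onChange);
  });
}
```

In a page this would be used as loadVoices(window.speechSynthesis).then(voices => { /* pick a voice and call speak() */ }).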

@guest271314
Contributor Author

Note also that *oogle Chrome ships with custom voices created by *oogle. Chromium does not: it requires enabling speech-dispatcher and having a local speech synthesis engine installed. Firefox and Chrome do not both set the lang attribute of a voice, so /^English_(America)/.test(name) may not be applicable where that is an espeak or espeak-ng voice, and if neither of those native applications is installed the default voice could be selected. That should not necessarily matter in this instance, where the goal is to capture the output in any case. This is just another note that Chrome and Chromium could and will behave differently here. (I am on a 32-bit machine, so I have not tried Chrome since version 46 or 47, when 32-bit support for Chrome was deprecated; Chrome may circumvent the ordinary call process to speech-dispatcher to supply its own voices.)
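Since voice names and lang values differ between browsers and engines, voice selection in a test could fall back through several criteria instead of relying on a single name pattern. A minimal sketch, with pickVoice, namePattern and langPrefix as hypothetical names: try the name first, then a language prefix, then the voice flagged as default, then the first voice in the list.

```javascript
// Hypothetical helper: choose a voice by name, then by language prefix,
// then the default voice, then the first available voice.
// Returns null when the list is empty, which leaves utterance.voice unset.
function pickVoice(voices, { namePattern, langPrefix } = {}) {
  const byName = namePattern && voices.find(v => namePattern.test(v.name));
  if (byName) return byName;
  const prefix = langPrefix && langPrefix.toLowerCase();
  const byLang = prefix &&
    voices.find(v => (v.lang || '').toLowerCase().startsWith(prefix));
  if (byLang) return byLang;
  return voices.find(v => v.default) || voices[0] || null;
}
```

For example, pickVoice(synth.getVoices(), { namePattern: /^English/, langPrefix: 'en' }). Avoid a namePattern with the g flag, because RegExp.prototype.test() is stateful for global patterns.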
