Merge pull request #51 from ReadAlongs/async_considered_harmful
Make the API less asynchronous
dhdaines authored Dec 19, 2022
2 parents 3ba551c + 81867df commit 4a003fd
Showing 8 changed files with 483 additions and 488 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
include(CheckSymbolExists)
include(CheckLibraryExists)
include(TestBigEndian)

project(soundswallower VERSION 0.5.0
DESCRIPTION "An even smaller speech recognizer")

if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
143 changes: 62 additions & 81 deletions js/README.md
# SoundSwallower: an even smaller speech recognizer

> "Time and change have a voice; eternity is silent. The human ear is
> always searching for one or the other."<br>
> Leena Krohn, _Datura, or a delusion we all see_

SoundSwallower is a refactored version of PocketSphinx intended for embedding in
web applications. The goal is not to provide a fast implementation of
large-vocabulary continuous speech recognition, but rather to provide a _small_
implementation of simple, useful speech technologies.

With that in mind, the current version is limited to finite-state
grammar recognition.

## Installation

SoundSwallower can be installed in your NPM project:

    # From the Internets
    npm install soundswallower

You can also build and install it from source, provided you have
Emscripten and CMake installed:

Look at the [SoundSwallower-Demo
repository](https://github.com/dhdaines/soundswallower-demo) for an
example.

## Basic Usage

The entire package is contained within a module compiled by
Emscripten. The NPM package includes only the compiled code, but you
can rebuild it yourself using [the full source code from
GitHub](https://github.com/ReadAlongs/SoundSwallower) which also
includes C and Python implementations.
that returns a promise that is fulfilled with the actual module once
the WASM code is fully loaded:

```js
const ssjs = await require("soundswallower")();
```

Once you figure out how to get the module, you can try to initialize
the recognizer and recognize some speech.

Great, so let's initialize the recognizer. This possibly involves some long I/O
operations, so it's asynchronous. We follow the construct-then-initialize
pattern. You can use `Promise`s too, of course.

```js
let decoder = new ssjs.Decoder({
  loglevel: "INFO",
  backtrace: true,
});
await decoder.initialize();
```

The optional `loglevel` and `backtrace` options will make it a bit
more verbose, so you can be sure it's actually doing something.

The simplest use case is to recognize some text we already know, which is called
"force alignment". In this case you set this text, which must already be
preprocessed into a whitespace-separated string containing only words in the
dictionary, using `set_align_text`:

```js
decoder.set_align_text("go forward ten meters");
```
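The exact preprocessing is up to you. As a rough sketch (the helper name and
normalization rules here are ours, not part of the SoundSwallower API, and
dictionary lookup is still your responsibility), something like this gets
ordinary text into the expected whitespace-separated form:

```javascript
// Hypothetical helper, not part of the API: lowercase the text, strip
// punctuation and digits, and collapse whitespace so the result is a plain
// space-separated word string suitable for set_align_text().
function preprocessForAlignment(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z' ]+/g, " ") // drop punctuation and digits
    .replace(/\s+/g, " ")       // collapse runs of whitespace
    .trim();
}

console.log(preprocessForAlignment("Go forward, ten meters!"));
// → "go forward ten meters"
```

You would still need to verify that every resulting word exists in the
dictionary (or add it with `add_word`, see below).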

It is also possible to parse a grammar in
[JSGF](https://en.wikipedia.org/wiki/JSGF) format; see below for an
example.

Okay, let's wreck a nice beach! Record yourself saying something,
preferably the sentence "go forward ten meters", using SoX, for
example. Note that we record at 44.1kHz in 32-bit floating point
format as this is the default under JavaScript (due to WebAudio
limitations).

Now you can load it and recognize it with:

```js
let audio = await fs.readFile("goforward.raw");
decoder.start();
decoder.process(audio, false, true);
decoder.stop();
```

The results can be obtained with `get_hyp()` or in a more detailed
format with time alignments using `get_hypseg()`. These are not
asynchronous methods, as they do not depend on or change the state of
the decoder:

```js
console.log(decoder.get_hyp());
console.log(decoder.get_hypseg());
```

If you want even more detailed segmentation (phone and HMM state
level) you can use `get_alignment_json`. For more detail on this
format, see [the PocketSphinx
documentation](https://github.com/cmusphinx/pocketsphinx#usage) as it
is borrowed from there. Since this is JSON, you can create an object
from it and iterate over it:

```js
const result = JSON.parse(decoder.get_alignment_json());
for (const word of result.w) {
  console.log(`word ${word.t} at ${word.b} has duration ${word.d}`);
  for (const phone of word.w) {
    console.log(`phone ${phone.t} at ${phone.b} has duration ${phone.d}`);
  }
}
```
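To make the shape concrete, here is a small sketch over made-up alignment data
in the format described above (`b` = start time, `d` = duration, `t` = text);
the numbers are invented for illustration, not real decoder output:

```javascript
// Invented sample in the documented shape: a list of words under `w`, each
// carrying begin time `b`, duration `d`, text `t`, and its phones under `w`.
const result = {
  w: [
    {
      t: "go", b: 0.0, d: 0.25,
      w: [{ t: "G", b: 0.0, d: 0.1 }, { t: "OW", b: 0.1, d: 0.15 }],
    },
    {
      t: "forward", b: 0.25, d: 0.5,
      w: [{ t: "F", b: 0.25, d: 0.2 }, { t: "ER W ER D", b: 0.45, d: 0.3 }],
    },
  ],
};

// Total aligned duration: end of the last word minus start of the first.
const last = result.w[result.w.length - 1];
const total = last.b + last.d - result.w[0].b;
console.log(total); // → 0.75
```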

Finally, when you are done with the decoder, delete it to release its
memory, because JavaScript garbage collection of WASM objects is
awful:

```js
decoder.delete();
```

## Loading models

By default, SoundSwallower will use a not particularly good acoustic
model and a reasonable dictionary for US English. A model for French
is also available, which you can load by default by setting the
`defaultModel` property in the module object before loading:

```js
const ssjs = {
  defaultModel: "fr-fr",
};
await require("soundswallower")(ssjs);
```

The default model is expected to live under the `model/` directory
relative to the current web page (on the web) or the `soundswallower`
module (in Node.js). You can modify this by setting the `modelBase`
property in the module object when loading, e.g.:

```js
const ssjs = {
  modelBase: "/assets/models/" /* Trailing slash is necessary */,
  defaultModel: "fr-fr",
};
await require("soundswallower")(ssjs);
```

This is simply concatenated to the model name, so you should make sure
to include the trailing slash, e.g. "model/" and not "model"!

## Using grammars

We currently support JSGF for writing grammars. You can parse one
from a JavaScript string and set it in the decoder like this (a
hypothetical pizza-ordering grammar):

```js
decoder.set_jsgf(`#JSGF V1.0;
grammar pizza;
public <order> = [<greeting>] [<want>] [<quantity>] [<size>] [pizza] <toppings>;
<greeting> = hi | hello | yo | howdy;
// ...remaining rules for <want>, <quantity>, <size> and <toppings> collapsed in this diff...
`);
```

Note that all the words in the grammar must first be defined in the
dictionary. You can add custom dictionary words using the `add_word`
method on the `Decoder` object, as long as you speak ArpaBet (or
whatever phoneset the acoustic model uses). IPA and
grapheme-to-phoneme support may become possible in the near future.
If you are going to add a bunch of words, pass `false` as the third
argument for all but the last one, as this will delay the reloading of
the internal state.

```js
decoder.add_word(
  "supercalifragilisticexpialidocious",
  "S UW P ER K AE L IH F R AE JH IH L IH S T IH K EH K S P IY AE L IH D OW SH Y UH S"
);
```
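The update-flag pattern from the paragraph above can be sketched as a loop; the
stand-in `decoder` object and the example pronunciations here are hypothetical,
but the real `add_word` takes the same `(word, pronunciation, update)`
arguments:

```javascript
// Stand-in decoder that just records the update flag it was passed,
// so the batching pattern is visible without loading the real module.
const calls = [];
const decoder = {
  add_word(word, pron, update) {
    calls.push(update);
  },
};

// Hypothetical (word, ArpaBet pronunciation) pairs.
const words = [
  ["hello", "HH AH L OW"],
  ["world", "W ER L D"],
  ["again", "AH G EH N"],
];
words.forEach(([word, pron], i) => {
  // Pass false for every word except the last, so the (expensive)
  // reload of internal state happens only once.
  decoder.add_word(word, pron, i === words.length - 1);
});

console.log(calls); // → [false, false, true]
```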

## Voice activity detection / Endpointing

This is a work in progress, but it is also possible to detect the
start and end of speech in an input stream using an `Endpointer`
object. This requires you to pass buffers of a specific size, which
is understandably difficult since WebAudio also only wants to _give_
you buffers of a specific (and entirely different) size. A better
example is forthcoming, but it looks a bit like this (copied directly
from [the
documentation](https://soundswallower.readthedocs.io/en/latest/soundswallower.js.html#Endpointer.get_in_speech)):
```js
let prev_in_speech = ep.get_in_speech();
let frame_size = ep.get_frame_size();
// Presume `frame` is a Float32Array of frame_size or less
let speech;
if (frame.length < frame_size) speech = ep.end_stream(frame);
else speech = ep.process(frame);
if (speech !== null) {
  if (!prev_in_speech)
    console.log("Speech started at " + ep.get_speech_start());
  if (!ep.get_in_speech())
    console.log("Speech ended at " + ep.get_speech_end());
}
```
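Bridging the size mismatch mentioned above amounts to rebuffering: accumulate
the chunks WebAudio gives you and emit fixed-size frames for the endpointer.
The `FrameBuffer` class below is a hypothetical helper, not part of
SoundSwallower:

```javascript
// Hypothetical rebuffering queue: push() arbitrary-length Float32Array
// chunks in, get back an array of complete frames of exactly frameSize
// samples; any leftover samples are kept for the next call.
class FrameBuffer {
  constructor(frameSize) {
    this.frameSize = frameSize;
    this.pending = new Float32Array(0);
  }
  push(chunk) {
    // Join the leftover samples with the new chunk.
    const joined = new Float32Array(this.pending.length + chunk.length);
    joined.set(this.pending);
    joined.set(chunk, this.pending.length);
    // Slice off as many complete frames as possible.
    const frames = [];
    let pos = 0;
    while (joined.length - pos >= this.frameSize) {
      frames.push(joined.subarray(pos, pos + this.frameSize));
      pos += this.frameSize;
    }
    this.pending = joined.slice(pos); // remainder waits for the next chunk
    return frames;
  }
}

const fb = new FrameBuffer(512);
console.log(fb.push(new Float32Array(300)).length); // → 0 (not enough yet)
console.log(fb.push(new Float32Array(300)).length); // → 1 (512 of 600 consumed)
console.log(fb.pending.length); // → 88
```

Each frame returned by `push()` could then be fed to `ep.process()` as in the
example above, with `ep.end_stream()` on whatever remains at the end of input.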