V1.2 (#8)

v1.2
Picovoice · Apr 26, 2019 · 2a9e568
1 parent 6f39d6f
commit 2a9e568

Showing 67 changed files with 252 additions and 178 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 
-Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from speech commands within a given context of
+Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of
 interest in real-time. For example, given a speech command "*Can I have a small double-shot espresso with a lot of sugar
  and some milk*" it infers that the user wants to *order a drink* with the following specific requirements.
 
@@ -20,16 +20,15 @@ Rhino is
 
 * intuitive. It allows users to utter their intention in a natural and conversational fashion.
 * using deep neural networks trained in **real-world situations**.
-* compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 100 KB of RAM.
-* cross-platform. It is implemented in fixed-point ANSI C. Currently **ARM Cortex-M**, **ARM Cortex-A**,
-**Raspberry Pi**, **Android**, **iOS**, **watchOS**, **Linux**, **Mac**, **Windows**, and **WebAssembly** are supported.
+* compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 90 KB of
+RAM on an MCU.
+* cross-platform. It is implemented in fixed-point ANSI C. Currently **Raspberry Pi**, **Beagle Bone**, **Android**,
+**iOS**, **Linux**, **Mac**, **Windows**, and **web browsers** (**WebAssembly**) are supported. Additionally, support for
+various **ARM Cortex-A**, **ARM Cortex-M** (M4/M7) and **DSP cores** is available for commercial customers.
 * customizable. It can be customized for any given domain.
 
 [![Rhino in Action](https://img.youtube.com/vi/WadKhfLyqTQ/0.jpg)](https://www.youtube.com/watch?v=WadKhfLyqTQ)
 
-NOTE: Currently Raspberry Pi, Android, and Linux builds are available to the open-source community. But we do have plans
-to make other platforms available as well in upcoming releases.
-
 ## Table of Contents
 * [Try It Out](#try-it-out)
 * [Motivation](#motivation)
@@ -43,9 +42,11 @@ to make other platforms available as well in upcoming releases.
 * [Running Demo Applications](#running-demo-applications)
  * [Running Python Demo Application](#running-python-demo-application)
  * [Running C Demo Application](#running-c-demo-application)
+ * [Running Android Demo Application](#running-android-demo-application)
 * [Integration](#integration)
  * [C](#c)
  * [Python](#python)
+ * [Android](#android)
 * [Releases](#releases)
 * [License](#license)
 
@@ -64,17 +65,17 @@ requires significant CPU and memory for an on-device implementation.
 
 Rhino solves this problem by providing a tightly-coupled speech recognition and NLU engine that are jointly optimized
 for a specific domain (use case). Rhino is quite lean and can even run on small embedded processors
-(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 100 KB) making it ideal for
+(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 90 KB) making it ideal for
 resource-constrained IoT applications.
 
 ## Metrics
 
-The table shows the average CPU usage on three different platforms (1) Raspberry Pi zero, (2) Raspberry Pi 3, and an
-Ubuntu box (i5-6500 CPU @ 3.20GHz). You can recreate this using the [C demo application](/demo/c).
+The table shows the average CPU usage on two different platforms (1) Raspberry Pi zero and (2) Raspberry Pi 3. You can
+recreate this using the [C demo application](/demo/c).
 
-Raspberry Pi zero | Raspberry Pi 3 | Ubuntu Desktop (i5-6500 CPU @ 3.20GHz)
-:---: | :---: | :---:
-48.7% | 8.9% | 1.2%
+Raspberry Pi zero | Raspberry Pi 3
+:---: | :---:
+46.4% | 7.2%
 
 ## Terminology
 
@@ -93,8 +94,8 @@ of spoken commands:
 
 ### Expression
 
-A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines a mapping between
-a (or a set of) spoken commands and its (their) corresponding intent. For example
+A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines
+a mapping between a (or a set of) spoken commands and its (their) corresponding intent. For example
 
 * {turnCommand} the lights. -> {turnIntent}
 * Make the {location} light {intensityChange}. -> {changeIntensityIntent}
@@ -143,7 +144,7 @@ python demo/python/rhino_demo.py \
 --rhino_context_file_path ./resources/contexts/linux/coffee_maker_linux.rhn \
 --porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
 --porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
---porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey\ pico_linux.ppn
 ```
 
 The following runs the engine on a *Raspberry Pi 3* to infer intent within the context of smart lighting system
@@ -155,14 +156,19 @@ python demo/python/rhino_demo.py \
 --rhino_context_file_path ./resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn \
 --porcupine_library_path ./resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so \
 --porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
---porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
+--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey\ pico_raspberrypi.ppn
 ```
 
 ### Running C Demo Application
 
 This [demo application](demo/c) is mainly used to show how Rhino can be integrated into an efficient C/C++ application.
 Furthermore it can be used to measure runtime metrics of the engine on various supported platforms.
 
+### Running Android Demo Application
+
+Using Android Studio, open [demo/android](/demo/android) as an Android project and then run the application. Note that
+you need an Android phone with developer options enabled connected to your machine in order to run the application.
+
 ## Integration
 
 Below are code snippets showcasing how Rhino can be integrated into different applications.
@@ -282,8 +288,52 @@ collector.
 rhino.delete()
 ```
 
+### Android
+
+Rhino provides a binding for Android using JNI. It can be initialized as follows:
+
+```java
+ final String modelFilePath = ... // It is available at lib/common/rhino_params.pv
+ final String contextFilePath = ...
+
+ Rhino rhino = new Rhino(modelFilePath, contextFilePath);
+```
+
+Once initialized, `rhino` can be used for intent inference:
+
+
+```java
+ private short[] getNextAudioFrame();
+
+ while (rhino.process(getNextAudioFrame()));
+
+ if (rhino.isUnderstood()) {
+ RhinoIntent intent = rhino.getIntent();
+ // logic to perform an action given the intent object.
+ } else {
+ // logic for handling out of context or unrecognized command
+ }
+```
+
+When processing is finalized, be sure to reset the object before processing a new stream of audio via
+
+```java
+ rhino.reset();
+```
+
+Finally, prior to exiting the application, be sure to release the acquired resources via
+
+```java
+ rhino.delete();
+```
+
 ## Releases
 
+### v1.2.0 April 26, 2019
+
+* Accuracy improvements.
+* Runtime optimizations.
+
 ### v1.1.0 December 23rd, 2018
 
 * Accuracy improvements.

diff --git a/binding/android/rhino/src/main/java/ai/picovoice/rhino/Rhino.java b/binding/android/rhino/src/main/java/ai/picovoice/rhino/Rhino.java
@@ -17,19 +17,18 @@
 
 package ai.picovoice.rhino;
 
-import java.util.HashMap;
+import java.util.LinkedHashMap;
 import java.util.Map;
 
 /**
- * Binding for Picovoice's speech-to-intent engine (aka Rhino).
- * The object directly infers intent from speech commands within a given context of interest in
- * real-time. It processes incoming audio in consecutive frames (chunks) and at the end of each
- * frame indicates if the intent extraction is finalized. When finalized, the intent can be
- * retrieved as structured data in form of an intent string and pairs of slots and values
- * representing arguments (details) of intent. The number of samples per frame can be attained by
- * calling {@link #frameLength()}. The incoming audio needs to have a sample rate equal to
- * {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore, Rhino operates on single
- * channel audio.
+ * Binding for Picovoice's speech-to-intent engine (aka Rhino). The object directly infers intent
+ * from speech commands within a given context of interest in real-time. It processes incoming audio
+ * in consecutive frames (chunks) and at the end of each frame indicates if the intent extraction is
+ * finalized. When finalized, the intent can be retrieved as structured data in form of an intent
+ * string and pairs of slots and values representing arguments (details) of intent. The number of
+ * samples per frame can be attained by calling {@link #frameLength()}. The incoming audio needs to
+ * have a sample rate equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
+ * Rhino operates on single channel audio.
  */
 public class Rhino {
  static {
@@ -40,10 +39,11 @@ public class Rhino {
 
  /**
  * Constructor.
- * @param modelFilePath Absolute path to file containing model parameters.
- * @param contextFilePath Absolute path to file containing context parameters. A context
- * represents the set of expressions (commands), intents, and intent
- * arguments (slots) within a domain of interest.
+ *
+ * @param modelFilePath Absolute path to file containing model parameters.
+ * @param contextFilePath Absolute path to file containing context parameters. A context
+ * represents the set of expressions (commands), intents, and intent
+ * arguments (slots) within a domain of interest.
  * @throws RhinoException On failure.
  */
  public Rhino(String modelFilePath, String contextFilePath) throws RhinoException {
@@ -55,7 +55,8 @@ public Rhino(String modelFilePath, String contextFilePath) throws RhinoException
  }
 
  /**
- * Destructor. This is needs to be called explicitly as we do not rely on garbage collector.
+ * Destructor. This needs to be called explicitly as we do not rely on garbage collector.
+ *
  * @throws RhinoException On failure.
  */
  public void delete() throws RhinoException {
@@ -69,7 +70,8 @@ public void delete() throws RhinoException {
  /**
  * Processes a frame of audio and emits a flag indicating if the engine has finalized intent
  * extraction. When finalized, {@link #isUnderstood()} should be called to check if the command
- * was valid (is within context of interest).
+ * was valid (is within context of interest) and is understood.
+ *
  * @param pcm A frame of audio samples. The number of samples per frame can be attained by
  * calling {@link #frameLength()}. The incoming audio needs to have a sample rate
  * equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
@@ -79,7 +81,7 @@ public void delete() throws RhinoException {
  */
  public boolean process(short[] pcm) throws RhinoException {
  try {
- return process(object, pcm) == 1;
+ return process(object, pcm);
  } catch (Exception e) {
  throw new RhinoException(e);
  }
@@ -88,13 +90,14 @@ public boolean process(short[] pcm) throws RhinoException {
  /**
  * Indicates if the spoken command is valid, is within the domain of interest (context), and the
  * engine understood it.
+ *
  * @return Flag indicating if the spoken command is valid, is within the domain of interest
  * (context), and the engine understood it.
  * @throws RhinoException On failure.
  */
  public boolean isUnderstood() throws RhinoException {
  try {
- return isUnderstood(object) == 1;
+ return isUnderstood(object);
  } catch (Exception e) {
  throw new RhinoException(e);
  }
@@ -105,21 +108,22 @@ public boolean isUnderstood() throws RhinoException {
  * string and pairs of slots and their values. It should be called only after intent extraction
  * is finalized and it is verified that the spoken command is valid and understood via calling
  * {@link #isUnderstood()}.
+ *
  * @return Inferred intent object.
  * @throws RhinoException On failure.
  */
  public RhinoIntent getIntent() throws RhinoException {
  final String intentPacked = getIntent(object);
  String[] parts = intentPacked.split(",");
  if (parts.length == 0) {
- throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
+ throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
  }
 
- Map<String, String> slots = new HashMap<>();
+ Map<String, String> slots = new LinkedHashMap<>();
  for (int i = 1; i < parts.length; i++) {
  String[] slotAndValue = parts[i].split(":");
  if (slotAndValue.length != 2) {
- throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
+ throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
  }
  slots.put(slotAndValue[0], slotAndValue[1]);
  }
@@ -130,6 +134,7 @@ public RhinoIntent getIntent() throws RhinoException {
  /**
  * Resets the internal state of the engine. It should be called before the engine can be used to
  * infer intent from a new stream of audio.
+ *
  * @throws RhinoException On failure.
  */
  public void reset() throws RhinoException {
@@ -143,6 +148,7 @@ public void reset() throws RhinoException {
  /**
  * Getter for expressions. Each expression maps a set of spoken phrases to an intent and
  * possibly a number of slots (intent arguments).
+ *
  * @return Expressions.
  * @throws RhinoException On failure.
  */
@@ -154,35 +160,38 @@ public String getContextExpressions() throws RhinoException {
  }
  }
 
- private native long init(String model_file_path, String context_file_path);
-
- private native long delete(long object);
-
- private native int process(long object, short[] pcm);
-
- private native int isUnderstood(long object);
-
- private native String getIntent(long object);
-
- private native boolean reset(long object);
-
- private native String contextExpressions(long object);
-
  /**
  * Getter for length (number of audio samples) per frame.
+ *
  * @return Frame length.
  */
  public native int frameLength();
 
  /**
  * Audio sample rate accepted by Picovoice.
+ *
  * @return Sample rate.
  */
  public native int sampleRate();
 
  /**
  * Getter for version string.
+ *
  * @return Version string.
  */
  public native String version();
+
+ private native long init(String model_file_path, String context_file_path);
+
+ private native void delete(long object);
+
+ private native boolean process(long object, short[] pcm);
+
+ private native boolean isUnderstood(long object);
+
+ private native String getIntent(long object);
+
+ private native boolean reset(long object);
+
+ private native String contextExpressions(long object);
 }
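
Note on the `getIntent()` change above: the method splits a packed string on `,` and then splits each remaining field on `:` into a slot and its value. Moving from `HashMap` to `LinkedHashMap` means the slots come back in the order they appear in the packed string. The sketch below mirrors that parsing in Python for illustration only. It assumes the first field is the intent name (the Java loop starts at index 1, which suggests this), the function name `parse_packed_intent` and the example string are hypothetical, and the empty-name check is a choice of this sketch rather than something taken from the binding.

```python
from collections import OrderedDict


def parse_packed_intent(packed):
    """Split 'intent,slot:value,...' into (intent, ordered slots)."""
    parts = packed.split(",")
    intent = parts[0]
    if not intent:
        # Assumption of this sketch: an empty intent name is treated as a failure.
        raise ValueError("failed to retrieve intent from %s" % packed)

    # OrderedDict keeps insertion order, like LinkedHashMap in the Java code.
    slots = OrderedDict()
    for part in parts[1:]:
        slot_and_value = part.split(":")
        if len(slot_and_value) != 2:
            raise ValueError("failed to retrieve intent from %s" % packed)
        slots[slot_and_value[0]] = slot_and_value[1]

    return intent, slots


if __name__ == "__main__":
    # Hypothetical packed string, not taken from the engine's output.
    name, slot_map = parse_packed_intent("orderDrink,size:small,sugar:a lot")
    print(name, list(slot_map.items()))
```

A field that does not contain exactly one `:` is rejected, matching the `slotAndValue.length != 2` check in the Java code.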
diff --git a/binding/python/rhino.py b/binding/python/rhino.py
@@ -20,7 +20,7 @@
 
 
 class Rhino(object):
- """Python binding for Picovoice's Speech to Intent (a.k.a Rhino) engine."""
+ """Python binding for Picovoice's Speech-to-Intent (a.k.a Rhino) engine."""
 
  class PicovoiceStatuses(Enum):
  """Status codes corresponding to 'pv_status_t' defined in 'include/picovoice.h'"""

diff --git a/binding/python/test_rhino.py b/binding/python/test_rhino.py
@@ -88,25 +88,34 @@ def _library_path(cls):
  system = platform.system()
  machine = platform.machine()
 
- if system == 'Linux':
+ if system == 'Darwin':
+ return cls._abs_path('lib/mac/x86_64/libpv_rhino.dylib')
+ elif system == 'Linux':
  if machine == 'x86_64':
  return cls._abs_path('lib/linux/x86_64/libpv_rhino.so')
  elif machine.startswith('arm'):
  return cls._abs_path('lib/raspberry-pi/arm11/libpv_rhino.so')
-
- raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+ elif system == 'Windows':
+ return cls._abs_path('lib/windows/amd64/libpv_rhino.dll')
+ else:
+ raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
 
  @classmethod
  def _context_file_path(cls):
  system = platform.system()
  machine = platform.machine()
 
- if system == 'Linux' and machine == 'x86_64':
- return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
- elif system == 'Linux' and machine.startswith('arm'):
- return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
-
- raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
+ if system == 'Darwin':
+ return cls._abs_path('resources/contexts/mac/coffee_maker_mac.rhn')
+ elif system == 'Linux':
+ if machine == 'x86_64':
+ return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
+ elif machine.startswith('arm'):
+ return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
+ elif system == 'Windows':
+ return cls._abs_path('resources/contexts/windows/coffee_maker_windows.rhn')
+ else:
+ raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
 
 
 if __name__ == '__main__':
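
Note on the two path helpers above: in the new branching, the final `else` belongs to the outer `if system == ...` chain, so a Linux machine that is neither `x86_64` nor `arm*` falls through the inner `if`/`elif` and the method returns `None` instead of raising. The sketch below is one possible way to make every unmatched combination raise. It is a standalone illustration: the function name `library_relative_path` is hypothetical, it takes `system` and `machine` as arguments so it can be exercised without the test class, and it returns the relative paths from the diff without resolving them through `_abs_path`.

```python
import platform


def library_relative_path(system, machine):
    """Return the relative library path for a platform, or raise if unsupported."""
    if system == 'Darwin':
        return 'lib/mac/x86_64/libpv_rhino.dylib'
    if system == 'Windows':
        return 'lib/windows/amd64/libpv_rhino.dll'
    if system == 'Linux':
        if machine == 'x86_64':
            return 'lib/linux/x86_64/libpv_rhino.so'
        if machine.startswith('arm'):
            return 'lib/raspberry-pi/arm11/libpv_rhino.so'

    # Reached for unknown systems and for unmatched Linux machines alike.
    raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))


if __name__ == '__main__':
    try:
        print(library_relative_path(platform.system(), platform.machine()))
    except NotImplementedError as error:
        print(error)
```

The same shape would apply to the context-file helper, with the `.rhn` paths substituted.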