V1.2 (#8)
v1.2
kenarsa authored Apr 26, 2019
1 parent 6f39d6f commit 2a9e568
Showing 67 changed files with 252 additions and 178 deletions.
84 changes: 67 additions & 17 deletions README.md
@@ -2,7 +2,7 @@

Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from speech commands within a given context of
Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of
interest in real-time. For example, given a speech command "*Can I have a small double-shot espresso with a lot of sugar
and some milk*" it infers that the user wants to *order a drink* with the following specific requirements.

@@ -20,16 +20,15 @@ Rhino is

* intuitive. It allows users to utter their intention in a natural and conversational fashion.
* using deep neural networks trained in **real-world situations**.
* compact and computationally-efficient making it suitable for **IoT** applications. It can run with as low as 100 KB of RAM.
* cross-platform. It is implemented in fixed-point ANSI C. Currently **ARM Cortex-M**, **ARM Cortex-A**,
**Raspberry Pi**, **Android**, **iOS**, **watchOS**, **Linux**, **Mac**, **Windows**, and **WebAssembly** are supported.
* compact and computationally efficient, making it suitable for **IoT** applications. It can run with as little as 90 KB of
RAM on an MCU.
* cross-platform. It is implemented in fixed-point ANSI C. Currently **Raspberry Pi**, **Beagle Bone**, **Android**,
**iOS**, **Linux**, **Mac**, **Windows**, and **web browsers** (**WebAssembly**) are supported. Additionally, support for
various **ARM Cortex-A**, **ARM Cortex-M** (M4/M7), and **DSP cores** is available for commercial customers.
* customizable. It can be customized for any given domain.

[![Rhino in Action](https://img.youtube.com/vi/WadKhfLyqTQ/0.jpg)](https://www.youtube.com/watch?v=WadKhfLyqTQ)

NOTE: Currently Raspberry Pi, Android, and Linux builds are available to the open-source community. But we do have plans
to make other platforms available as well in upcoming releases.

## Table of Contents
* [Try It Out](#try-it-out)
* [Motivation](#motivation)
@@ -43,9 +42,11 @@ to make other platforms available as well in upcoming releases.
* [Running Demo Applications](#running-demo-applications)
* [Running Python Demo Application](#running-python-demo-application)
* [Running C Demo Application](#running-c-demo-application)
* [Running Android Demo Application](#running-android-demo-application)
* [Integration](#integration)
* [C](#c)
* [Python](#python)
* [Android](#android)
* [Releases](#releases)
* [License](#license)

@@ -64,17 +65,17 @@ requires significant CPU and memory for an on-device implementation.

Rhino solves this problem by providing a tightly-coupled speech recognition and NLU engine that are jointly optimized
for a specific domain (use case). Rhino is quite lean and can even run on small embedded processors
(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 100 KB) making it ideal for
(think ARM Cortex-M or fixed-point DSPs) with very limited RAM (as low as 90 KB) making it ideal for
resource-constrained IoT applications.

## Metrics

The table shows the average CPU usage on three different platforms (1) Raspberry Pi zero, (2) Raspberry Pi 3, and an
Ubuntu box (i5-6500 CPU @ 3.20GHz). You can recreate this using the [C demo application](/demo/c).
The table shows the average CPU usage on two different platforms: (1) Raspberry Pi zero and (2) Raspberry Pi 3. You can
recreate this using the [C demo application](/demo/c).

Raspberry Pi zero | Raspberry Pi 3 | Ubuntu Desktop (i5-6500 CPU @ 3.20GHz)
:---: | :---: | :---:
48.7% | 8.9% | 1.2%
Raspberry Pi zero | Raspberry Pi 3
:---: | :---:
46.4% | 7.2%

## Terminology

@@ -93,8 +94,8 @@ of spoken commands:

### Expression

A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines a mapping between
a (or a set of) spoken commands and its (their) corresponding intent. For example
A context is made of a collection of spoken commands mapped to the user's intent. An expression is an entity that defines
a mapping between a (or a set of) spoken commands and its (their) corresponding intent. For example

* {turnCommand} the lights. -> {turnIntent}
* Make the {location} light {intensityChange}. -> {changeIntensityIntent}
@@ -143,7 +144,7 @@ python demo/python/rhino_demo.py \
--rhino_context_file_path ./resources/contexts/linux/coffee_maker_linux.rhn \
--porcupine_library_path ./resources/porcupine/lib/linux/x86_64/libpv_porcupine.so \
--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey_alfred_linux.ppn
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/linux/hey\ pico_linux.ppn
```

The following runs the engine on a *Raspberry Pi 3* to infer intent within the context of smart lighting system
@@ -155,14 +156,19 @@ python demo/python/rhino_demo.py \
--rhino_context_file_path ./resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn \
--porcupine_library_path ./resources/porcupine/lib/raspberry-pi/cortex-a53/libpv_porcupine.so \
--porcupine_model_file_path ./resources/porcupine/lib/common/porcupine_params.pv \
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey_alfred_raspberrypi.ppn
--porcupine_keyword_file_path ./resources/porcupine/resources/keyword_files/raspberrypi/hey\ pico_raspberrypi.ppn
```

### Running C Demo Application

This [demo application](demo/c) is mainly used to show how Rhino can be integrated into an efficient C/C++ application.
Furthermore it can be used to measure runtime metrics of the engine on various supported platforms.

### Running Android Demo Application

Using Android Studio, open [demo/android](/demo/android) as an Android project and then run the application. Note that
you need an Android phone with developer options enabled, connected to your machine, in order to run the application.

## Integration

Below are code snippets showcasing how Rhino can be integrated into different applications.
@@ -282,8 +288,52 @@ collector.
rhino.delete()
```

### Android

Rhino provides a binding for Android using JNI. It can be initialized using:

```java
final String modelFilePath = ... // It is available at lib/common/rhino_params.pv
final String contextFilePath = ...

Rhino rhino = new Rhino(modelFilePath, contextFilePath);
```

Once initialized, `rhino` can be used for intent inference.


```java
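// getNextAudioFrame() is a placeholder for application code that returns the
// next frame of audio samples as a short[].

// process() returns true once intent extraction is finalized.
boolean isFinalized = false;
while (!isFinalized) {
    isFinalized = rhino.process(getNextAudioFrame());
}

if (rhino.isUnderstood()) {
    RhinoIntent intent = rhino.getIntent();
    // logic to perform an action given the intent object.
} else {
    // logic for handling out of context or unrecognized command
}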
```

When processing is finalized, be sure to reset the object before processing a new stream of audio via

```java
rhino.reset();
```

Finally, prior to exiting the application, be sure to release the acquired resources via

```java
rhino.delete();
```
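
The lifecycle described in this section (feed frames until extraction is finalized, check whether the command was understood, read the intent, then reset) can be sketched independently of any binding. The Python sketch below uses a hypothetical `StubEngine` stand-in, which is not the Rhino API; its method names only mirror the Java calls shown in the snippets, and the canned intent is made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Intent = Tuple[str, Dict[str, str]]


class StubEngine:
    """Hypothetical stand-in for the engine; NOT the real Rhino binding.

    It reports finalization after a fixed number of frames and returns a
    canned intent.
    """

    def __init__(self, frames_until_final: int, understood: bool) -> None:
        self._frames_until_final = frames_until_final
        self._understood = understood
        self._seen = 0

    def process(self, frame: List[int]) -> bool:
        # Returns True once intent extraction is finalized.
        self._seen += 1
        return self._seen >= self._frames_until_final

    def is_understood(self) -> bool:
        return self._understood

    def get_intent(self) -> Intent:
        return "orderDrink", {"size": "small"}

    def reset(self) -> None:
        self._seen = 0


def infer(engine: StubEngine, frames: Iterable[List[int]]) -> Optional[Intent]:
    """Feed frames until finalized.

    Returns the intent, or None if the command was not understood or the
    audio ran out before finalization.
    """
    try:
        for frame in frames:
            if engine.process(frame):
                return engine.get_intent() if engine.is_understood() else None
        return None
    finally:
        # Always reset so the engine is ready for a new stream of audio.
        engine.reset()
```

Placing the reset in a `finally` clause is one way to make sure that the engine is reset on every path, including the out-of-context case.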

## Releases

### v1.2.0 April 26, 2019

* Accuracy improvements.
* Runtime optimizations.

### v1.1.0 December 23rd, 2018

* Accuracy improvements.
79 changes: 44 additions & 35 deletions binding/android/rhino/src/main/java/ai/picovoice/rhino/Rhino.java
@@ -17,19 +17,18 @@

package ai.picovoice.rhino;

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
* Binding for Picovoice's speech-to-intent engine (aka Rhino).
* The object directly infers intent from speech commands within a given context of interest in
* real-time. It processes incoming audio in consecutive frames (chunks) and at the end of each
* frame indicates if the intent extraction is finalized. When finalized, the intent can be
* retrieved as structured data in form of an intent string and pairs of slots and values
* representing arguments (details) of intent. The number of samples per frame can be attained by
* calling {@link #frameLength()}. The incoming audio needs to have a sample rate equal to
* {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore, Rhino operates on single
* channel audio.
* Binding for Picovoice's speech-to-intent engine (aka Rhino). The object directly infers intent
* from speech commands within a given context of interest in real-time. It processes incoming audio
* in consecutive frames (chunks) and at the end of each frame indicates if the intent extraction is
* finalized. When finalized, the intent can be retrieved as structured data in form of an intent
* string and pairs of slots and values representing arguments (details) of intent. The number of
* samples per frame can be attained by calling {@link #frameLength()}. The incoming audio needs to
* have a sample rate equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
* Rhino operates on single channel audio.
*/
public class Rhino {
static {
@@ -40,10 +39,11 @@ public class Rhino {

/**
* Constructor.
* @param modelFilePath Absolute path to file containing model parameters.
* @param contextFilePath Absolute path to file containing context parameters. A context
* represents the set of expressions (commands), intents, and intent
* arguments (slots) within a domain of interest.
*
* @param modelFilePath Absolute path to file containing model parameters.
* @param contextFilePath Absolute path to file containing context parameters. A context
* represents the set of expressions (commands), intents, and intent
* arguments (slots) within a domain of interest.
* @throws RhinoException On failure.
*/
public Rhino(String modelFilePath, String contextFilePath) throws RhinoException {
@@ -55,7 +55,8 @@ public Rhino(String modelFilePath, String contextFilePath) throws RhinoException
}

/**
* Destructor. This is needs to be called explicitly as we do not rely on garbage collector.
* Destructor. This needs to be called explicitly as we do not rely on garbage collector.
*
* @throws RhinoException On failure.
*/
public void delete() throws RhinoException {
@@ -69,7 +70,8 @@ public void delete() throws RhinoException {
/**
* Processes a frame of audio and emits a flag indicating if the engine has finalized intent
* extraction. When finalized, {@link #isUnderstood()} should be called to check if the command
* was valid (is within context of interest).
* was valid (is within context of interest) and is understood.
*
* @param pcm A frame of audio samples. The number of samples per frame can be attained by
* calling {@link #frameLength()}. The incoming audio needs to have a sample rate
* equal to {@link #sampleRate()} and be 16-bit linearly-encoded. Furthermore,
@@ -79,7 +81,7 @@ public void delete() throws RhinoException {
*/
public boolean process(short[] pcm) throws RhinoException {
try {
return process(object, pcm) == 1;
return process(object, pcm);
} catch (Exception e) {
throw new RhinoException(e);
}
@@ -88,13 +90,14 @@ public boolean process(short[] pcm) throws RhinoException {
/**
* Indicates if the spoken command is valid, is within the domain of interest (context), and the
* engine understood it.
*
* @return Flag indicating if the spoken command is valid, is within the domain of interest
* (context), and the engine understood it.
* @throws RhinoException On failure.
*/
public boolean isUnderstood() throws RhinoException {
try {
return isUnderstood(object) == 1;
return isUnderstood(object);
} catch (Exception e) {
throw new RhinoException(e);
}
@@ -105,21 +108,22 @@ public boolean isUnderstood() throws RhinoException {
* string and pairs of slots and their values. It should be called only after intent extraction
* is finalized and it is verified that the spoken command is valid and understood via calling
* {@link #isUnderstood()}.
*
* @return Inferred intent object.
* @throws RhinoException On failure.
*/
public RhinoIntent getIntent() throws RhinoException {
final String intentPacked = getIntent(object);
String[] parts = intentPacked.split(",");
if (parts.length == 0) {
throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
}

Map<String, String> slots = new HashMap<>();
Map<String, String> slots = new LinkedHashMap<>();
for (int i = 1; i < parts.length; i++) {
String[] slotAndValue = parts[i].split(":");
if (slotAndValue.length != 2) {
throw new RhinoException(String.format("Failed to retrieve intent from %s", intentPacked));
throw new RhinoException(String.format("failed to retrieve intent from %s", intentPacked));
}
slots.put(slotAndValue[0], slotAndValue[1]);
}
@@ -130,6 +134,7 @@ public RhinoIntent getIntent() throws RhinoException {
/**
* Resets the internal state of the engine. It should be called before the engine can be used to
* infer intent from a new stream of audio.
*
* @throws RhinoException On failure.
*/
public void reset() throws RhinoException {
@@ -143,6 +148,7 @@ public void reset() throws RhinoException {
/**
* Getter for expressions. Each expression maps a set of spoken phrases to an intent and
* possibly a number of slots (intent arguments).
*
* @return Expressions.
* @throws RhinoException On failure.
*/
@@ -154,35 +160,38 @@ public String getContextExpressions() throws RhinoException {
}
}

private native long init(String model_file_path, String context_file_path);

private native long delete(long object);

private native int process(long object, short[] pcm);

private native int isUnderstood(long object);

private native String getIntent(long object);

private native boolean reset(long object);

private native String contextExpressions(long object);

/**
* Getter for length (number of audio samples) per frame.
*
* @return Frame length.
*/
public native int frameLength();

/**
* Audio sample rate accepted by Picovoice.
*
* @return Sample rate.
*/
public native int sampleRate();

/**
* Getter for version string.
*
* @return Version string.
*/
public native String version();

private native long init(String model_file_path, String context_file_path);

private native void delete(long object);

private native boolean process(long object, short[] pcm);

private native boolean isUnderstood(long object);

private native String getIntent(long object);

private native boolean reset(long object);

private native String contextExpressions(long object);
}
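
The parsing done in `getIntent()` above (split the packed string on `,`, then split each remaining part on `:` into a slot and its value) can be illustrated with a short Python sketch. This is a sketch under assumptions, not part of the binding: the function name `parse_packed_intent` and the sample string are hypothetical, and it assumes the first comma-separated part is the intent name, which the visible hunk does not show explicitly.

```python
from typing import Dict, Tuple


def parse_packed_intent(packed: str) -> Tuple[str, Dict[str, str]]:
    """Parse 'intent,slot1:value1,slot2:value2' into (intent, slots).

    Assumption: the first part is the intent name. A Python dict keeps
    insertion order, which plays the role of Java's LinkedHashMap here.
    """
    parts = packed.split(",")
    intent = parts[0]
    if not intent:
        # Unlike Java, Python's split never returns an empty list, so the
        # sketch checks for an empty intent name instead.
        raise ValueError("failed to retrieve intent from %r" % packed)

    slots: Dict[str, str] = {}
    for part in parts[1:]:
        slot_and_value = part.split(":")
        if len(slot_and_value) != 2:
            raise ValueError("failed to retrieve intent from %r" % packed)
        slots[slot_and_value[0]] = slot_and_value[1]

    return intent, slots
```

For example, `parse_packed_intent("orderDrink,size:small,sugar:a lot")` yields the intent name and the slots in the order they were packed, which is the property the switch from `HashMap` to `LinkedHashMap` preserves on the Java side.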
2 changes: 1 addition & 1 deletion binding/python/rhino.py
@@ -20,7 +20,7 @@


class Rhino(object):
"""Python binding for Picovoice's Speech to Intent (a.k.a Rhino) engine."""
"""Python binding for Picovoice's Speech-to-Intent (a.k.a Rhino) engine."""

class PicovoiceStatuses(Enum):
"""Status codes corresponding to 'pv_status_t' defined in 'include/picovoice.h'"""
27 changes: 18 additions & 9 deletions binding/python/test_rhino.py
@@ -88,25 +88,34 @@ def _library_path(cls):
system = platform.system()
machine = platform.machine()

if system == 'Linux':
if system == 'Darwin':
return cls._abs_path('lib/mac/x86_64/libpv_rhino.dylib')
elif system == 'Linux':
if machine == 'x86_64':
return cls._abs_path('lib/linux/x86_64/libpv_rhino.so')
elif machine.startswith('arm'):
return cls._abs_path('lib/raspberry-pi/arm11/libpv_rhino.so')

raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
elif system == 'Windows':
return cls._abs_path('lib/windows/amd64/libpv_rhino.dll')
else:
raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))

@classmethod
def _context_file_path(cls):
system = platform.system()
machine = platform.machine()

if system == 'Linux' and machine == 'x86_64':
return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
elif system == 'Linux' and machine.startswith('arm'):
return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')

raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))
if system == 'Darwin':
return cls._abs_path('resources/contexts/mac/coffee_maker_mac.rhn')
elif system == 'Linux':
if machine == 'x86_64':
return cls._abs_path('resources/contexts/linux/coffee_maker_linux.rhn')
elif machine.startswith('arm'):
return cls._abs_path('resources/contexts/raspberrypi/coffee_maker_raspberrypi.rhn')
elif system == 'Windows':
return cls._abs_path('resources/contexts/windows/coffee_maker_windows.rhn')
else:
raise NotImplementedError('Rhino is not supported on %s/%s yet!' % (system, machine))


if __name__ == '__main__':
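
The platform dispatch in `_library_path` can also be written as a standalone function that takes the system and machine names as arguments, which makes it easy to exercise without access to each platform. The sketch below is an illustration only: the name `library_relative_path` is hypothetical, and the relative paths are copied from the hunk above. It raises for every unsupported combination, including Linux on a machine type that is neither `x86_64` nor `arm*`, a case where the nested branch in the hunk appears to fall through without raising (the indentation of the scraped diff is lost, so this reading is an assumption).

```python
def library_relative_path(system: str, machine: str) -> str:
    """Map (system, machine) to the relative path of the Rhino shared library."""
    if system == 'Darwin':
        return 'lib/mac/x86_64/libpv_rhino.dylib'
    if system == 'Windows':
        return 'lib/windows/amd64/libpv_rhino.dll'
    if system == 'Linux':
        if machine == 'x86_64':
            return 'lib/linux/x86_64/libpv_rhino.so'
        if machine.startswith('arm'):
            return 'lib/raspberry-pi/arm11/libpv_rhino.so'

    # Reached for unknown systems and for Linux on an unlisted machine type.
    raise NotImplementedError(
        'Rhino is not supported on %s/%s yet!' % (system, machine))
```

In a test it could be called as `library_relative_path(platform.system(), platform.machine())` after importing `platform` from the standard library.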