PercepSync provides a single, unified interface for synchronized perceptual data streamed from heterogeneous devices. Develop your embodied AI agent using your local webcam, and deploy the same agent with the HoloLens without any code changes!
Connect a webcam and a microphone to your machine, then download the latest binary for your operating system from the Releases page and run it.
# First make it executable
$ chmod +x PercepSync
# Now, run it!
$ ./PercepSync local
# If it complains about missing libasound, install it by running the following command
$ sudo apt install libasound2-dev
# If you want the preview window, specify --enable-preview
$ ./PercepSync --enable-preview local
You can then run the sample Python script in another terminal to see what's being streamed. Make sure you have access to a display. Please use this Python script as a reference on how to stream data from PercepSync using Python.
# Install the required packages
$ pip install -r samples/requirements.txt
# Now, run it!
$ python samples/simple_subscriber.py
First, download the latest version of PercepSyncHoloLensCapture from the Releases page and install it on your HoloLens 2 by following the steps below:
- Unzip the PercepSyncHoloLensCapture package.
- Go to the Windows Device Portal for your HoloLens 2, then Views > Apps.
- Under the Deploy apps section, ensure that you're on the Local Storage tab and choose the .msixbundle file from the unzipped package.
- Press the install button.
Once installed, run PercepSyncHoloLensCapture. It automatically places a default config file called PercepSyncHoloLensCaptureConfig.toml in the Documents folder if it's not already there. The config file is pretty self-explanatory. You can set the address for the PercepSync server, as well as pick and choose which sensors to turn on. Note that not all sensors are currently supported by PercepSync. To modify PercepSyncHoloLensCaptureConfig.toml, download it using the Windows Device Portal (System > Device Manager), edit it as desired, and then reupload it.
Once you're satisfied with your config, download the matching version of PercepSync from the Releases page and run it.
# First make it executable
$ chmod +x PercepSync
# Now, run it!
$ ./PercepSync hololens
# If you want the preview window, specify --enable-preview
$ ./PercepSync --enable-preview hololens
Now run PercepSyncHoloLensCapture on your HoloLens and start capturing. It'll automatically connect to PercepSync. You can then run the same Python script in another terminal to see what's being streamed.
# Install the required packages
$ pip install -r samples/requirements.txt
# Now, run it!
$ python samples/simple_subscriber.py
You can enable text-to-speech and speech-to-text by passing the --enable-tts and --enable-stt options. PercepSync relies on Microsoft Azure Speech Service to handle speech, so make sure you also pass in your Azure credentials via a config file. In local mode, speech is played through the local speaker, while in HoloLens mode it is played on the HoloLens.
NOTE: The Microsoft Azure Speech Service SDK relies on OpenSSL 1.x, which is no longer shipped with Ubuntu 22.04. As a result, you need to install OpenSSL 1.x from source. Instructions can be found here. Please make sure you set the environment variable SSL_CERT_DIR=/etc/ssl/certs.
$ cat config.toml
[azure_speech_config]
subscription_key = "your-azure-subscription-key"
region = "your-region"
# local mode
$ ./PercepSync --config-file config.toml --enable-tts local
# hololens mode
$ ./PercepSync --config-file config.toml --enable-tts hololens
Now in another terminal, run the sample scripts.
# Install the required packages
$ pip install -r samples/requirements.txt
# TTS
$ python samples/simple_tts.py
TTS Text: Hello, world!
# STT
$ python samples/simple_subscriber.py
Transcribed Text: Hello, world!
You can configure PercepSync via command line options as well as a configuration file. All options are available via the configuration file, but not all of them are exposed as command line options. You can refer to the help message on the command line, or to Config.cs, for more details. Note that the PascalCase property names, e.g., EnablePreview, are translated into snake_case, e.g., enable_preview, in the toml configuration file.
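For illustration, assuming the config-file keys follow that PascalCase to snake_case rule (only the azure_speech_config section below is taken from this document; the other keys are hypothetical translations, so check Config.cs for the actual property names), a config.toml might look like:

```toml
# Hypothetical sketch; check Config.cs for the actual property names.
enable_preview = true              # from EnablePreview
camera_device_id = "/dev/video0"   # assumed translation of CameraDeviceId

[azure_speech_config]
subscription_key = "your-azure-subscription-key"
region = "your-region"
```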
You can configure PercepSyncHoloLensCapture via a configuration file, PercepSyncHoloLensCaptureConfig.toml, placed in the Documents folder of the HoloLens 2. If it's not already there, PercepSyncHoloLensCapture will create a default one. To modify PercepSyncHoloLensCaptureConfig.toml, download it using the Windows Device Portal (System > Device Manager), edit it as desired, and then reupload it. You can refer to Config.cs to see which configuration options are available.
PercepSync uses ZeroMQ to publish data from different input devices. Data that can be synchronized is synchronized and published to a single topic. The serialization format is MessagePack.
Currently, one topic for synchronized perception data is available:
perception
"""
This packet of data is synchronized based on the FPS rate, which can be configured via the configuration file. For example, if the FPS rate is 5 (default), PercepSync will generate this packet roughly every 1/5 = 0.2 seconds, and the audio buffer will roughly be 0.2 seconds long.
If speech is detected, the transcribed text will be included in the packet. Otherwise, it'll be an empty string. Note that the transcribed text is included in the packet at the end of the speech, since we can't go back in time.
"""
{
"message": {
"frame": {
"pixelData": bytes, # raw pixels in RGB 24-bit for a single frame
"width": int,
"height": int,
"stride": int,
},
"audio": {
"buffer": bytes, # audio buffer in 16KHz, 1 channel, 16-bit PCM
},
"transcribedText": {
"text": str, # if no speech is detected, empty string.
},
},
"originatingTime": int,
}
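As an illustrative sketch of consuming this packet (the endpoint address and the two-frame topic-plus-payload framing are assumptions here; samples/simple_subscriber.py is the authoritative reference):

```python
# Sketch of a perception subscriber. The endpoint address and the multipart
# framing (a topic frame followed by a MessagePack payload frame) are
# assumptions; see samples/simple_subscriber.py for the authoritative version.
import msgpack
import zmq


def parse_perception(payload: bytes):
    """Decode one MessagePack-serialized perception packet."""
    packet = msgpack.unpackb(payload)
    message = packet["message"]
    frame = message["frame"]
    audio = message["audio"]["buffer"]
    text = message["transcribedText"]["text"]
    return frame["width"], frame["height"], audio, text


def subscribe(address="tcp://127.0.0.1:12345"):  # hypothetical endpoint
    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    sock.connect(address)
    sock.setsockopt_string(zmq.SUBSCRIBE, "perception")
    while True:
        _topic, payload = sock.recv_multipart()
        yield parse_perception(payload)
```

At the default 5 FPS, each audio buffer covers roughly 0.2 seconds, i.e. 16000 samples/s x 2 bytes/sample x 1 channel x 0.2 s = 6400 bytes.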
PercepSync uses ZeroMQ to accept text-to-speech requests from different clients. It uses the Push-Pull pattern. The serialization format is MessagePack. Please see the sample script for more details. The request format is shown below:
{
"text": str
}
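As a sketch of the client side (the endpoint address is hypothetical; see the sample script for the real one), a request can be serialized and pushed like this:

```python
# Sketch of a TTS client. The endpoint address is hypothetical; see the
# sample script for the authoritative version.
import msgpack
import zmq


def make_tts_request(text: str) -> bytes:
    # The request is a MessagePack map with a single "text" key.
    return msgpack.packb({"text": text})


def send_tts(text: str, address="tcp://127.0.0.1:12346"):  # hypothetical endpoint
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)  # PercepSync pulls on the other end
    sock.connect(address)
    sock.send(make_tts_request(text))
```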
By default, PercepSync uses /dev/video0, but if you want to use another video device, you can pass it in using the --camera-device-id option.
# First find out available video devices.
$ ls -ltrh /dev/video*
crw-rw----+ 1 root video 81, 1 Sep 21 08:50 /dev/video1
crw-rw----+ 1 root video 81, 0 Sep 21 08:50 /dev/video0
# Let's use /dev/video1
$ ./PercepSync local --camera-device-id /dev/video1
By default, PercepSync uses plughw:0,0 as both the input and output device, but if you want to use other audio devices, you can pass them in using the --audio-input-device-name and --audio-output-device-name options. The first number refers to the "card" number, and the second number refers to the "device" number. You can find all the output devices with aplay -l, and input devices with arecord -l.
# For output devices.
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: Device [Device], device 3: HDMI 0 [HDMI 0]
Subdevices: 1/1
Subdevice #0: subdevice #0
...
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Device [Device], device 0: USB Audio [USB Audio]
Subdevices: 0/1
Subdevice #0: subdevice #0
...
$ ./PercepSync local --audio-output-device-name plughw:0,3 --audio-input-device-name plughw:1,0
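The plughw:&lt;card&gt;,&lt;device&gt; naming convention used above can be sketched as a tiny helper (the function name is ours, not part of PercepSync):

```python
def alsa_device_name(card: int, device: int) -> str:
    # ALSA "plug" device names have the form plughw:<card>,<device>
    return f"plughw:{card},{device}"


# Matches the run above: playback on card 0, device 3; capture on card 1, device 0.
assert alsa_device_name(0, 3) == "plughw:0,3"
assert alsa_device_name(1, 0) == "plughw:1,0"
```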
The main server application, PercepSync, is a cross-platform application that targets both Linux and Windows. On Linux it runs on .NET 7.0, while on Windows it runs on .NET Framework 4.7.2. All of these versions of .NET may be confusing; you can read more about the history of .NET here.
If you're making changes only to PercepSync, it's often most convenient to develop on Linux. We recommend using Visual Studio Code, as it supports C# and .NET quite well. Open the root of the repository with Visual Studio Code, and place the following files under the .vscode folder to set up debugging for PercepSync.
tasks.json
{
"version": "2.0.0",
"tasks": [
{
"label": "build-percepsync-net7.0",
"command": "dotnet",
"type": "process",
"args": [
"build",
"${workspaceFolder}/src/PercepSync/PercepSync.csproj",
"-f",
"net7.0",
"/property:GenerateFullPaths=true",
"/consoleloggerparameters:NoSummary;ForceNoAlign"
],
"problemMatcher": "$msCompile"
},
{
"label": "publish-percepsync-net7.0",
"command": "dotnet",
"type": "process",
"args": [
"publish",
"-f",
"net7.0",
"${workspaceFolder}/src/PercepSync/PercepSync.csproj",
"/property:GenerateFullPaths=true",
"/consoleloggerparameters:NoSummary;ForceNoAlign"
],
"problemMatcher": "$msCompile"
},
{
"label": "watch-percepsync-net7.0",
"command": "dotnet",
"type": "process",
"args": [
"watch",
"run",
"--project",
"${workspaceFolder}/src/PercepSync/PercepSync.csproj",
"${workspaceFolder}/PercepSync.sln",
"-f",
"net7.0"
],
"problemMatcher": "$msCompile"
}
]
}
launch.json
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Debug PercepSync",
"type": "coreclr",
"request": "launch",
"preLaunchTask": "build-percepsync-net7.0",
"program": "${workspaceFolder}/src/PercepSync/bin/Debug/net7.0/PercepSync.dll",
"args": [
// Your command line arguments go here...
// For example...
"--config-file",
"config.toml",
"--enable-tts",
"hololens"
],
"cwd": "${workspaceFolder}",
"env": {
"SSL_CERT_DIR": "/etc/ssl/certs"
},
"console": "integratedTerminal",
"stopAtEntry": false,
"justMyCode": false
},
{
"name": ".NET Core Attach",
"type": "coreclr",
"request": "attach"
}
]
}
We currently don't have integration testing between PercepSync and PercepSyncHoloLensCapture, so if you're making a substantial change, you should manually test it by running an official release of PercepSyncHoloLensCapture on a HoloLens 2 (or even an emulator).
There may be cases where you need to develop PercepSync on Windows, e.g., when you need to make changes to both PercepSync and PercepSyncHoloLensCapture as described below. In this case, you can use Visual Studio.
PercepSyncHoloLensCapture is a UWP application, and therefore can only be developed on a Windows machine using Visual Studio. Open the solution file PercepSync.sln at the root of the repository with Visual Studio to start developing. You can follow the instructions from the psi-samples repository to set things up so that you can load your code directly onto your HoloLens 2.
Sometimes you may need to make changes on both applications. In this case, your best bet is to use Visual Studio on Windows. You can set it up so that both can be run at the same time by setting up multiple startup projects.
We use various tools to maintain code quality automatically. These tools run on every PR against the main branch and on every commit pushed to it. They also run locally on every commit as pre-commit hooks. You can additionally set up some of the tools with your IDE so they run every time you save a file, which makes the whole process a lot smoother.
# Install pre-commit by following the instructions specified here: https://pre-commit.com/#install
# For MacOS, we recommend using homebrew.
# For Linux, use the 0-dependency zipapp. If you choose to run `sudo pip install pre-commit` instead,
# just be mindful that it may affect your virtualenvs.
# Install pre-commit hooks to the repo
$ pre-commit install
# or with 0-dependency zipapp (version 3.4.0)
$ python path/to/pre-commit-3.4.0.pyz install
# Install all the necessary local tools
$ dotnet tool restore
We use CSharpier as our automatic code-formatter. It is automatically run against every commit as part of pre-commit hooks. However, it is highly recommended that you set up your text editor or IDE to run it automatically after each save. For VSCode, you can install the official extension.
Simply create a tag of the form v\d+.\d+.\d+ and push it. Both Linux and Windows versions of PercepSync, as well as PercepSyncHoloLensCapture, will be automatically built and added to an auto-generated release page.