HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance
We present HMD-EgoPose: a deep learning-based framework that brings marker-less hand and rigid object pose estimation to commercially available optical see-through head-mounted displays (OST-HMDs) using a low-latency streaming approach and a computing workstation.
This work contains code adapted from the following repositories:
- EfficientPose by ybkscht: Tensorflow/Keras implementation with scripts for model structure, data augmentation, and evaluation
- Yet-Another-EfficientDet-Pytorch by zylo117: PyTorch network implementation for the EfficientNet and EfficientDet backbones
- node-dss by bengreenier: Simple signalling server for WebRTC
- Project setup
- Data
- Debug
- Training
- Evaluation
- Run Sample with the HoloLens 2 and a Workstation
- Citation
Folder setup for the project sample.
datasets/
... real_colibri_v1/
... syn_colibri_v1/
format-labels/
pytorch-sandbox/
... efficientdet/
... efficientnet/
... efficientpose/
... eval/
... generators/
... onnx-models/
... utils/
... weights/
unity-sandbox/
... HoloLens2-Machine-Learning-WebRTC/
... node-dss/
... OpenCVDNNSandboxNetCore/
... WebRTCNetCoreSandbox/
Download the Synthetic and Real datasets for the Colibri II drill (see the project site for data access), unzip the folders to a chosen destination, and convert the data format to prepare for training.
- Navigate to the `format-labels/` directory and create a minimal virtualenv to run the extraction scripts
- Download Python for Windows (tested with Python 3.7.9)
python -m virtualenv -p 3.7 venv
.\venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
- After unzipping the downloaded datasets, begin formatting the Synthetic and Real datasets with the following commands (this can take a while)
# Prepare synthetic dataset
python .\pkl_to_formatted_txt.py --dataset syn_colibri --in_dir "C:/Users/Mitch/Downloads/" --out_dir "C:/git/public/hmd-ego-pose/datasets/"
# Prepare real dataset
python .\pkl_to_formatted_txt.py --dataset real_colibri --in_dir "C:/Users/Mitch/Downloads/" --out_dir "C:/git/public/hmd-ego-pose/datasets/"
After successful completion of these commands, you should have the following output and directory structure for both the `syn_colibri_v1` and `real_colibri_v1` folders.
The `.yml` file should contain rotation and translation matrices for the camera to model, the object bounding box and class, and the drill tip transform, as in the sample below.
0:
- cam_R_m2c: [0.7887479987958447, -0.6285408366572683, 0.03812666487227634, -0.601701596829051, -0.8060479489556668, 0.16356679108674974, -0.07601779699325562, 0.09633129835128784, -0.9960060119628906]
cam_t_m2c: [-29.27876390254138, -52.119556134360224, 494.5744672954083]
obj_bb: [210, 271, 6, -33]
obj_id: 1
drill_tip_transform: [101.554, 338.261, -326.146, 1]
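For reference, these fields can be read back into NumPy arrays with PyYAML; the following is a minimal sketch (the file name `gt.yml` and the use of PyYAML/NumPy are assumptions, not part of the repository scripts):

```python
# Minimal sketch: load the per-frame ground truth shown above into NumPy arrays.
# The file name gt.yml is an assumption; the field names match the sample above.
import numpy as np
import yaml

with open("gt.yml", "r") as f:
    gt = yaml.safe_load(f)

frame = gt[0][0]  # frame index 0, first (and only) annotation
R = np.array(frame["cam_R_m2c"]).reshape(3, 3)    # 3x3 rotation from the annotation
t = np.array(frame["cam_t_m2c"])                  # translation, in mm
bbox = frame["obj_bb"]                            # object bounding box
tip = np.array(frame["drill_tip_transform"])      # homogeneous drill tip point

print(R, t, bbox, tip, sep="\n")
```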
- We include an optional sample script `debug.py` to view the ground truth data projected onto input RGB image data and confirm correct data formatting (a projection sketch follows below)
- To check the formatting and correctness of training data, run the following sample commands
# View the synthetic dataset fold 0
python .\debug.py --dataset syn_colibri --fold 0
# View the real dataset fold 0
python .\debug.py --dataset real_colibri --fold 0
- The images will show as below:
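Conceptually, these overlays come from projecting 3D points through the annotated pose and camera intrinsics. Below is a minimal sketch of that projection using OpenCV; the variable values are illustrative placeholders taken from the samples in this README and do not reflect the actual `debug.py` implementation:

```python
# Minimal sketch: project a 3D point into the image using an annotated pose.
# Values are illustrative placeholders; debug.py's actual drawing code may differ.
import cv2
import numpy as np

K = np.array([[480.0, 0.0, 128.0],
              [0.0, 480.0, 128.0],
              [0.0, 0.0, 1.0]])              # camera intrinsics (sample from evaluate.py)
R = np.eye(3)                                # 3x3 rotation from the .yml annotation (placeholder)
t = np.array([[-29.3], [-52.1], [494.6]])    # translation in mm (placeholder)
pts_3d = np.array([[0.0, 0.0, 0.0]])         # e.g. the drill model origin, in mm

rvec, _ = cv2.Rodrigues(R)                   # OpenCV expects an axis-angle rotation vector
pts_2d, _ = cv2.projectPoints(pts_3d, rvec, t, K, None)
print(pts_2d.squeeze())                      # pixel coordinates to draw onto the RGB frame
```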
The `pytorch-sandbox/` directory includes the resources for data formatting, loading, training, and export of a custom machine learning framework for rigid surgical drill pose estimation.
- Ensure that the correct CUDA and cuDNN library versions are installed to support PyTorch 1.8.2 LTS (tested with CUDA 11.1)
- Download Python for Windows (tested with Python 3.7.9)
- In the `pytorch-sandbox/` directory, create the virtual environment setup for training as below
python -m virtualenv -p 3.7 venv
.\venv\Scripts\activate
# For training on NVIDIA 3000 series GPUs (CUDA 11.1 support)
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio===0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
# Install requirements
pip install -r requirements.txt
# Compile the cython modules
python setup.py build_ext --inplace
- From the `pytorch-sandbox/` directory, we can begin training HMD-EgoPose using the `EfficientNet-B0` backbone
- To monitor training/validation accuracy/loss, a TensorBoard training log is output to the `runs/` folder
- At the end of each epoch, the model weights with the lowest transformation error will be saved to the `train_weights` folder
- After completing training on the dataset, the model weights which achieved the highest accuracy on the validation set are selected for evaluation on the test set
# Begin training on fold 0 of synthetic colibri dataset
python main.py --dataset='syn_colibri' --iter=0 --img_size='256,256' --batch_size=16 --fold=0
- After training on the synthetic colibri dataset, the best performing training weights can be used for fine-tuning performance on the smaller real colibri dataset
# Fine tune model using best performing weights from the synthetic colibri dataset training on fold 0 of the real colibri dataset
python main.py --dataset='real_colibri' --iter=0 --img_size='256,256' --batch_size=16 --fold=0 --ckpt=train_weights/syn_colibri__fold_3__iter_0__mixed_t_mean_10.29__epo_49.pth --fine_tune=True
- To evaluate the model with the best performing weights for that specific dataset (real or synthetic colibri) use:
# Best performing weights on real colibri dataset
python evaluate.py --dataset='real_colibri' --iter=0 --img_size='256,256' --batch_size=1 --fold=4 --n_workers=6 --weights='train_weights/real_colibri__fold_2__iter_0__mixed_t_mean_13.19__epo_44.pth'
# Best performing weights on synthetic colibri dataset
python evaluate.py --dataset='syn_colibri' --iter=0 --img_size='256,256' --batch_size=1 --fold=4 --n_workers=6 --weights='train_weights/syn_colibri__fold_3__iter_0__mixed_t_mean_10.29__epo_49.pth'
- After running, the relevant results will be printed to the terminal window as in the below sample for the real dataset
- For each fold the following metrics are presented (a sketch of how these errors can be computed follows this list):
  - Tool ADD (in mm) is given by `TranslationErrorMean_in_mm` with standard deviation `TranslationErrorStd_in_mm`
  - Drill tip error (in mm) is given by `TranslationErrorTipMean_in_mm` with standard deviation `TranslationErrorTipStd_in_mm`
  - Drill bit direction error (in deg) is given by `RotationErrorMean_in_degree` with standard deviation `RotationErrorStd_in_degree`
  - Hand ADD (in mm) is given by `TranslationErrorHandMean_in_mm` with standard deviation `TranslationErrorHandStd_in_mm`
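As an illustration only, per-frame versions of these errors can be computed from a ground-truth and predicted pose roughly as below; this is a minimal sketch with assumed variable names and an assumed drill axis, not the repository's eval code (see `eval/` and `evaluate.py` for the actual implementation):

```python
# Minimal sketch: per-frame pose error metrics in the spirit of the reported values.
# R_* are 3x3 rotation matrices, t_* are translations in metres, tip is the drill
# tip offset in the model frame; all names and the drill axis are assumptions.
import numpy as np

def translation_error_mm(t_gt, t_pred):
    """Euclidean distance between ground-truth and predicted translations, in mm."""
    return np.linalg.norm(t_gt - t_pred) * 1000.0

def tip_error_mm(R_gt, t_gt, R_pred, t_pred, tip):
    """Distance between the drill tip positions under both poses, in mm."""
    tip_gt = R_gt @ tip + t_gt
    tip_pred = R_pred @ tip + t_pred
    return np.linalg.norm(tip_gt - tip_pred) * 1000.0

def drill_direction_error_deg(R_gt, R_pred, axis=np.array([0.0, 0.0, 1.0])):
    """Angle between the drill bit direction under both rotations, in degrees."""
    v_gt, v_pred = R_gt @ axis, R_pred @ axis
    cos = np.dot(v_gt, v_pred) / (np.linalg.norm(v_gt) * np.linalg.norm(v_pred))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```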
- Tested with Unity 2019.4 LTS, Visual Studio 2019, and the HoloLens 2 with build 19041.1161 (Windows Holographic, version 20H2 - August 2021 Update), though it will likely work with newer versions
- The 10.0.19041.1161 HoloLens 2 build can be downloaded from the following link and installed using the Advanced Recovery Companion
- Input video frames of size `(1, 3, 512, 512)` for online inference (NCWH)
- Pretrained PyTorch weights of the HMD-EgoPose framework were formatted and exported to an ONNX model format for use (the export call is sketched below)
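For context, a PyTorch-to-ONNX export with this input size can be performed with `torch.onnx.export`; the sketch below uses a stand-in module and assumed file/tensor names rather than the actual HMD-EgoPose network or export script:

```python
# Minimal sketch: export a PyTorch model to ONNX with a (1, 3, 512, 512) input.
# The stand-in module and file/tensor names are assumptions; the real network is
# built from the pytorch-sandbox model code with the trained weights loaded.
import torch
import torch.nn as nn

model = nn.Sequential(                     # placeholder network, export mechanics only
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 6),
)
model.eval()

dummy_input = torch.randn(1, 3, 512, 512)  # one RGB frame of 512 x 512
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                          # output path is an assumption
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)
```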
To run the sample on the HoloLens 2, there are four required components:
- `HoloLens2-Machine-Learning-WebRTC`: the Unity project which can be deployed on the computing workstation (desktop) or HoloLens 2
- `WebRTCNetCoreSandbox`: the WebRTC C# project for receiving video frames and performing network inference on the computing workstation (desktop)
- `OpenCVDNNSandboxNetCore`: a sample C# project used for optional model validation against the Python implementation
- `node-dss`: the signaling server for WebRTC-based communication
We use node-dss as a simple signaling server for WebRTC and streaming our video/pose predictions from the HoloLens 2 to the computing workstation. To install, follow the directions below:
- Install nodejs to leverage this service
- Install dependencies with npm: from the `unity-sandbox/node-dss` directory, open a Git Bash window and run:
npm install
# Enable debug logging (in Git Bash / bash)
export DEBUG=dss*
# Enable debug logging (in a Windows Command Prompt)
set DEBUG=dss*
npm start
- You should see the Git Bash window output as below:
- Open the `HoloLens2-Machine-Learning-WebRTC` project in Unity and load the `WebcamVideoLocal` scene
- Navigate to the `NodeDssSignaler` component in the Unity window and adjust the `HttpServerAddress` to the IP of your current machine
  - For example, `http://xxx.xxx.x.xx:3000/`, while keeping the port as 3000
  - Leave the `LocalPeerId` and `RemotePeerId` values as is
- On playing the Unity scene, you should be able to see video from a webcam connected to the PC (on the `LocalMedia` → `VideoPlayer` component) and you should see that your program has begun polling in the node-dss window as:
2022-06-17T14:33:28.852Z dss:boot online @ 3000
2022-06-17T14:44:32.776Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 1.051 ms
2022-06-17T14:44:33.265Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.073 ms
2022-06-17T14:44:33.768Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.085 ms
2022-06-17T14:44:34.270Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.066 ms
2022-06-17T14:44:34.773Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.065 ms
- Now, open the `WebRTCNetCoreSandbox` project in Visual Studio
- In the `Program.cs` file, make the following adjustments based on your own setup:
  - Update the file path to the location of the `onnx-models/` folder:

// Update the file path to point to the location of the onnx-models/ folder
const string basePath = "C:/git/public/hmd-ego-pose/pytorch-sandbox/onnx-models/";

  - Change the `HttpServerAddress` to the IP of your PC:

// Initialize the signaler
signaler = new NodeDssSignaler()
{
    // Set to your IP (same as in the Unity project), leave the
    // LocalPeerId and RemotePeerId values
    HttpServerAddress = "http://192.168.2.29:3000/",
    LocalPeerId = "WebRTCSandbox",
    RemotePeerId = "HoloLens2MachineLearningWebRTC",
};

- After ensuring the Unity project is already running, set the project to `Release` and `x64` and run the sample
- As indicated in the image below, you can either create a 3D-printed drill to test the network, OR aim your webcam at your computer monitor and view images/video from the surgical drill dataset
- In the image, the left window is the Unity window running the `HoloLens2-Machine-Learning-WebRTC` Unity project; the right window is the `WebRTCNetCoreSandbox` project; and the bottom right window is the Git Bash window running `node-dss`
- The predicted pose for the surgical drill identified in the webcam video is applied to the virtual drill model in the Unity scene (in red)
- Open the `HoloLens2-Machine-Learning-WebRTC` project in Unity and load the `HoloLensVideoLocal` scene
- Navigate to the `NodeDssSignaler` component in the Unity window and adjust the `HttpServerAddress` to the IP of your current machine (the desktop, not the HoloLens 2)
  - For example, `http://xxx.xxx.x.xx:3000/`, while keeping the port as 3000
  - Leave the `LocalPeerId` and `RemotePeerId` values as is
- Switch the build platform to `Universal Windows Platform`, select `HoloLens` as the target device, and `ARM` as the target platform
- Build the Visual Studio project and deploy to the HoloLens 2
- On opening the app on the HoloLens 2, you should be able to see that your program has begun polling in the node-dss window as:
2022-06-17T15:22:56.034Z dss:boot online @ 3000
2022-06-17T15:22:56.424Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.908 ms
2022-06-17T15:22:56.520Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.074 ms
2022-06-17T15:22:57.044Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.075 ms
2022-06-17T15:22:57.617Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.070 ms
2022-06-17T15:22:58.188Z dss GET /data/HoloLens2MachineLearningWebRTC 404 - - 0.062 ms
- Now, open the `WebRTCNetCoreSandbox` project in Visual Studio
- In the `Program.cs` file, make the following adjustments based on your own setup:
  - Update the file path to the location of the `onnx-models/` folder:

// Update the file path to point to the location of the onnx-models/ folder
const string basePath = "C:/git/public/hmd-ego-pose/pytorch-sandbox/onnx-models/";

  - Change the `HttpServerAddress` to the IP of your PC:

// Initialize the signaler
signaler = new NodeDssSignaler()
{
    // Set to your IP (same as in the Unity project), leave the
    // LocalPeerId and RemotePeerId values
    HttpServerAddress = "http://192.168.2.29:3000/",
    LocalPeerId = "WebRTCSandbox",
    RemotePeerId = "HoloLens2MachineLearningWebRTC",
};

- After ensuring the HoloLens 2 application is already running, set the project to `Release` and `x64` and run the sample
- As indicated in the image below, you can either create a 3D-printed drill to test the network, OR aim the HoloLens 2 front-facing camera at your computer monitor and view images/video from the surgical drill dataset
- In the image, the virtual surgical drill model is visible in red, with the `WebRTCNetCoreSandbox` project and the Git Bash window running `node-dss` in the background
- The predicted pose for the surgical drill identified in the HoloLens 2 front-facing camera video is applied to the virtual drill model in the Unity scene (in red)
- Though the drill pose was not perfect in this sample, some additional error was introduced through the mixed reality capture API and the lack of camera extrinsic parameter streaming with WebRTC and the HoloLens
Test data is included in the `onnx-models` folder.
- The `scratchpad.py` script demonstrates loading of data and the ONNX model with basic inference in Python (no additional output filtering); a minimal Python inference sketch is included at the end of this section
- Filtering and regression of network outputs is demonstrated in `evaluate.py` to arrive at a rotation and translation prediction for the rigid object
- The output of `scratchpad.py` and `evaluate.py` can be used to compare to the `OpenCVDNNSandboxNetCore` C# implementation to assess accuracy of predictions
- Running `evaluate.py` with sample input image `000000.png`, we arrive at the following prediction result
// Desired final results of prediction (from evaluate.py script with input 000000.png)
camera_matrix:
[[480. 0. 128.]
[ 0. 480. 128.]
[ 0. 0. 1.]]
rotation_gt: [[-2.92229713 1.00365327 0.09353241]]
translation_gt: [[-0.02927876 -0.05211956 0.49457447]]
rotation_pred: [[-2.9054394 1.0276762 0.1723399]]
translation_pred: [[-0.02811211 -0.05858146 0.48664188]]
- Running the C# `OpenCVDNNSandboxNetCore` project with sample input image `000000.png`, we arrive at the following pre-processed input frame:
  - Note: you will need to adapt the paths in this project to be relevant to your local path structure
- From the output to the terminal window, we can observe that the C#-produced results are nearly identical to the Python-produced results:
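For a quick comparison on the Python side, the ONNX model can also be run directly with onnxruntime; the following is a minimal sketch in which the file names and preprocessing are assumptions (see `scratchpad.py` for the repository's actual loading and inference pipeline):

```python
# Minimal sketch: run the exported ONNX model on a single frame with onnxruntime.
# Model/image file names and the preprocessing are assumptions; see scratchpad.py
# for the repository's actual loading and inference pipeline.
import cv2
import numpy as np
import onnxruntime as ort

img = cv2.imread("000000.png")                          # BGR uint8 test frame
img = cv2.resize(img, (512, 512)).astype(np.float32) / 255.0
blob = np.transpose(img, (2, 0, 1))[None]               # HWC -> (1, 3, 512, 512)

session = ort.InferenceSession("hmd-ego-pose.onnx")     # model file name is an assumption
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: blob})         # raw, unfiltered network outputs

for out in outputs:
    print(out.shape)                                    # inspect before filtering/regression
```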
- Build the ONNX Runtime library from source (with CUDA, TensorRT, or DirectML support) for processing
- Download the NuGet command line tool (https://www.nuget.org/downloads) and add it to your environment variables
- Resources from ONNX Runtime on how to build the package: link 1 and link 2
- The built NuGet artifacts are output to `xxx\Release\Release\nuget-artifacts`
- Requirements:
  - CUDA 11.0
  - cuDNN v8.0.5
  - TensorRT 7.2.3.4
- Below are the commands used for building the ONNX Runtime library from source on my hardware; successful completion of these builds will result in a NuGet package (like those in the `unity-sandbox/` directory)
# CPU
# Debug
# Release
.\build.bat --config Release --build_shared_lib --parallel --cmake_generator "Visual Studio 16 2019" --build_nuget --build_dir cpu --skip_tests
# CUDA
# Using Cuda 11.0, cuDNN v8.0.5
# Debug
# Release
.\build.bat --config Release --build_shared_lib --parallel --cmake_generator "Visual Studio 16 2019" --build_nuget --skip_tests --use_cuda --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cuda_version 11.0 --build_dir cuda
# TensorRT
# Using Cuda 11.0, cuDNN v8.0.5, TensorRT 7.2.3.4
# Debug
# Release
.\build.bat --config Release --build_shared_lib --parallel --cmake_generator "Visual Studio 16 2019" --build_nuget --skip_tests --use_cuda --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0" --use_tensorrt --tensorrt_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\TensorRT-7.2.3.4" --cuda_version 11.0 --build_dir tensorrt
# DirectML
# Debug
# Release
.\build.bat --config Release --build_shared_lib --parallel --cmake_generator "Visual Studio 16 2019" --build_nuget --use_dml --build_dir directml --skip_tests
If you found this code repo useful, please consider citing the associated publication:
@article{doughty2022hmd,
title = {{HMD}-{EgoPose}: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance},
issn = {1861-6429},
url = {https://doi.org/10.1007/s11548-022-02688-y},
doi = {10.1007/s11548-022-02688-y},
journaltitle = {International Journal of Computer Assisted Radiology and Surgery},
author = {Doughty, Mitchell and Ghugre, Nilesh R.},
date = {2022-06-14},
}