(Unofficial) Implementation of the paper "Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS" by Chen et al. Paper link. Before starting, please visit the original repo. There, the authors provide 2D pose keypoints (detections) and 3D pose (targets) tracking results, along with camera calibration and visualization functions. We extend that functionality to implement the algorithm itself. For 3D pose visualization, we use only matplotlib rather than vispy, because installing the latter can be difficult. Thanks to the original authors for their excellent work.
Currently, the code is tested only on the Campus dataset. Since no values are hard-coded, it should in principle run without errors on other datasets as well. Campus dataset info link - Campus dataset download link
Please download the dataset from OneDrive and extract the zip into a folder named Campus_Seq1. The dataset folder should therefore look like this.
- numpy
- pandas
- opencv
- scipy
- tqdm
- matplotlib
- Thanks to the author of the original repo for making the visualization and calibration code publicly available. The graph partitioning problem solver was also provided by the author here. Kudos.
- I have extended the original camera.py and calibration.py to support my implementation. To make the code easy to use, I put everything (config, helper functions, algorithms, and visualization) inside a single notebook.
- The original repo suggests using vispy, but its installation is sometimes complicated. I found matplotlib animation more convenient, so vispy is not needed here.
- At the end of a run, the details are saved to the log file. Please go through the logs after running the code to get a feel for the algorithm.
- The code currently runs below 100 FPS because it is severely unoptimized. It was meant as a quick implementation of the paper, to the best of my ability.
- The authors provide the original IDs for the 2D keypoints (detections) and 3D poses (targets) in the files annotation_2d and annotation_3d respectively. The current implementation uses only the 2D pose keypoints and not the IDs provided in annotation_2d (since they are preassigned, you could use them directly for triangulation).
- Looking at the IDs in annotation_3d, it appears that the authors probably implemented ReID to get their results. I have not implemented ReID since it is not mentioned in Algorithm 1 of the paper.
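For readers unfamiliar with the triangulation step mentioned in the notes above, here is a minimal sketch of linear (DLT) triangulation of a single joint from two or more calibrated views. It is not the code from the notebook: the function names `triangulate_dlt` and `project`, the synthetic cameras, and the assumption that each camera is given as a 3x4 projection matrix are all illustrative.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 views.

    projections: sequence of 3x4 projection matrices (assumed P = K [R | t]).
    points_2d:   sequence of (x, y) pixel coordinates, one per view.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular
    # value (equivalently, the smallest eigenvector of A^T A).
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P (hypothetical helper)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Synthetic two-camera setup (made-up intrinsics, 1 m baseline along x).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

# Noise-free projections, so the estimate should match X_true closely.
X_est = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(X_est)
```

With noisy detections the linear solution minimizes an algebraic error rather than the reprojection error, which is one reason a refined method may be preferable.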
Please see the tracking results for a single timestamp (at 41.72 sec) from 3 different angles below:
cross_view_tracking_3D_plot_campus_dataset_rotate_30_fps.mp4
- Implement ReID to handle people re-entering the scene.
- Optimize the codebase to support higher execution speeds.
- Improve velocity estimation: currently calculated by a two-point difference rather than multi-point linear regression.
- Refine the triangulation method: currently we use the linear eigen method; ideally we should at least use non-iterative L2 methods. Please help us mrcal.
- Enhance visualization to include skeleton or SMPL models.
- Add quantitative results.
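To illustrate the velocity estimation item in the list above, here is a hedged sketch contrasting a two-point difference with a least-squares linear fit over several past positions. The function names `velocity_two_point` and `velocity_linear_fit`, the window of four samples, and the sample data are assumptions for illustration, not taken from the notebook.

```python
import numpy as np


def velocity_two_point(times, positions):
    """Velocity from the last two samples only (sensitive to noise)."""
    t = np.asarray(times, dtype=float)
    p = np.asarray(positions, dtype=float)
    return (p[-1] - p[-2]) / (t[-1] - t[-2])


def velocity_linear_fit(times, positions):
    """Velocity as the slope of a least-squares line through all samples.

    positions has shape (N, 3); np.polyfit fits each column independently
    when y is 2D, returning coefficients of shape (2, 3) for deg=1.
    """
    t = np.asarray(times, dtype=float)
    p = np.asarray(positions, dtype=float)
    slope, _intercept = np.polyfit(t, p, deg=1)
    return slope


# Illustrative data: a joint moving at constant velocity, sampled every 0.04 s.
times = np.array([0.0, 0.04, 0.08, 0.12])
v_true = np.array([1.0, 0.0, -0.5])
positions = times[:, None] * v_true  # shape (4, 3)

print(velocity_two_point(times, positions))
print(velocity_linear_fit(times, positions))
```

On noise-free constant-velocity data both estimates agree; the linear fit is expected to be less sensitive to jitter in individual 3D positions because it averages over more samples.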
The repo is currently unlicensed because the license terms of the original repo are unclear to me. I will update this repo when that becomes clear. I have no affiliation with AiFi Inc.