YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID

We introduce YOLO11-JDE, a fast and accurate multi-object tracking (MOT) solution that combines real-time object detection with self-supervised Re-Identification (Re-ID). By incorporating a dedicated Re-ID branch into YOLO11s, our model performs Joint Detection and Embedding (JDE), generating appearance features for each detection. The Re-ID branch is trained in a fully self-supervised setting while simultaneously training for detection, eliminating the need for costly identity-labeled datasets. The triplet loss, with hard positive and semi-hard negative mining strategies, is used for learning discriminative embeddings. Data association is enhanced with a custom tracking implementation that successfully integrates motion, appearance, and location cues. YOLO11-JDE achieves competitive results on the MOT17 and MOT20 benchmarks, surpassing existing JDE methods in FPS while using up to ten times fewer parameters, which makes it a highly attractive solution for real-world applications.
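The mining strategy described above can be sketched as follows. This is an illustrative NumPy implementation of triplet loss with hard-positive and semi-hard-negative mining, not the paper's exact code; the margin value, the fallback to the hardest negative when no semi-hard negative exists, and the per-anchor loop are assumptions for clarity.

```python
import numpy as np

def triplet_loss_with_mining(embeddings, ids, margin=0.3):
    """Triplet loss with hard-positive / semi-hard-negative mining (sketch).

    embeddings: (N, D) Re-ID vectors; ids: length-N identity labels
    (in self-supervised training these can be pseudo-identities, e.g.
    copies of the same object across Mosaic tiles sharing an id).
    """
    n = len(ids)
    # pairwise Euclidean distances between all embeddings
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    losses = []
    for a in range(n):
        pos = [j for j in range(n) if ids[j] == ids[a] and j != a]
        neg = [j for j in range(n) if ids[j] != ids[a]]
        if not pos or not neg:
            continue  # anchor needs at least one positive and one negative
        d_ap = max(dist[a, j] for j in pos)                 # hard positive: farthest same-id sample
        semi = [dist[a, j] for j in neg if dist[a, j] > d_ap]
        # semi-hard negative: closest negative beyond the positive;
        # fall back to the farthest negative if none exists (assumption)
        d_an = min(semi) if semi else max(dist[a, j] for j in neg)
        losses.append(max(0.0, d_ap - d_an + margin))
    return float(np.mean(losses)) if losses else 0.0
```

When embeddings of the same identity are already closer than every other identity by more than the margin, the loss is zero and no gradient is produced for that anchor.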

Note: This paper has been accepted for presentation at the 5th Real-World Surveillance: Applications and Challenges workshop at WACV 2025. Read the full paper on arXiv.


Key Features

  • Real-Time Performance: Achieves competitive FPS rates while maintaining high tracking accuracy on MOT17 and MOT20 benchmarks.
  • Self-Supervised Re-ID Training: Eliminates the need for costly identity-labeled datasets through Mosaic data augmentation and triplet loss with hard and semi-hard mining strategies.
  • Custom Data Association: Integrates motion, appearance, and location cues for enhanced object tracking, including robust handling of occlusions.
  • Lightweight Architecture: Uses up to 10x fewer parameters than other JDE methods, making it efficient and scalable for diverse applications.
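The cue fusion behind the custom data association can be sketched like this. The snippet below is a simplified illustration, not the repository's tracker: the appearance weight `w_app`, the linear blend of cosine and IoU costs, the binary motion gate, and the match threshold are all assumed values for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, iou_matrix, motion_gate, w_app=0.5):
    """Fuse appearance, location, and motion cues into one assignment (sketch).

    track_embs, det_embs: L2-normalized Re-ID embeddings, (T, D) and (M, D).
    iou_matrix: (T, M) IoU between predicted track boxes and detections.
    motion_gate: (T, M) bool mask of motion-feasible pairs (e.g. from a
    Kalman-filter gating distance).
    """
    app_cost = 1.0 - track_embs @ det_embs.T   # cosine distance (appearance cue)
    loc_cost = 1.0 - iou_matrix                # overlap distance (location cue)
    cost = w_app * app_cost + (1.0 - w_app) * loc_cost
    cost[~motion_gate] = 1e6                   # forbid motion-infeasible pairs
    rows, cols = linear_sum_assignment(cost)   # Hungarian assignment
    # keep only matches below a plausibility threshold (assumed value)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1.0]
```

Unmatched tracks and detections would then go through the usual track-management logic (re-activation, track birth and death), which is omitted here.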

Dataset Information

The datasets used for training YOLO11-JDE are:

  1. CrowdHuman

    • Description: Contains a wide range of crowded scenes with rich annotations.
    • Download: Please download from the official website: https://www.crowdhuman.org.
    • Comments: The original training and validation splits are preserved.
  2. MOT17

    • Description: Provides sequences for multiple object tracking. Only bounding box annotations are used for training (track IDs are only used for validation).
    • Download: Please download from the official website: https://motchallenge.net/data/MOT17/.
    • Comments: Following previous work (e.g., Towards Real-Time Multi-Object Tracking and Boost-track: boosting the similarity measure and detection confidence for improved multiple object tracking), we construct a validation set by using the second half of each training sequence and removing videos in ETH that overlap with the MOT16 benchmark.

Both datasets must be converted to YOLO format. The config file used is "crowdhuman.yaml", which should point to a folder containing both datasets merged.
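For the CrowdHuman half of the conversion, a minimal sketch is shown below. It assumes the standard CrowdHuman `.odgt` annotation format (one JSON record per line, with full-body boxes under `gtboxes[*].fbox` as pixel `[x, y, w, h]`) and a single `person` class with id 0; image sizes must be read separately.

```python
import json

def crowdhuman_to_yolo(odgt_line, img_w, img_h):
    """Convert one CrowdHuman .odgt record to YOLO label lines (sketch).

    Uses the full-body box ("fbox") as the training target and skips
    non-person entries (e.g. ignore regions tagged "mask").
    """
    rec = json.loads(odgt_line)
    lines = []
    for box in rec.get("gtboxes", []):
        if box.get("tag") != "person":
            continue
        x, y, w, h = box["fbox"]
        # YOLO format: "class cx cy w h", all normalized to [0, 1]
        cx = (x + w / 2) / img_w
        cy = (y + h / 2) / img_h
        lines.append(f"0 {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return lines
```

Each returned list would be written to a `.txt` file next to (or mirroring) the corresponding image, following the usual YOLO dataset layout that the config file's paths point at.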


Download Model Weights

Pre-trained model weights for YOLO11s-JDE are available for download.


Results

Benchmarks

MOT17 and MOT20 results under private detection protocols:

| Metric | MOT17 | MOT20 |
|--------|-------|-------|
| HOTA   | 56.6  | 53.1  |
| MOTA   | 65.8  | 70.9  |
| IDF1   | 70.3  | 66.4  |
| FPS    | 35.9  | 18.9  |

Compared to state-of-the-art methods, YOLO11-JDE offers superior FPS and competitive tracking accuracy with significantly fewer parameters.


Acknowledgements

This work was partially supported by:

  • The Spanish project PID2022-136436NB-I00.
  • ICREA under the ICREA Academia programme.
  • The Milestone Research Program at the University of Barcelona.

The code for YOLO11-JDE is based on the Ultralytics YOLO repository, which provides a robust foundation for real-time object detection models.


Citation

If you find YOLO11-JDE useful in your research or applications, please cite our paper:

@misc{erregue2025yolo11jdefastaccuratemultiobject,
      title={YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID}, 
      author={Iñaki Erregue and Kamal Nasrollahi and Sergio Escalera},
      year={2025},
      eprint={2501.13710},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.13710}, 
}
