
HTAMotr: A Novel Half-To-All MOTR Approach for Robust Video Text Tracking with Incomplete Annotations

Introduction

Overview

Abstract. This paper presents a novel Half-To-All Multiple Object Tracking and Recognition (HTAMotr) approach tailored to address the challenges posed by incomplete data annotations in video text tracking. The proposed method introduces three key strategies: rotated queries to improve anchor alignment with text regions, the Proposal-For-Groundtruth Strong Correlation (PForG) strategy to mitigate the negative effects of incomplete annotations, and an overlapping anchor filter to resolve ID switching issues. Experiments conducted on the DSText dataset demonstrate the effectiveness of HTAMotr, achieving state-of-the-art performance without requiring additional pre-training data or extensive epochs. By addressing the limitations of traditional MOTR paradigms, this work contributes to advancing video text tracking techniques and facilitating the development of more robust and efficient algorithms. The code and datasets are available at https://github.com/Paige-Norton/HTAMotr, complete with usage guides to facilitate the reproduction of experimental results.
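For intuition, the overlapping anchor filter mentioned above behaves much like a class-agnostic non-maximum suppression over the per-frame anchors: when several anchors cover the same text instance, only the strongest is kept so that a single query stays responsible for that instance. The sketch below is a minimal illustration under that assumption, using axis-aligned IoU; the repository's actual filter works with rotated queries and may differ.

# Minimal sketch of an overlapping-anchor filter (assumption: axis-aligned IoU;
# the repository's filter operates on rotated anchors and may differ).
import torch
from torchvision.ops import box_iou

def filter_overlapping_anchors(boxes, scores, iou_thresh=0.7):
    """Keep only the highest-scoring anchor among mutually overlapping ones.

    boxes:  (N, 4) tensor in (x1, y1, x2, y2) format
    scores: (N,) confidence scores
    Returns a LongTensor with the indices of the anchors to keep.
    """
    order = scores.argsort(descending=True)
    suppressed = torch.zeros(len(boxes), dtype=torch.bool)
    keep = []
    for idx in order:
        if suppressed[idx]:
            continue
        keep.append(idx.item())
        ious = box_iou(boxes[idx].unsqueeze(0), boxes)[0]
        suppressed |= ious > iou_thresh  # also marks idx itself, which is already kept
    return torch.tensor(keep, dtype=torch.long)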

Main Results

| Methods | MOTR paradigm | End2End | MOTA | MOTP | IDF1 | Mostly Matched | Mostly Lost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TransDETR | ✓ | ✓ | 27.55 | 78.40 | 44.28 | 1583 | 9891 |
| Liu et al. | × | × | 36.87 | 79.24 | 48.99 | 2123 | 6829 |
| HTAMotr (no filter) | ✓ | ✓ | 48.91 | 75.03 | 63.07 | 6394 | 2295 |
| HTAMotr | ✓ | × | 56.22 | 75.15 | 65.08 | 6275 | 2361 |

Notes

  • Pretrained HTAMotr models are available on Google Drive
  • HTAMotr's tracking results are also available on Google Drive
  • All experiments were conducted with PyTorch on NVIDIA GeForce RTX 3090 GPUs
  • No experiment used pre-training on additional datasets

Visualization

Qualitative comparison of TransDETR (left) and HTAMotr (right).

Installation

The codebase is built on top of MOTRv2, and the idea of generating queries from proposals originates from this paper. We thank the authors for providing the code framework and the idea.
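As a rough illustration of the proposal-to-query idea, detector proposals can be embedded into transformer queries by combining a shared content embedding with an encoding of each proposal's geometry. The class and names below are hypothetical, not the repository's actual API:

# Hypothetical sketch of turning detector proposals into transformer queries,
# in the spirit of the proposal-query idea; names are illustrative only.
import torch
import torch.nn as nn

class ProposalQueryEncoder(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        # One shared, learnable content embedding plus an MLP over proposal geometry.
        self.shared_embed = nn.Embedding(1, hidden_dim)
        self.box_mlp = nn.Sequential(
            nn.Linear(5, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, proposals):
        # proposals: (N, 5) = normalized (cx, cy, w, h, score) per detection
        content = self.shared_embed.weight.expand(len(proposals), -1)
        queries = content + self.box_mlp(proposals)
        reference_points = proposals[:, :4]  # initial anchors for the decoder
        return queries, reference_points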

Requirements

  • Install PyTorch using conda (optional)

    conda create -n HTAMotr python=3.8
    conda activate HTAMotr
    conda install pytorch=1.8.1 torchvision=0.9.1 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh
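If the build succeeds, the compiled op should be importable under the package name used by Deformable-DETR-style code. A quick sanity check, assuming that package name, is:

# Sanity check after running make.sh; assumes the Deformable-DETR package name.
import torch
import MultiScaleDeformableAttention  # raises ImportError if the build failed

print("CUDA available:", torch.cuda.is_available())
print("MultiScaleDeformableAttention op imported successfully")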

Usage

Dataset preparation

Because the official DSText annotations had already been completed by the time this article was written, please download the videos and the incomplete DSText annotations provided here to reproduce the results, and organize them as follows:

.
├── data
│	├── DSText
│	   ├── images
│	       ├── train
│	           ├── Activity
│	           ├── Driving
│	           ├── Game
│	           ├── ....
│	       ├── test
│	           ├── Activity
│	           ├── Driving
│	           ├── Game
│	           ├── ....
│	   ├── labels_with_ids
│	       ├── train
│	           ├── Activity
│	           ├── Driving
│	           ├── Game
│	           ├── ....
│
│	├── r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth

The annotations we provide are deliberately incomplete; this incompleteness is a key focus of the study and is essential for reproducing the experiments presented in the paper.
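To quickly verify that the data is organized as shown above, a small helper like the following can be used (the paths follow the tree above; adjust them if your layout differs):

# Hedged sanity check for the dataset layout shown above.
import os

root = "data/DSText"
for split in ("images/train", "images/test", "labels_with_ids/train"):
    path = os.path.join(root, split)
    if not os.path.isdir(path):
        print(f"missing: {path}")
        continue
    scenes = sorted(os.listdir(path))
    n_files = sum(len(files) for _, _, files in os.walk(path))
    print(f"{split}: {len(scenes)} scene folders, {n_files} files")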

Training

As with MOTRv2, download the COCO-pretrained weights of Deformable DETR (+ iterative bounding box refinement) and point the --pretrained argument to the weight file. Then train HTAMotr on 2 GPUs as follows:

./tools/train.sh configs/motrv2DSText.args
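If configs/motrv2DSText.args follows the usual MOTRv2 convention of one command-line flag per line, pointing training at the downloaded weight amounts to a line like the one below (an illustrative excerpt under that assumption, not the file's full contents):

--pretrained data/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth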

Inference on DSText Test Set

You can download the trained checkpoint checkpoint0006.pth from Google Drive and run inference directly:

# run a simple inference on our pretrained weights
./tools/simple_inference.sh ./exps/motrv2DSText/run1_20query/checkpoint0006.pth

# then zip the results
zip -r motrv2.zip tracker/
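Before zipping, it can help to confirm that results were actually written; a minimal check (assuming the results land under ./tracker/) is:

# Minimal check that inference produced result files (assumes ./tracker/ output).
import os

files = [os.path.join(d, f) for d, _, fs in os.walk("tracker") for f in fs]
print(f"{len(files)} result files under tracker/")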
