HTAMotr: A Novel Half-To-All MOTR Approach for Robust Video Text Tracking with Incomplete Annotations
Abstract. This paper presents a novel Half-To-All Multiple Object Tracking and Recognition (HTAMotr) approach tailored to address the challenges posed by incomplete data annotations in video text tracking. The proposed method introduces three key strategies: rotated queries to improve anchor alignment with text regions, the Proposal-For-Groundtruth Strong Correlation (PForG) strategy to mitigate the negative effects of incomplete annotations, and an overlapping anchor filter to resolve ID switching issues. Experiments conducted on the DSText dataset demonstrate the effectiveness of HTAMotr, achieving state-of-the-art performance without requiring additional pre-training data or extensive epochs. By addressing the limitations of traditional MOTR paradigms, this work contributes to advancing video text tracking techniques and facilitating the development of more robust and efficient algorithms. The code and datasets are available at https://github.com/Paige-Norton/HTAMotr, complete with usage guides to facilitate the reproduction of experimental results.
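For intuition, the overlapping anchor filter can be pictured as a non-maximum-suppression step over anchors: when two anchors cover essentially the same text region, only one is kept as a query, which removes the duplicate tracks that cause ID switches. The sketch below is only an illustration of that idea under simplifying assumptions (axis-aligned boxes, a hypothetical `filter_overlapping_anchors` helper, and an arbitrary IoU threshold); it is not the released implementation, which operates on rotated queries.

```python
import torch

def filter_overlapping_anchors(boxes: torch.Tensor, scores: torch.Tensor, iou_thr: float = 0.7):
    """Keep one anchor per group of heavily overlapping anchors.

    boxes:  (N, 4) axis-aligned boxes as (x1, y1, x2, y2); the actual method uses
            rotated boxes, so this axis-aligned IoU is a simplification.
    scores: (N,) confidences used to decide which overlapping anchor survives.
    Returns the indices of the anchors that are kept.
    """
    order = scores.argsort(descending=True)
    keep = []
    suppressed = torch.zeros(len(boxes), dtype=torch.bool)
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i.item())
        # Suppress every remaining anchor whose IoU with the kept one is too high.
        x1 = torch.maximum(boxes[i, 0], boxes[:, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[:, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[:, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[:, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_i + areas - inter + 1e-6)
        suppressed |= iou > iou_thr
    return torch.tensor(keep, dtype=torch.long)
```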
Methods | MOTR paradigm | End2End | MOTA | MOTP | IDF1 | Mostly Matched | Mostly Lost |
---|---|---|---|---|---|---|---|
TransDETR | √ | √ | 27.55 | 78.40 | 44.28 | 1583 | 9891 |
Liu et al. | × | × | 36.87 | 79.24 | 48.99 | 2123 | 6829 |
HTAMotr(No filter) | √ | √ | 48.91 | 75.03 | 63.07 | 6394 | 2295 |
HTAMotr | √ | × | 56.22 | 75.15 | 65.08 | 6275 | 2361 |
- Trained HTAMotr models can be found on Google Drive.
- The tracking results of HTAMotr are available on Google Drive.
- All experiments were conducted with PyTorch on NVIDIA GeForce RTX 3090 GPUs.
- No experiment used pre-training on other datasets.
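- MOTA, MOTP, and IDF1 are the standard tracking metrics: MOTA = 1 − (FN + FP + IDSW) / GT penalizes missed boxes, false alarms, and identity switches; MOTP measures the localization quality of matched boxes; IDF1 measures how consistently ground-truth identities are preserved.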
Demo: qualitative comparison of TransDETR (left) and HTAMotr (right).
The codebase is built on top of MOTRv2, and the idea of generating queries from proposals originates from that work. We thank the MOTRv2 authors for providing the code framework and the idea.
- Install PyTorch using conda (optional)

  ```bash
  conda create -n HTAMotr python=3.8
  conda activate HTAMotr
  conda install pytorch=1.8.1 torchvision=0.9.1 cudatoolkit=10.2 -c pytorch
  ```
- Other requirements

  ```bash
  pip install -r requirements.txt
  ```
- Build MultiScaleDeformableAttention (a quick environment check is sketched after this list)

  ```bash
  cd ./models/ops
  sh ./make.sh
  ```
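As a quick sanity check after these steps, the snippet below prints the active PyTorch/CUDA versions and tries to import the compiled op. The package name `MultiScaleDeformableAttention` is what the standard Deformable DETR `make.sh` installs; if your build names it differently, treat the import as an assumption.

```python
import torch

# Versions expected from the conda command above (torch 1.8.1, CUDA 10.2).
print("torch:", torch.__version__)
print("cuda runtime:", torch.version.cuda, "| available:", torch.cuda.is_available())

# The Deformable DETR build script normally installs the CUDA op under this name;
# if the import fails, re-run make.sh inside the activated conda environment.
try:
    import MultiScaleDeformableAttention  # noqa: F401
    print("MultiScaleDeformableAttention: import OK")
except ImportError as err:
    print("MultiScaleDeformableAttention missing:", err)
```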
The incomplete DSText annotations were frozen at the time the paper was written. To reproduce this code, please download the videos and the incomplete DSText annotations, and organize them as follows:
```
.
├── data
│   ├── DSText
│   │   ├── images
│   │   │   ├── train
│   │   │   │   ├── Activity
│   │   │   │   ├── Driving
│   │   │   │   ├── Game
│   │   │   │   ├── ....
│   │   │   ├── test
│   │   │   │   ├── Activity
│   │   │   │   ├── Driving
│   │   │   │   ├── Game
│   │   │   │   ├── ....
│   │   ├── labels_with_ids
│   │   │   ├── train
│   │   │   │   ├── Activity
│   │   │   │   ├── Driving
│   │   │   │   ├── Game
│   │   │   │   ├── ....
│   │
│   ├── r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth
```
The annotations we provide are intentionally incomplete; this incompleteness is a key focus of the study and is essential for reproducing the experiments presented in the paper.
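Before training, a short script can confirm that the layout above is in place; the paths below simply mirror the tree shown, and the root is an assumption you may need to adjust.

```python
from pathlib import Path

# Mirrors the directory tree above; change the root if your data lives elsewhere.
root = Path("data/DSText")
for sub in ("images/train", "images/test", "labels_with_ids/train"):
    path = root / sub
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```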
Similar to MOTRv2, you may download the COCO-pretrained weights of Deformable DETR (+ iterative bounding box refinement) and set the --pretrained argument to the path of the weights. Then train HTAMotr on 2 GPUs as follows:
```bash
./tools/train.sh configs/motrv2DSText.args
```
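If training does not start, the --pretrained path is a common culprit. The sketch below loads the weight on CPU and prints a few state-dict keys so you can confirm it is the intended Deformable DETR checkpoint; the file path follows the directory tree above and the "model" key is the usual Deformable DETR convention, both of which are assumptions here.

```python
import torch

# Path assumed from the directory tree above; adjust it to wherever you saved the weight.
ckpt_path = "data/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Deformable DETR checkpoints typically store the weights under a "model" key.
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "tensors; first keys:")
for key in list(state_dict)[:5]:
    print(" ", key, tuple(state_dict[key].shape))
```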
You can download the trained model checkpoint0006.pth from Google Drive to perform inference directly.
```bash
# run a simple inference on our pretrained weights
./tools/simple_inference.sh ./exps/motrv2DSText/run1_20query/checkpoint0006.pth

# then zip the results
zip motrv2.zip tracker/ -r
```
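If the zip CLI is not installed, the same archive can be produced from Python; this is simply a convenience equivalent of the command above.

```python
import shutil

# Equivalent of `zip motrv2.zip tracker/ -r`: packs the tracker/ directory into motrv2.zip.
shutil.make_archive("motrv2", "zip", root_dir=".", base_dir="tracker")
```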