# ActionFormer: Localizing Moments of Actions with Transformers
Chen-Lin Zhang, Jianxin Wu, Yin Li
Self-attention-based Transformer models have demonstrated impressive results for image classification and object detection, and more recently for video understanding. Inspired by this success, we investigate the application of Transformer networks for temporal action localization in videos. To this end, we present ActionFormer -- a simple yet powerful model to identify actions in time and recognize their categories in a single shot, without using action proposals or relying on pre-defined anchor windows. ActionFormer combines a multiscale feature representation with local self-attention, and uses a lightweight decoder to classify every moment in time and estimate the corresponding action boundaries. We show that this orchestrated design results in major improvements upon prior works. Without bells and whistles, ActionFormer achieves 71.0% mAP at tIoU=0.5 on THUMOS14, outperforming the best prior model by 14.1 absolute percentage points. Further, ActionFormer demonstrates strong results on ActivityNet 1.3 (36.6% average mAP) and EPIC-Kitchens 100 (+13.5% average mAP over prior works).
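To make the "classify every moment and estimate its boundaries" idea concrete, here is a minimal decoding sketch for a single pyramid level. This is an illustration, not the repository's actual implementation: it assumes per-timestep classification scores and regressed start/end distances are already available, and the function name `decode_segments` and its arguments are our own.

```python
import torch

def decode_segments(cls_scores, reg_offsets, stride=1.0, score_thresh=0.1):
    """Turn per-moment outputs into candidate action segments.

    cls_scores:  (T, C) tensor of per-moment action probabilities.
    reg_offsets: (T, 2) tensor of regressed distances (in feature steps)
                 from each moment to the action start and end.
    Returns a list of (start, end, score, label) tuples.
    """
    T, _ = cls_scores.shape
    centers = torch.arange(T, dtype=torch.float32) * stride  # moment positions
    starts = centers - reg_offsets[:, 0] * stride            # predicted onsets
    ends = centers + reg_offsets[:, 1] * stride              # predicted offsets

    scores, labels = cls_scores.max(dim=1)                    # best class per moment
    keep = (scores > score_thresh).nonzero(as_tuple=True)[0]  # confident moments only
    return [(starts[t].item(), ends[t].item(), scores[t].item(), labels[t].item())
            for t in keep]

# Hypothetical usage: 128 moments, 20 action classes, feature stride of 4.
cls_scores = torch.rand(128, 20)
reg_offsets = torch.rand(128, 2) * 8.0
candidates = decode_segments(cls_scores, reg_offsets, stride=4.0)
```

In the full model this decoding runs at every level of the multiscale feature pyramid (each with its own stride), and overlapping candidates are merged in post-processing.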
## ActivityNet-1.3 (with CUHK classifier)

Features | mAP@0.5 | mAP@0.75 | mAP@0.95 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|
TSP | 55.08 | 38.27 | 8.91 | 37.07 | config | model | log |
## THUMOS-14

Features | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|---|---|
I3D | 83.78 | 80.06 | 73.16 | 60.46 | 44.72 | 68.44 | config | model | log |
## HACS

Features | mAP@0.5 | mAP@0.75 | mAP@0.95 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|
SlowFast | 56.18 | 37.97 | 11.05 | 37.71 | config | model | log |
## EPIC-Kitchens-100

Subset | Features | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|---|---|---|
Noun | SlowFast | 25.78 | 24.73 | 22.83 | 20.84 | 17.45 | 22.33 | config | model | log |
Verb | SlowFast | 27.68 | 26.79 | 25.62 | 24.06 | 20.48 | 24.93 | config | model | log |
## Ego4D-MQ

Features | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|---|---|
SlowFast | 20.90 | 18.12 | 15.81 | 14.25 | 12.21 | 16.26 | config | model | log |
EgoVLP | 27.79 | 24.97 | 22.37 | 19.25 | 16.25 | 22.13 | config | model | log |
InternVideo | 32.59 | 30.28 | 27.53 | 25.09 | 22.13 | 27.52 | config | model | log |
## MultiTHUMOS

Features | mAP@0.2 | mAP@0.5 | mAP@0.7 | ave. mAP (0.1:0.9:0.1) | Config | Model | Log |
---|---|---|---|---|---|---|---|
I3D (rgb) | 53.52 | 39.05 | 19.69 | 34.02 | config | model | log |
I3D (rgb+flow) | 60.18 | 45.01 | 24.56 | 39.19 | config | model | log |
## Charades

Features | mAP@0.2 | mAP@0.5 | mAP@0.7 | ave. mAP (0.1:0.9:0.1) | Config | Model | Log |
---|---|---|---|---|---|---|---|
I3D (rgb) | 31.33 | 23.07 | 13.60 | 20.60 | config | model | log |
VideoMAE-L | 38.87 | 29.67 | 17.52 | 26.04 | config | model | log |
## FineAction (with InternVideo classifier)

Features | mAP@0.5 | mAP@0.75 | mAP@0.95 | ave. mAP | Config | Model | Log |
---|---|---|---|---|---|---|---|
VideoMAE_H_K700 | 29.44 | 19.46 | 5.06 | 19.32 | config | model | log |
VideoMAEv2_g_K710 | 29.85 | 19.72 | 5.17 | 19.62 | config | model | log |
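All tables above report mAP at one or more temporal IoU (tIoU) thresholds: a prediction counts as correct only if its tIoU with a matching ground-truth segment reaches the threshold, and "ave. mAP" averages mAP over the listed threshold range (e.g., 0.1:0.9:0.1 for MultiTHUMOS and Charades). Below is a minimal sketch of the tIoU computation; the function name is our own.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Prediction [4.0, 10.0] vs. ground truth [5.0, 11.0]:
# intersection = 5.0, union = 7.0, tIoU ~= 0.71 -> a hit at tIoU=0.5,
# but a miss at tIoU=0.75.
print(temporal_iou((4.0, 10.0), (5.0, 11.0)))  # ~0.714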
## Training

Use the following command to train a model.

```shell
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/train.py ${CONFIG_FILE} [optional arguments]
```

Example: train ActionFormer on the ActivityNet dataset.

```shell
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/train.py configs/actionformer/anet_tsp.py
```

For more details, refer to the Training section of the Usage guide.
## Testing

Use the following command to test a model.

```shell
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/test.py ${CONFIG_FILE} --checkpoint ${CHECKPOINT_FILE} [optional arguments]
```

Example: test ActionFormer on the ActivityNet dataset.

```shell
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/test.py configs/actionformer/anet_tsp.py --checkpoint exps/anet/actionformer_tsp/gpu1_id0/checkpoint/epoch_14.pth
```

For more details, refer to the Test section of the Usage guide.
## Citation

```bibtex
@inproceedings{zhang2022actionformer,
  title={Actionformer: Localizing moments of actions with transformers},
  author={Zhang, Chen-Lin and Wu, Jianxin and Li, Yin},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part IV},
  pages={492--510},
  year={2022},
  organization={Springer}
}
```