
TimeLoc

This is the official implementation of the paper TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos.

Abstract

Temporal localization in untrimmed videos, which aims to identify specific timestamps, is crucial for video understanding but remains challenging. This task encompasses several subtasks, including temporal action localization, temporal video grounding, moment retrieval, and generic event boundary detection. Existing methods in each subfield are typically designed for specific tasks and lack generalizability across domains. In this paper, we propose TimeLoc, a unified end-to-end framework for timestamp localization that can handle multiple tasks. First, our approach employs a simple yet effective one-stage localization model that supports text queries as input and multiple actions as output. Second, we jointly train the video encoder and localization model in an end-to-end manner. To efficiently process long videos, we introduce temporal chunking, enabling the handling of videos with over 30k frames. Third, we find that fine-tuning pre-trained text encoders with a multi-stage training strategy further enhances text-conditioned localization. TimeLoc achieves state-of-the-art results across multiple benchmarks.
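The temporal chunking idea from the abstract can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `video_encoder` callable, the `chunk_size` value, and the tensor shapes are all assumptions chosen to show how a long frame sequence can be encoded piece by piece to fit in memory.

```python
# Illustrative sketch of temporal chunking (hypothetical, not the official implementation).
# Assumes a generic `video_encoder` that maps a (T, C, H, W) frame tensor to
# per-frame features of shape (T, D), and a `chunk_size` chosen to fit GPU memory.
import torch


def encode_in_chunks(frames: torch.Tensor, video_encoder, chunk_size: int = 512) -> torch.Tensor:
    """Encode a long frame sequence chunk by chunk and concatenate the features.

    frames: tensor of shape (T, C, H, W), where T may exceed 30k.
    Returns: per-frame features of shape (T, D).
    """
    features = []
    for start in range(0, frames.shape[0], chunk_size):
        chunk = frames[start:start + chunk_size]   # at most chunk_size frames
        features.append(video_encoder(chunk))      # (<=chunk_size, D)
    return torch.cat(features, dim=0)              # (T, D)
```

In an end-to-end setup such as the one described above, each chunk's forward pass would also carry gradients back into the video encoder; the sketch only shows the feature-extraction loop.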
