SCZwangxiao/video-ReTaKe

Official implementation of the paper "ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding".


Contributions

  • The training-free ReTaKe is the first method to jointly model temporal and knowledge redundancy for long video understanding, enabling 4x longer video sequences with less than 1% performance drop.

  • We propose DPSelect, a novel keyframe selection method that reduces low-level temporal redundancy, and PivotKV, a novel KV cache compression method that reduces high-level knowledge redundancy in long videos (a toy illustration of the keyframe-selection idea follows this list).
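
For intuition only, below is a minimal sketch of keyframe selection by inter-frame feature dissimilarity. It is not the paper's DPSelect algorithm: the pooled frame features, the cosine-distance scoring, the 0.2 threshold, and the select_keyframes helper are all illustrative assumptions.

# Toy keyframe-selection sketch (NOT the DPSelect algorithm from the paper).
# Assumption: each frame is summarized by a feature vector (e.g. a pooled
# vision-encoder embedding); frames sufficiently dissimilar to the last kept
# frame are treated as keyframes.
import numpy as np

def select_keyframes(features, threshold=0.2):
    """Greedily keep frames whose cosine distance to the last kept frame
    exceeds `threshold`. `features` has shape (num_frames, dim)."""
    keep = [0]  # always keep the first frame
    last = features[0]
    for i in range(1, len(features)):
        denom = np.linalg.norm(features[i]) * np.linalg.norm(last) + 1e-8
        cosine = float(features[i] @ last) / denom
        if 1.0 - cosine > threshold:  # sufficiently dissimilar -> new keyframe
            keep.append(i)
            last = features[i]
    return keep

# Example with 100 random "frame features" of dimension 512
features = np.random.randn(100, 512)
print(select_keyframes(features))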

Prepare environment

# For GPU users
conda env create -f environment.yaml

# For NPU users
# conda env create -f environment_npu.yaml

apt-get install ffmpeg  # NOTE: ffmpeg is not required for the quick demo
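
As an optional sanity check (not part of the official setup), the snippet below verifies that PyTorch sees a GPU and that ffmpeg is on the PATH. It assumes the conda environment installs torch; NPU users should check their torch_npu setup instead.

# Optional environment sanity check (assumption: environment.yaml installs torch).
import shutil
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # NPU users: check torch_npu instead
print("ffmpeg found:", shutil.which("ffmpeg") is not None)  # only needed beyond the quick demo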

Quick demo

  • Step 1: Set hf_qwen2vl7b_path in ./demo.py to your local path to Qwen2-VL-7B-Instruct. Note that NPU users also need to change config_path to 'configs/retake_demo_npu.yaml'.

  • Step 2: Run demo

python demo.py
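
If the demo fails to start, a quick way to confirm that hf_qwen2vl7b_path points at a loadable checkpoint is to load it directly with the standard transformers API. This generic snippet is not part of the repository; the path below is a placeholder.

# Hypothetical checkpoint check using the standard transformers API.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

hf_qwen2vl7b_path = "/path/to/Qwen2-VL-7B-Instruct"  # placeholder: your local path
processor = AutoProcessor.from_pretrained(hf_qwen2vl7b_path)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    hf_qwen2vl7b_path,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)
print(model.config.model_type)  # expected: "qwen2_vl"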

Reproduce ReTaKe

  • Prepare the datasets following the docs.
  • Run the evaluation script:
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_videomme.yaml 8
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_mlvu.yaml 8
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_lvbench.yaml 8

The above scripts perform inference and evaluation in one pass. Results can be found in ./results.
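
The layout and format of the files under ./results are not documented here; as a purely hypothetical convenience, the snippet below walks the directory and summarizes any JSON files it finds.

# Hypothetical helper: list result files under ./results (JSON format is an assumption).
import json
from pathlib import Path

for path in sorted(Path("./results").rglob("*")):
    if path.is_file():
        print(path)
        if path.suffix == ".json":
            with open(path) as f:
                data = json.load(f)
            # Show the top-level keys of a dict payload, else the payload type
            print("  keys:", list(data)[:5] if isinstance(data, dict) else type(data).__name__)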
