
VideoDR: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

[Figure: VideoDR task overview]


🔥 News

  • 2026.01.15 🌐 Our official leaderboard is now live! You are welcome to evaluate your models and submit results.
  • 2026.01.14 🏷️ We updated VideoDR.csv with additional Category and Difficulty labels.
  • 2026.01.12 🌟 We released the VideoDR benchmark data, which you can download from this repository.
  • 2026.01.11 🌟 We are proud to launch VideoDR, the first-ever video deep research benchmark!

🎥 Video Deep Research | VideoDR

🚀 VideoDR is the first video deep research benchmark!

It is designed to evaluate the ability of video agents to perform complex reasoning over video content while leveraging the open web 🌐.

👇 VideoDR requires an agent to possess the following core capabilities:

  • 🎞️ Multi-frame Visual Cues: Accurately identify continuous key information from multiple video frames.
  • 🌍 Interactive Search: Interact with a browser environment to perform multi-hop deep search.
  • 🧩 Evidence Synthesis: Combine video clues and web evidence to provide a verifiable factual answer.

🔧 Evaluation Tools

We provide an LLM-based judging toolkit (llm_as_judge) for scoring model answers and analyzing failures.

Installation

```bash
cd llm_as_judge
pip install -r requirements.txt
```

Configuration

Create a .env file in the llm_as_judge directory:

```
LLM_BASE_URL=your_api_base_url
LLM_API_KEY=your_api_key
```
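If you prefer not to add a dependency for reading the .env file, a minimal loader can be sketched as below. This is an illustrative sketch, not the repo's actual loading code (which may use a library such as python-dotenv); it assumes plain `KEY=VALUE` lines with `#` comments.

```python
import os

def load_env(path=".env"):
    """Parse a minimal .env file (KEY=VALUE lines) into os.environ.

    Blank lines and lines starting with '#' are ignored; values are
    taken verbatim after the first '='.
    """
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```

After calling `load_env()`, the judge scripts can read `LLM_BASE_URL` and `LLM_API_KEY` from the environment as usual.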

LLM as Judge

```bash
python llm_as_judge/src/judge_answers.py \
    --workers 5 \
    --predictions llm_as_judge/data/predictions.json
```
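Conceptually, the judge fans prediction records out to a pool of workers (matching the `--workers` flag) and collects a verdict per record. The sketch below illustrates that pattern only: the record schema is hypothetical, and a strict string comparison stands in for the actual LLM judge call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record shape; the real schema is defined by predictions.json.
predictions = [
    {"id": "q1", "prediction": "Paris", "ground_truth": "Paris"},
    {"id": "q2", "prediction": "Lyon", "ground_truth": "Paris"},
]

def judge(record):
    """Stand-in judge: the real tool asks an LLM to compare answers."""
    correct = record["prediction"].strip().lower() == record["ground_truth"].strip().lower()
    return {"id": record["id"], "correct": correct}

# ThreadPoolExecutor mirrors the --workers 5 concurrency flag.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(judge, predictions))
```

Threads suit this workload because each judgment is an I/O-bound API call rather than CPU-bound work.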

Failure analysis

```bash
python llm_as_judge/src/analyze_failure_types.py \
    --excel_file llm_as_judge/data/Video-LLM.xlsx \
    --trace_dir results/traces \
    --max_workers 4
```

📚 Citation

If you find this benchmark useful for your research, please cite:

```bibtex
@article{liu2026watching,
  title={Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning},
  author={Liu, Chengwen and Yu, Xiaomin and Chang, Zhuoyue and Huang, Zhe and Zhang, Shuo and Lian, Heng and Wang, Kunyi and Xu, Rui and Hu, Sen and Hou, Jianheng and others},
  journal={arXiv preprint arXiv:2601.06943},
  year={2026}
}
```

✉️ Contact

Have a question, or just want to say hi? Feel free to reach out:

📧 Email: yuxm02@gmail.com
