Upload Tasks: CinePile #343

JARVVVIS · 2024-10-22T23:25:12Z

Added support for evaluating models on CinePile.

CinePile is a question-answering-based, long-form video understanding dataset. It was created using advanced large language models (LLMs) with a human-in-the-loop pipeline that leverages existing human-generated raw data. It consists of approximately 300,000 training data points and 5,000 test data points.

kcz358

Hi @JARVVVIS, all files look good. Can you post a screenshot of the evaluation result you get when running lmms-eval on your dataset, for future reference? Then I will merge this PR. Thank you!

JARVVVIS · 2024-10-23T22:48:52Z

For the command:

python3 -m accelerate.commands.launch --num_processes=1 -m lmms_eval --model video_llava --tasks cinepile --batch_size 1 --log_samples --log_samples_suffix video_llava_cinepile --output_path ./logs/ --verbosity=DEBUG

* Added CinePile * corrected linting errors

JohnlNguyen · 2024-11-08T22:10:50Z

@JARVVVIS I keep getting the error "Sign in to confirm you’re not a bot. This helps protect our community." How do I sign in so that I can download the videos?

JARVVVIS · 2024-11-09T14:16:18Z

Hi @JohnlNguyen.

I think this might be occurring due to how LMMS handles downloading YouTube videos here.

One way I addressed this was to download the videos with a more robust method and then manually move them to the location in the cache where LMMS expects them. Specifically, LMMS downloads each video to video_path = os.path.join(hf_home, task), where hf_home = os.getenv("HF_HOME", "~/.cache/huggingface/") and task = 'cinepile' (as defined in the config). It's not the cleanest solution, since LMMS will still attempt to download the videos: it doesn't check whether they already exist at those paths but instead relies on a {task}_download_status.json file. Once LMMS completes its download attempts for all videos, it should still function correctly. Alternatively, you could modify the lmms_eval/api/task.py file to handle this more effectively.
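To make the cache location above concrete, here is a minimal sketch that resolves the directory from the same environment variable and default. The helper names `expected_video_dir` and `status_file_name` are hypothetical, and expanding `~` with `os.path.expanduser` is an assumption about how the path is resolved; where the status file is stored is not stated in this thread, so only its name is built.

```python
import os


def expected_video_dir(task="cinepile"):
    """Directory where, per the description above, the task's videos are expected."""
    # Same default as quoted in the comment; expanduser is an assumption.
    hf_home = os.path.expanduser(os.getenv("HF_HOME", "~/.cache/huggingface/"))
    return os.path.join(hf_home, task)


def status_file_name(task="cinepile"):
    """Name of the status file that is consulted instead of the video paths."""
    return f"{task}_download_status.json"


if __name__ == "__main__":
    print("Move manually downloaded videos to:", expected_video_dir())
    print("Status file name:", status_file_name())
```

Printing the directory before moving any files is a cheap way to confirm that HF_HOME is set the way you expect on the machine running the evaluation.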

Below is the script that works well for me for downloading the videos. Please set the ROOT_DIR and subdir variables according to the target download location:

import os
import tqdm
from datasets import load_dataset
import yt_dlp


def download_video(video_url, filename, root, subdir):
   """
   Downloads a video from the given URL using yt_dlp and saves it to the specified root directory.
   """
   dir_path = f"{root}/{subdir}"
   os.makedirs(dir_path, exist_ok=True)

   output_path = f"{dir_path}/{filename}.mp4"

   ydl_opts = {
       "format": "bestvideo[height<=224][ext=mp4]+bestaudio[ext=m4a]/best[height<=224][ext=mp4]/best[ext=mp4]/best",
       "outtmpl": output_path,
       "merge_output_format": "mp4",
   }

   try:
       with yt_dlp.YoutubeDL(ydl_opts) as ydl:
           print(f"Attempting to download: {video_url}")
           print(f"Saving path: {output_path}")
           ydl.download([video_url])

       if os.path.exists(output_path):
           print(f"Downloaded: {output_path}; {video_url}")
           return output_path, True
       else:
           print(f"Failed to download {video_url}.")
           return None, False

   except Exception as e:
       print(f"Exception during download of {video_url}: {e}")
       return None, False


def main():
   """
   Main function to download videos listed in the CinePile dataset.
   """
   cinepile = load_dataset("tomg-group-umd/cinepile", split="test")
   eval_df = cinepile.to_pandas()

   # TODO: Set ROOT_DIR to the root directory where you want to save the videos,
   # and subdir to the subdirectory under it.
   ROOT_DIR = ""
   subdir = ""

   for idx, row in tqdm.tqdm(eval_df.iterrows(), total=len(eval_df), leave=True):
       yt_link = row["yt_clip_link"]
       video_filename = f"{row['movie_name']}_{yt_link.split('/')[-1]}"
       local_video_path = f"{ROOT_DIR}/{subdir}/{video_filename}.mp4"

       try:
           if not os.path.isfile(local_video_path):
               video_path, did_download = download_video(
                   yt_link, video_filename, root=ROOT_DIR, subdir=subdir
               )
               assert video_path is not None and did_download
           else:
               print(f"Skipping download. Video already exists at {local_video_path}.")
       except Exception as e:
           print(f"Got Exception for video {yt_link}: {e}")


if __name__ == "__main__":
   main()

yt-dlp version: yt-dlp==2024.8.6
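If the "Sign in to confirm you’re not a bot" prompt still appears with the script above, one possible workaround (not confirmed in this thread) is to pass cookies exported from a logged-in browser session through yt-dlp's `cookiefile` option. The sketch below only builds the options dictionary; the helper name `build_ydl_opts` and the `cookies.txt` path are hypothetical.

```python
def build_ydl_opts(output_path, cookie_file=None):
    """Build yt-dlp options matching the script above, optionally with cookies."""
    opts = {
        "format": "bestvideo[height<=224][ext=mp4]+bestaudio[ext=m4a]/best[height<=224][ext=mp4]/best[ext=mp4]/best",
        "outtmpl": output_path,
        "merge_output_format": "mp4",
    }
    if cookie_file is not None:
        # Path to a Netscape-format cookies file exported from a signed-in browser.
        opts["cookiefile"] = cookie_file
    return opts


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_ydl_opts("videos/example.mp4", cookie_file="cookies.txt"))
```

The returned dictionary can be passed to yt_dlp.YoutubeDL(...) in place of the ydl_opts defined in download_video.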


JARVVVIS added 3 commits October 23, 2024 01:03

Added CinePile

9793a61

synchronized

be851c9

corrected linting errors

08c9bda

kcz358 approved these changes Oct 23, 2024


kcz358 merged commit 100ab6f into EvolvingLMMs-Lab:main Oct 24, 2024
1 check passed

KairuiHu pushed a commit that referenced this pull request Oct 24, 2024

Upload Tasks: CinePile (#343)

655cfed

* Added CinePile * corrected linting errors

ZhaoCinyu pushed a commit to ZhaoCinyu/lmms-eval that referenced this pull request Dec 9, 2024

Upload Tasks: CinePile (EvolvingLMMs-Lab#343)

ae84ec0

* Added CinePile * corrected linting errors
