Add custom dataset fine-tuning workflow for Cosmos Reason 1 #33

sauravn-hub · 2025-11-07T23:39:27Z

Summary

This PR adds a complete workflow for fine-tuning Cosmos Reason 1 on custom datasets with local video files and human-labeled physical plausibility scores.

Added Components

Dataset preparation scripts (create_dataset_with_split.py, add_conversations_to_dataset.py)
Model evaluation script with HTML report generation (evaluate_model.py)
Training configuration for custom datasets (custom_dataset_sft_config.toml)
Documentation integrated into the existing physical plausibility recipe
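For reviewers who want a feel for what the evaluation step measures, here is a minimal sketch of accuracy, MAE, F1, and a confusion matrix on 1-5 scores. It is not taken from evaluate_model.py: the function name `evaluate_scores`, the macro averaging of F1, and the fixed label set 1-5 are all assumptions made for illustration.

```python
from collections import Counter


def evaluate_scores(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Hypothetical helper: score-level metrics for 1-5 plausibility labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)

    # Exact-match accuracy and mean absolute error between scores.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

    # Confusion matrix: rows are true labels, columns are predicted labels.
    counts = Counter(zip(y_true, y_pred))
    confusion = [[counts[(t, p)] for p in labels] for t in labels]

    # Per-class F1, then an unweighted (macro) mean over all labels.
    # Assumption: classes with no true or predicted samples count as 0.0.
    f1_per_class = []
    for i in range(len(labels)):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(len(labels))) - tp
        fn = sum(confusion[i]) - tp
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_per_class) / len(f1_per_class)

    return {
        "accuracy": accuracy,
        "mae": mae,
        "macro_f1": macro_f1,
        "confusion_matrix": confusion,
    }


if __name__ == "__main__":
    report = evaluate_scores([1, 5, 5, 3], [1, 5, 4, 3])
    print(report["accuracy"], report["mae"])
```

MAE is useful alongside accuracy here because an off-by-one prediction on an ordinal 1-5 scale is a smaller error than predicting 1 for a true 5, which exact-match accuracy cannot distinguish.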

Features

Stratified train/eval splitting with label balancing
Label scaling and distribution management (binary to 1-5 scale)
Integration with Cosmos Transfer-generated videos
Comprehensive evaluation metrics (accuracy, MAE, F1, confusion matrix)
Generic, reusable examples for easy adaptation
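The splitting and label-scaling features can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in create_dataset_with_split.py: the names `scale_binary_label` and `stratified_split` are hypothetical, the mapping of binary labels to the two ends of the 1-5 scale is an assumed choice, and label balancing (for example, downsampling the majority class) is not shown.

```python
import random
from collections import defaultdict


def scale_binary_label(label, low=1, high=5):
    """Hypothetical mapping: binary 0/1 label to the ends of a 1-5 scale."""
    if label not in (0, 1):
        raise ValueError(f"expected a binary label, got {label!r}")
    return high if label == 1 else low


def stratified_split(samples, eval_fraction=0.2, seed=0):
    """Split (video_path, score) pairs so each score keeps its proportion.

    Each score group is shuffled and split separately, so train and eval
    end up with roughly the same label distribution as the full dataset.
    """
    rng = random.Random(seed)
    by_score = defaultdict(list)
    for sample in samples:
        by_score[sample[1]].append(sample)

    train, held_out = [], []
    for score in sorted(by_score):
        group = by_score[score]
        rng.shuffle(group)
        # Keep at least one eval sample per score when the group allows it.
        n_eval = max(1, round(len(group) * eval_fraction)) if len(group) > 1 else 0
        held_out.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, held_out


if __name__ == "__main__":
    # Placeholder file names; real datasets would point at local videos.
    data = [(f"video_{i}.mp4", scale_binary_label(i % 2)) for i in range(20)]
    train, held_out = stratified_split(data, eval_fraction=0.2, seed=0)
    print(len(train), len(held_out))
```

Splitting per score group rather than over the whole list keeps rare scores from disappearing from the eval set, which matters when human labels are skewed toward one end of the scale.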

Context

Extends the existing VideoPhy-2 recipe in the physical plausibility post-training guide to enable practitioners to fine-tune on domain-specific video quality assessment tasks. The workflow follows cookbook conventions where users copy scripts to their cosmos-reason1 workspace.

Files Changed

docs/recipes/post_training/reason1/physical-plausibility-check/post_training.md - Added custom dataset section
docs/recipes/post_training/reason1/physical-plausibility-check/assets/custom_dataset_sft_config.toml - New training config
scripts/examples/reason1/physical-plausibility-check/create_dataset_with_split.py - New dataset prep script
scripts/examples/reason1/physical-plausibility-check/add_conversations_to_dataset.py - New format converter
scripts/examples/reason1/physical-plausibility-check/evaluate_model.py - New evaluation script

This contribution adds a complete workflow for fine-tuning Cosmos Reason 1 on custom datasets with local video files and human-labeled quality scores.

Added components:
- Dataset preparation scripts for creating train/eval splits
- Conversation format conversion for SFT training
- Model evaluation script with HTML report generation
- Training configuration for custom datasets
- Documentation in physical plausibility recipe

The workflow supports:
- Stratified train/eval splitting with label balancing
- Label scaling and distribution management
- Integration with Cosmos Transfer-generated videos
- Comprehensive evaluation metrics and reporting

This extends the existing VideoPhy-2 recipe to enable practitioners to fine-tune on domain-specific video quality assessment tasks.

Signed-off-by: Saurav Nanda <sauravn@nvidia.com>

sauravn-hub marked this pull request as draft November 7, 2025 23:40

sauravn-hub marked this pull request as ready for review November 7, 2025 23:49
