@sauravn-hub

Summary

This PR adds a complete workflow for fine-tuning Cosmos Reason 1 on custom datasets with local video files and human-labeled physical plausibility scores.

Added Components

  • Dataset preparation scripts (create_dataset_with_split.py, add_conversations_to_dataset.py)
  • Model evaluation script with HTML report generation (evaluate_model.py)
  • Training configuration for custom datasets (custom_dataset_sft_config.toml)
  • Documentation integrated into the existing physical plausibility recipe

Features

  • Stratified train/eval splitting with label balancing
  • Label scaling and distribution management (binary to 1-5 scale)
  • Integration with Cosmos Transfer-generated videos
  • Comprehensive evaluation metrics (accuracy, MAE, F1, confusion matrix)
  • Generic, reusable examples for easy adaptation
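The first two features above can be sketched as follows. This is a minimal illustration, not the code in create_dataset_with_split.py: the function names `stratified_split` and `scale_binary_label`, the `(video_path, label)` tuple layout, and the mapping of binary 0/1 onto the ends of the 1-5 scale are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(samples, eval_fraction=0.2, seed=0):
    """Split (video_path, label) pairs so each label keeps roughly the
    same share in train and eval. Hypothetical helper, not the PR's code."""
    rng = random.Random(seed)

    # Group samples by label so each class is split independently.
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    train, eval_set = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_eval = round(len(group) * eval_fraction)
        if len(group) >= 2:
            # Keep at least one sample of this label on each side.
            n_eval = min(max(n_eval, 1), len(group) - 1)
        eval_set.extend(group[:n_eval])
        train.extend(group[n_eval:])

    # Shuffle again so samples are not ordered by label.
    rng.shuffle(train)
    rng.shuffle(eval_set)
    return train, eval_set


def scale_binary_label(label, low=1, high=5):
    """Map a binary label onto a 1-5 scale (assumed mapping: 0 -> low, 1 -> high)."""
    return high if label else low
```

For example, `stratified_split(samples, eval_fraction=0.2)` on a list with 10 samples of each binary label would place 2 of each label in the eval set and the remaining 16 samples in train.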

Context

This PR extends the existing VideoPhy-2 recipe in the physical plausibility post-training guide so that practitioners can fine-tune on domain-specific video quality assessment tasks. The workflow follows the cookbook convention of having users copy the scripts into their cosmos-reason1 workspace.

Files Changed

  • docs/recipes/post_training/reason1/physical-plausibility-check/post_training.md - Added custom dataset section
  • docs/recipes/post_training/reason1/physical-plausibility-check/assets/custom_dataset_sft_config.toml - New training config
  • scripts/examples/reason1/physical-plausibility-check/create_dataset_with_split.py - New dataset prep script
  • scripts/examples/reason1/physical-plausibility-check/add_conversations_to_dataset.py - New format converter
  • scripts/examples/reason1/physical-plausibility-check/evaluate_model.py - New evaluation script

This contribution adds a complete workflow for fine-tuning Cosmos Reason 1
on custom datasets with local video files and human-labeled quality scores.

Added components:
- Dataset preparation scripts for creating train/eval splits
- Conversation format conversion for SFT training
- Model evaluation script with HTML report generation
- Training configuration for custom datasets
- Documentation in physical plausibility recipe

The workflow supports:
- Stratified train/eval splitting with label balancing
- Label scaling and distribution management
- Integration with Cosmos Transfer-generated videos
- Comprehensive evaluation metrics and reporting
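As a rough illustration of the evaluation metrics listed in this PR (accuracy, MAE, F1, confusion matrix), the sketch below computes them for integer scores on a 1-5 scale. It is not taken from evaluate_model.py: the function name `evaluate_scores`, the returned dictionary keys, and the choice of macro-averaged F1 over the labels that actually occur are assumptions.

```python
def evaluate_scores(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Compute accuracy, MAE, macro F1 and a confusion matrix.
    Hypothetical helper; rows of the matrix are true labels, columns are predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1

    n = len(y_true)
    accuracy = sum(cm[i][i] for i in range(k)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

    # Per-label F1, averaged over labels seen in either y_true or y_pred
    # (assumed averaging scheme).
    f1_scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        if tp + fp + fn == 0:
            continue  # label never appears; skip it
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    macro_f1 = sum(f1_scores) / len(f1_scores)

    return {
        "accuracy": accuracy,
        "mae": mae,
        "macro_f1": macro_f1,
        "confusion_matrix": cm,
    }
```

MAE is included alongside accuracy because, for ordinal scores, predicting 4 when the label is 5 is a smaller error than predicting 1, which accuracy alone does not capture.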

This extends the existing VideoPhy-2 recipe to enable practitioners
to fine-tune on domain-specific video quality assessment tasks.

Signed-off-by: Saurav Nanda <sauravn@nvidia.com>
@sauravn-hub sauravn-hub marked this pull request as draft November 7, 2025 23:40
@sauravn-hub sauravn-hub marked this pull request as ready for review November 7, 2025 23:49