Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

nvidia-cosmos/cosmos-predict2.5

NVIDIA Cosmos

Product Website  | 🤗 Hugging Face  | Paper  | Paper Website | Cosmos Cookbook

NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.

Cosmos World Foundation Models come in three model types, all of which can be customized through post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.

News!

  • [November 8, 2025] Added a new pedagogical README in docs/ detailing the Rectified Flow formulation and its integration with the UniPC solver.
  • [November 7, 2025] We released support for DMD2 distillation for model compression, autoregressive sliding window generation mode for generating longer videos, and a new multiview cross-attention module. We improved inference examples and documentation, upgraded dependencies to improve support for Blackwell, and made various infrastructure improvements.
  • [October 28, 2025] We added Cosmos Cookbook, a collection of step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA’s Cosmos world foundation models for robotics and autonomous systems.
  • [October 28, 2025] We fixed an action-conditioned inference bug, improved LoRA post-training and unified it across text2world, image2world, and video2world, sped up tokenization with CP + torch.compile for Transfer2, updated guardrails, added multi-storage support, and introduced the cosmos-oss package.
  • [October 21, 2025] We added LoRA (Low-Rank Adaptation) post-training for both Video2World and Text2World, and the gr00t-dreams dataset for post-training. We also updated the Docker base image version and Gradio-related documentation.
  • [October 14, 2025] We released the Cosmos-Predict2.5 robot/action-cond Inference Guide and Post-Training Guide. We also released Auto Multiview post-training.
  • [October 6, 2025] We released Cosmos-Predict2.5 and Cosmos-Transfer2.5 - the next generation of our world simulation models!
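
The autoregressive sliding-window mode mentioned in the November 7 update can be sketched in miniature: a fixed-length model is applied repeatedly, with each call conditioned on the trailing frames of the video generated so far. The sketch below is illustrative only, not the repository's API; `generate_chunk` is a hypothetical stand-in for the world model call.

```python
# Illustrative sketch of autoregressive sliding-window video extension.
# `generate_chunk` is a hypothetical stand-in for a fixed-length world
# model call; real frames would be tensors, not integers.

def generate_chunk(context_frames, chunk_len):
    """Toy 'model': each new frame increments the last context frame."""
    last = context_frames[-1]
    return [last + i + 1 for i in range(chunk_len)]

def sliding_window_generate(initial_frames, total_len, chunk_len=4, overlap=2):
    """Extend a video past the model's native length by re-conditioning
    on the trailing `overlap` frames of what has been generated so far."""
    video = list(initial_frames)
    while len(video) < total_len:
        context = video[-overlap:]          # condition on trailing frames
        video.extend(generate_chunk(context, chunk_len))
    return video[:total_len]
```

The overlap is what keeps consecutive chunks temporally coherent: each new chunk sees the end of the previous one instead of starting from scratch.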

Cosmos-Predict2.5

We introduce Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video. Cosmos-Predict2.5 is a flow-based model that unifies Text2World, Image2World, and Video2World in a single model and uses Cosmos-Reason1, a Physical AI reasoning vision language model (VLM), as its text encoder. Cosmos-Predict2.5 significantly improves upon Cosmos-Predict1 in both quality and prompt alignment.
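
To illustrate what "flow-based" means here (a toy sketch under rectified-flow assumptions, not the repository's actual sampler): the model learns a velocity field v(x, t), and generation integrates dx/dt = v(x, t) from noise at t = 0 to data at t = 1. With the ideal straight-line velocity v = x1 − x0, even plain Euler integration lands on the target. `euler_sample` and the lambda below are illustrative names, not part of the codebase.

```python
# Toy sketch of flow-based sampling (assumed rectified-flow style).

def euler_sample(x0, velocity_fn, steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + velocity_fn(x, i * dt) * dt
    return x

# Rectified flow trains the model so that v(x_t, t) approximates the
# straight-line velocity x1 - x0 between a noise sample x0 and a data
# sample x1; with that ideal field, integration recovers x1.
x0, x1 = 0.0, 3.0
sample = euler_sample(x0, lambda x, t: x1 - x0)
```

In practice the velocity field is a learned video transformer and the ODE is solved with a higher-order method such as UniPC rather than plain Euler, but the generation loop has this shape.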

Image2World

Input prompt: A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple double-decker buses are parked under the glow of overhead lights, with a central bus labeled '87D' facing forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban night scene.

Input image → Output video: bus_terminal.mp4

Video2World

Input prompt: A robotic arm, primarily white with black joints and cables, is shown in a clean, modern indoor setting with a white tabletop. The arm, equipped with a gripper holding a small, light green pitcher, is positioned above a clear glass containing a reddish-brown liquid and a spoon. The robotic arm is in the process of pouring a transparent liquid into the glass. To the left of the pitcher, there is an opened jar with a similar reddish-brown substance visible through its transparent body. In the background, a vase with white flowers and a brown couch are partially visible, adding to the contemporary ambiance. The lighting is bright, casting soft shadows on the table. The robotic arm's movements are smooth and controlled, demonstrating precision in its task. As the video progresses, the robotic arm completes the pour, leaving the glass half-filled with the reddish-brown liquid. The jar remains untouched throughout the sequence, and the spoon inside the glass remains stationary. The other robotic arm on the right side also stays stationary throughout the video. The final frame captures the robotic arm with the pitcher finishing the pour, with the glass now filled to a higher level, while the pitcher is slightly tilted but still held securely by the gripper.

Input video → Output video: robot_pouring.mp4

Cosmos-Predict2.5 Model Family

Cosmos-Predict's fundamental capability is predicting future world states in video form from multimodal inputs. We have open-sourced both pre-trained foundation models and post-trained models that accelerate multiple domains. Please check back as we continue to add more specialized models and capabilities to the Predict family!

Cosmos-Predict2.5: Base checkpoints, trained from the ground up for Physical AI and robotics.

Cosmos-Predict2.5/auto/multiview: Specialized checkpoints, post-trained for Autonomous Vehicle applications.

| Model Name | Capability | Input |
| --- | --- | --- |
| **Cosmos-Predict2.5 base** | | |
| Cosmos-Predict2.5-2B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-2B/post-trained | post-trained base | text + image or video |
| **Cosmos-Predict2.5 auto** | | |
| Cosmos-Predict2.5-2B/auto/multiview | driving, 7-camera view | text + image or video |
| **Cosmos-Predict2.5-2B robot** | | |
| Cosmos-Predict2.5-2B/robot/action-cond | robotic, action-conditioned | action |

User Guide

Contributing

We thrive on community collaboration! NVIDIA-Cosmos wouldn't be where it is without contributions from developers like you. Check out our Contributing Guide to get started, and share your feedback through issues.

Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache License 2.0.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.
