Product Website | 🤗 Hugging Face | Paper | Paper Website | Cosmos Cookbook
NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.
Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
- [November 8, 2025] Added a new pedagogical README in docs/ detailing the Rectified Flow formulation and its integration with the UniPC solver.
- [November 7, 2025] We released support for DMD2 distillation for model compression, autoregressive sliding window generation mode for generating longer videos, and a new multiview cross-attention module. We improved inference examples and documentation, upgraded dependencies to improve support for Blackwell, and made various infrastructure improvements.
- [October 28, 2025] We added Cosmos Cookbook, a collection of step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA’s Cosmos world foundation models for robotics and autonomous systems.
- [October 28, 2025] We fixed an action-conditioned inference bug, improved LoRA post-training and unified it across Text2World, Image2World, and Video2World, sped up tokenization with CP + torch.compile for Transfer2, updated guardrails, added multi-storage support, and introduced the cosmos-oss package.
- [October 21, 2025] We added LoRA (Low-Rank Adaptation) post-training for both Video2World and Text2World, and the gr00t-dreams dataset for post-training. We also updated the Docker base image version and the Gradio-related documentation.
- [October 14, 2025] We released the Cosmos-Predict2.5 robot/action-cond Inference Guide and Post-Training Guide. We also released Auto Multiview Post-Training.
- [October 6, 2025] We released Cosmos-Predict2.5 and Cosmos-Transfer2.5 - the next generation of our world simulation models!
We introduce Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video. Cosmos-Predict2.5 is a flow-based model that unifies Text2World, Image2World, and Video2World into a single model and uses Cosmos-Reason1, a Physical AI reasoning vision language model (VLM), as the text encoder. Cosmos-Predict2.5 significantly improves upon Cosmos-Predict1 in both quality and prompt alignment.
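To give a rough intuition for what "flow-based" means here, the following is a minimal toy sketch of the rectified-flow idea in plain Python. It is not the Cosmos-Predict2.5 implementation: the time convention (noise at t=0, data at t=1), the function names, and the `oracle_velocity` stand-in for a trained network are all assumptions made for illustration, and a plain Euler integrator is used in place of the UniPC solver mentioned in the news above.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity along the straight path; constant in t, equal to x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained velocity network: when the data
# distribution is a single point, the ideal velocity at (x, t) points
# straight at that point with magnitude scaled by the remaining time.
data_point = [2.0, -1.0, 0.5]


def oracle_velocity(x, t):
    return [(d - xi) / (1.0 - t) for d, xi in zip(data_point, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in data_point]
sample = euler_sample(oracle_velocity, noise, num_steps=8)
print(sample)
```

In a real model the oracle would be replaced by a neural network trained to regress `target_velocity` at points produced by `interpolate`, conditioned on text, image, or video inputs.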
Input prompt
A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple double-decker buses are parked under the glow of overhead lights, with a central bus labeled '87D' facing forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban night scene.

| Input image | Output video |
|---|---|
| ![]() | bus_terminal.mp4 |
Input prompt
A robotic arm, primarily white with black joints and cables, is shown in a clean, modern indoor setting with a white tabletop. The arm, equipped with a gripper holding a small, light green pitcher, is positioned above a clear glass containing a reddish-brown liquid and a spoon. The robotic arm is in the process of pouring a transparent liquid into the glass. To the left of the pitcher, there is an opened jar with a similar reddish-brown substance visible through its transparent body. In the background, a vase with white flowers and a brown couch are partially visible, adding to the contemporary ambiance. The lighting is bright, casting soft shadows on the table. The robotic arm's movements are smooth and controlled, demonstrating precision in its task. As the video progresses, the robotic arm completes the pour, leaving the glass half-filled with the reddish-brown liquid. The jar remains untouched throughout the sequence, and the spoon inside the glass remains stationary. The other robotic arm on the right side also stays stationary throughout the video. The final frame captures the robotic arm with the pitcher finishing the pour, with the glass now filled to a higher level, while the pitcher is slightly tilted but still held securely by the gripper.

| Input video | Output video |
|---|---|
| robot_pouring.mp4 | robot_pouring.mp4 |
Cosmos-Predict is our family of world simulation models. Its fundamental capability is predicting future world states in video form from multimodal inputs. We have open-sourced both pre-trained foundation models and post-trained models that accelerate work in multiple domains. Please check back as we continue to add more specialized models and capabilities to the Predict family!
Cosmos-Predict2.5: Base checkpoints, trained from the ground up for Physical AI and robotics.
Cosmos-Predict2.5/auto/multiview: Specialized checkpoints, post-trained for Autonomous Vehicle applications.
| Model Name | Capability | Input |
|---|---|---|
| Cosmos-Predict2.5 base | ||
| Cosmos-Predict2.5-2B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-2B/post-trained | post-trained base | text + image or video |
| Cosmos-Predict2.5 auto | ||
| Cosmos-Predict2.5-2B/auto/multiview | driving, 7-camera view | text + image or video |
| Cosmos-Predict2.5-2B robot | ||
| Cosmos-Predict2.5-2B/robot/action-cond | robotic, action-conditioned | action |
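As a small illustration of how the table above might be used to pick a model by input modality, here is a hedged sketch that encodes the rows as plain Python data. The `ModelVariant` class and `find_variants` helper are hypothetical and not part of the Cosmos codebase, and the strings are the table's model names, which may not match the exact checkpoint identifiers used when downloading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str          # model name as listed in the table
    capability: str    # capability column
    inputs: frozenset  # modalities named in the input column


# Rows copied from the table; "text + image or video" is recorded as the
# three modalities it mentions.
VARIANTS = (
    ModelVariant("Cosmos-Predict2.5-2B/pre-trained", "pre-trained base",
                 frozenset({"text", "image", "video"})),
    ModelVariant("Cosmos-Predict2.5-2B/post-trained", "post-trained base",
                 frozenset({"text", "image", "video"})),
    ModelVariant("Cosmos-Predict2.5-2B/auto/multiview", "driving, 7-camera view",
                 frozenset({"text", "image", "video"})),
    ModelVariant("Cosmos-Predict2.5-2B/robot/action-cond", "robotic, action-conditioned",
                 frozenset({"action"})),
)


def find_variants(modality):
    """Return the names of all variants whose inputs include the modality."""
    return [v.name for v in VARIANTS if modality in v.inputs]


print(find_variants("action"))
print(find_variants("video"))
```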
We thrive on community collaboration! NVIDIA-Cosmos wouldn't be where it is without contributions from developers like you. Check out our Contributing Guide to get started, and share your feedback through issues.
Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.

