- Introduction
- Quick Start Guide
- Model User Guide
- Community Contribution
- Training
- Join Us!
- Acknowledgement
LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. The model is trained on a large-scale dataset of diverse videos and can generate high-resolution videos with realistic and diverse content.
The model is accessible right away via the following links:
The codebase was tested with Python 3.10.5, CUDA version 12.2, and supports PyTorch >= 2.1.2.
git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video
# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference-script\]
Then, download the model from Hugging Face:
from huggingface_hub import hf_hub_download
model_path = 'PATH' # The local directory to save downloaded checkpoint
hf_hub_download(repo_id="Lightricks/LTX-Video", filename="ltx-video-2b-v0.9.safetensors", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
To use our model, please follow the inference code in inference.py:
# text-to-video generation
python inference.py --ckpt_path 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
# image-to-video generation (conditioned on the image at IMAGE_PATH)
python inference.py --ckpt_path 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
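When generation is launched from another Python script, it can help to assemble the arguments programmatically. The following is a minimal sketch, not part of the repository: `build_inference_command` is a hypothetical helper that uses only the flags shown above and assumes it is run from the repository root, where `inference.py` lives.

```python
import sys
from typing import List, Optional


def build_inference_command(
    ckpt_path: str,
    prompt: str,
    height: int,
    width: int,
    num_frames: int,
    seed: int,
    input_image_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for inference.py (hypothetical helper).

    Passing input_image_path switches from text-to-video to
    image-to-video, mirroring the two commands above.
    """
    # sys.executable keeps the call inside the active virtual environment.
    cmd = [sys.executable, "inference.py", "--ckpt_path", ckpt_path, "--prompt", prompt]
    if input_image_path is not None:
        cmd += ["--input_image_path", input_image_path]
    cmd += [
        "--height", str(height),
        "--width", str(width),
        "--num_frames", str(num_frames),
        "--seed", str(seed),
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Using a list rather than a single string avoids shell-quoting problems with long prompts.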
To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.
To use our model with the Diffusers Python library, check out the official documentation.
When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details - all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep within 200 words. For best results, build your prompts using this structure:
- Start with main action in a single sentence
- Add specific details about movements and gestures
- Describe character/object appearances precisely
- Include background and environment details
- Specify camera angles and movements
- Describe lighting and colors
- Note any changes or sudden events
- See examples for more inspiration.
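The structure above can be sketched as a small helper that joins the parts into one flowing paragraph and enforces the 200-word limit. This is an illustrative sketch only: `build_prompt` and its parameters are hypothetical names, not part of LTX-Video.

```python
from typing import Iterable


def build_prompt(main_action: str, details: Iterable[str] = (), max_words: int = 200) -> str:
    """Join a main action and supporting details into a single paragraph.

    Details should be given in the order listed above: movements,
    appearances, environment, camera, lighting, then any changes.
    """
    main_action = main_action.strip()
    if not main_action:
        raise ValueError("main_action must not be empty")

    parts = [main_action] + [d.strip() for d in details if d.strip()]
    # Make sure every part ends as a sentence so the paragraph reads cleanly.
    sentences = [p if p.endswith((".", "!", "?")) else p + "." for p in parts]
    prompt = " ".join(sentences)

    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(f"prompt has {word_count} words, limit is {max_words}")
    return prompt
```

For example, `build_prompt("A woman walks along a beach", ["Her red scarf trails in the wind.", "The camera tracks her from the side"])` produces a three-sentence paragraph that starts directly with the action.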
- Resolution Preset: Use higher resolutions for detailed scenes and lower resolutions for faster generation and simpler scenes. The model works on resolutions that are divisible by 32 and on frame counts of the form 8k + 1 (e.g. 257). If the resolution is not divisible by 32 or the frame count is not of the form 8k + 1, the input is padded with -1 and then cropped to the desired resolution and number of frames. The model works best at resolutions under 720 x 1280 and with fewer than 257 frames
- Seed: Save seed values to recreate specific styles or compositions you like
- Guidance Scale: 3-3.5 are the recommended values
- Inference Steps: More steps (40+) for quality, fewer steps (20-30) for speed
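To see which sizes the model works on natively, the divisibility rules from the Resolution Preset item can be written out as arithmetic. The sketch below is an assumption-laden illustration: `padded_dims` is a hypothetical helper, and rounding up is assumed to mirror the padding described above. It is not taken from the LTX-Video code.

```python
from typing import Tuple


def padded_dims(height: int, width: int, num_frames: int) -> Tuple[int, int, int]:
    """Round a request up to the next size the model handles natively.

    Height and width are rounded up to a multiple of 32; the frame
    count is rounded up to the next value of the form 8k + 1.
    """
    if min(height, width, num_frames) < 1:
        raise ValueError("height, width and num_frames must be positive")

    def ceil_to(value: int, multiple: int) -> int:
        # Integer ceiling division, then scale back up.
        return -(-value // multiple) * multiple

    return ceil_to(height, 32), ceil_to(width, 32), ceil_to(num_frames - 1, 8) + 1


print(padded_dims(512, 768, 257))  # already valid: (512, 768, 257)
print(padded_dims(500, 700, 100))  # rounded up:    (512, 704, 105)
```

Choosing sizes that are already valid avoids the pad-and-crop step entirely.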
A community project providing additional nodes for enhanced control over the LTX Video model. It includes implementations of advanced techniques like RF-Inversion, RF-Edit, FlowEdit, and more. These nodes enable workflows such as Image and Video to Video (I+V2V), enhanced sampling via Spatiotemporal Skip Guidance (STG), and interpolation with precise frame settings.
- Repository: ComfyUI-LTXTricks
- Features:
  - RF-Inversion: Implements RF-Inversion with an example workflow here.
  - RF-Edit: Implements RF-Solver-Edit with an example workflow here.
  - FlowEdit: Implements FlowEdit with an example workflow here.
  - I+V2V: Enables Video to Video with a reference image. Example workflow.
  - Enhance: Partial implementation of STGuidance. Example workflow.
  - Interpolation and Frame Setting: Nodes for precise control of latents per frame. Example workflow.
LTX-VideoQ8 is an 8-bit optimized version of LTX-Video, designed for faster performance on NVIDIA Ada GPUs.
- Repository: LTX-VideoQ8
- Features:
  - Up to 3X speed-up with no accuracy loss
  - Generate 720x480x121 videos in under a minute on RTX 4060 (8GB VRAM)
  - Fine-tune 2B transformer models with precalculated latents
- Community Discussion: Reddit Thread
Contributions are welcome! If you have a project or tool that integrates with LTX-Video, please let us know by opening an issue or pull request.
Diffusers has implemented LoRA support, along with a training script for fine-tuning. More information and the training script are available in finetrainers.
An experimental training framework with pipeline parallelism, enabling fine-tuning of large models like LTX-Video across multiple GPUs.
- Repository: Diffusion-Pipe
- Features:
  - Full fine-tune support for LTX-Video using LoRA
  - Useful metrics logged to Tensorboard
  - Training state checkpointing and resumption
  - Efficient pre-caching of latents and text embeddings for multi-GPU setups
Want to work on cutting-edge AI research and make a real impact on millions of users worldwide?
At Lightricks, an AI-first company, we're revolutionizing how visual content is created.
If you are passionate about AI, computer vision, and video generation, we would love to hear from you!
Please visit our careers page for more information.
We are grateful for the following awesome projects when implementing LTX-Video:
- DiT and PixArt-alpha: vision transformers for image generation.
Our tech report is out! If you find our work helpful, please star the repository and cite our paper.
@article{HaCohen2024LTXVideo,
title={LTX-Video: Realtime Video Latent Diffusion},
author={HaCohen, Yoav and Chiprut, Nisan and Brazowski, Benny and Shalem, Daniel and Moshe, Dudu and Richardson, Eitan and Levin, Eran and Shiran, Guy and Zabari, Nir and Gordon, Ori and Panet, Poriya and Weissbuch, Sapir and Kulikov, Victor and Bitterman, Yaki and Melumian, Zeev and Bibi, Ofir},
journal={arXiv preprint arXiv:2501.00103},
year={2024}
}