DCVC-RT: Towards Practical Real-Time Neural Video Compression

CVPR 2025

DCVC-RT is the first neural video codec (NVC) to achieve 100+ FPS 1080p coding and real-time 4K coding with a compression ratio comparable to ECM. Beyond this, DCVC-RT pursues a more practical neural video codec solution and supports various practical features, including:

Wide bitrate range in a single model: A single model enables continuous and controllable bitrate adjustment, so DCVC-RT can compress across a wide bitrate range for different coding scenarios.
Rate control: By adjusting quantization parameters, DCVC-RT adapts to the dynamic and varied network conditions of real communication scenarios.
Unified YUV and RGB coding: While DCVC-RT is primarily optimized for the widely adopted YUV format, it can seamlessly adapt to RGB content coding.

We are continuously exploring additional practical functionalities and will provide further NVC solutions in this repository.

📖 Overview

Welcome to the official implementation of DCVC-RT and the broader DCVC-family models. The DCVC (Deep Contextual Video Compression) family is designed to push the boundaries of high-performance practical neural video codecs, delivering cutting-edge compression efficiency, real-time capabilities, and versatile functionalities.

🚀 In this section, we provide a brief overview of DCVC-RT. For an in-depth understanding, we encourage you to read our paper.

🔨 Ready to get started? Head over to the Usage section to start using this repo.

📄 If you find our work helpful, feel free to cite us. We truly appreciate your support.

Abstract

We introduce a practical real-time neural video codec (NVC) designed to deliver high compression ratio, low latency and broad versatility. In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls. While most efficient NVCs prioritize reducing computational cost, we identify operational cost as the primary bottleneck to achieving higher coding speed. Leveraging this insight, we introduce a set of efficiency-driven design improvements focused on minimizing operational costs. Specifically, we employ implicit temporal modeling to eliminate complex explicit motion modules, and use single low-resolution latent representations rather than progressive downsampling. These innovations significantly accelerate NVC without sacrificing compression quality. Additionally, we implement model integerization for consistent cross-device coding and a module-bank-based rate control scheme to improve practical adaptability. Experiments show our proposed DCVC-RT achieves an impressive average encoding/decoding speed at 125.2/112.8 fps (frames per second) for 1080p video, while saving an average of 21% in bitrate compared to H.266/VTM.

Video Compression Performance

Bit saving over VTM-17.0 (UVG, all frames, single intra-frame setting, i.e. intra-period = -1, YUV420 colorspace).

The BD-Rate and 1080p encoding/decoding speed on NVIDIA A100 GPU

The complexity analysis and encoding/decoding speed evaluation across various resolutions and devices.

Image Compression Performance

Notably, the intra-frame codec in DCVC-RT also delivers impressive performance. On Kodak, DCVC-RT-Intra achieves an 11.1% bitrate reduction compared to VTM, while decoding over 10× faster than previous state-of-the-art learned image codecs. Encoding offers a similar speed advantage. For 1080p content, DCVC-RT-Intra reaches an encoding/decoding speed of 40.7 FPS / 44.2 FPS on an NVIDIA A100 GPU.

🔨 Usage

Prerequisites

Python 3.12 and conda (get Conda)
CUDA 12.6 (other versions may also work; make sure the CUDA version matches your PyTorch build)
PyTorch (we have tested pytorch-2.6; other versions may also work)
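
Once the environment below is set up, the CUDA version that the installed PyTorch was built against can be checked with a short script. This is a minimal sketch: the helper name `describe_torch_cuda` is ours and not part of this repository, and it relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib


def describe_torch_cuda():
    """Report the PyTorch version and the CUDA version it was built against."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds of PyTorch.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` differs from the CUDA toolkit used to build the extensions, reinstalling PyTorch from the matching wheel index is the usual remedy.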

Environment

conda create -n $YOUR_PY_ENV_NAME python=3.12
conda activate $YOUR_PY_ENV_NAME

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

Build the project

Build the C++ code, which supports bitstream writing, and the customized CUDA kernels, which fuse operations.

sudo apt-get install cmake g++ ninja-build
conda activate $YOUR_PY_ENV_NAME
cd ./src/cpp/
pip install .
cd ../layers/extensions/inference/
pip install .

If the CUDA kernels fail to load during inference, the standard output will display: cannot import cuda implementation for inference, fallback to pytorch.

CPU performance scaling

Note that arithmetic coding runs on the CPU. Make sure your CPU runs at high performance while writing the actual bitstream; otherwise, arithmetic coding may take a long time.

Check the CPU frequency with:

grep -E '^model name|^cpu MHz' /proc/cpuinfo

Run the following command to maximize the CPU frequency:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Run the following command to restore the default frequency:

echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
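
To confirm which governor is active before a timing run, the same sysfs files can be read from Python. The sketch below is illustrative only; the helper `read_governors` is a hypothetical name, and it simply reads the `scaling_governor` files targeted by the commands above.

```python
import glob
import os


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU exposing a cpufreq governor."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpu3/cpufreq/scaling_governor -> "cpu3"
        cpu_name = path.split(os.sep)[-3]
        try:
            with open(path, "r", encoding="ascii") as f:
                governors[cpu_name] = f.read().strip()
        except OSError:
            # Skip CPUs whose governor file cannot be read.
            continue
    return governors


governors = read_governors()
if not governors:
    print("No cpufreq governor files found (non-Linux system or no cpufreq driver).")
else:
    fast = sum(1 for g in governors.values() if g == "performance")
    print(f"{fast}/{len(governors)} CPUs use the performance governor")
```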

Pretrained models

Download our pretrained models and put them into the ./checkpoints folder.
There are two models: one for image coding and one for video coding.
As a backup, all the pretrained models can be found here.

Test dataset

Arbitrary input resolutions are supported. The input video is padded automatically, the reconstructed video is cropped back to the original size, and the distortion (PSNR) is calculated at the original resolution.
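
The pad-then-crop idea can be sketched as follows. This is not the repository's implementation: the alignment value of 16 and the bottom/right padding placement are assumptions made for illustration, and `padded_size` and `crop_to_original` are hypothetical helper names.

```python
def padded_size(height, width, multiple=16):
    """Round height and width up to the next multiple of `multiple`.

    The default of 16 is an assumed alignment, not a value taken from the codec.
    """
    padded_h = (height + multiple - 1) // multiple * multiple
    padded_w = (width + multiple - 1) // multiple * multiple
    return padded_h, padded_w


def crop_to_original(frame_rows, height, width):
    """Crop a padded frame (a list of rows) back to height x width.

    Assumes the padding was added at the bottom and right edges.
    """
    return [row[:width] for row in frame_rows[:height]]


print(padded_size(1080, 1920))  # (1088, 1920) with the assumed multiple of 16
```

Because the reconstruction is cropped before PSNR is computed, the padded samples do not affect the reported distortion.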

YUV 420 content

Place the *.yuv files in a folder structure similar to the following.

/media/data/HEVC_B/
    - BQTerrace_1920x1080_60.yuv
    - BasketballDrive_1920x1080_50.yuv
    - ...
/media/data/HEVC_D/
/media/data/HEVC_C/
...

The dataset structure can be seen in dataset_config_example_yuv420.json.
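
As a sanity check on raw input files, the size of one YUV 4:2:0 frame and the number of frames in a file can be computed from the resolution. The sketch below assumes 8-bit samples by default, even width and height, and file names that follow the `Name_WxH_fps.yuv` pattern shown above; the helper names are hypothetical and not part of this repository.

```python
import re


def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Bytes per YUV 4:2:0 frame: one full-size luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return (luma + 2 * chroma) * bytes_per_sample


def parse_resolution(filename):
    """Extract (width, height) from a name such as BQTerrace_1920x1080_60.yuv."""
    match = re.search(r"_(\d+)x(\d+)", filename)
    if match is None:
        raise ValueError(f"no WxH field in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def frame_count(file_size, width, height, bytes_per_sample=1):
    """Number of whole frames in a raw YUV 4:2:0 file of `file_size` bytes."""
    return file_size // yuv420_frame_bytes(width, height, bytes_per_sample)


width, height = parse_resolution("BQTerrace_1920x1080_60.yuv")
print(width, height, yuv420_frame_bytes(width, height))  # 1920 1080 3110400
```

In practice `file_size` would come from `os.path.getsize(path)`; a file size that is not a multiple of the frame size usually points to a wrong resolution or bit depth.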

RGB content

We strongly recommend testing with YUV420 content. To test RGB content, please refer to the DCVC-FM folder.

Test the models

Example of testing the pretrained models at four rate points:

 python test_video.py --model_path_i ./checkpoints/cvpr2025_image.pth.tar --model_path_p ./checkpoints/cvpr2025_video.pth.tar --rate_num 4 --test_config ./dataset_config_example_yuv420.json --cuda 1 -w 1 --write_stream 1 --force_zero_thres 0.12 --output_path output.json --force_intra_period -1 --reset_interval 64 --force_frame_num -1 --check_existing 0 --verbose 0

We recommend setting -w to the number of GPUs you have.

You can also specify a different --rate_num value (2 to 64) to test finer bitrate adjustment.

To measure coding speed, you can set --verbose value to 1 (sequence-level measuring) or 2 (frame-level measuring). This will automatically measure encoding and decoding speeds, print them in the terminal, and record the average speeds in avg_frame_encoding_time and avg_frame_decoding_time in the output JSON file.
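
The recorded timings can be pulled out of the output JSON with a small script. The sketch below makes no assumption about how the file is nested: it walks the whole structure and averages every value stored under `avg_frame_encoding_time` and `avg_frame_decoding_time`. The helper names `collect_key` and `summarize_timing` are hypothetical, and the values are reported in whatever unit the test script records.

```python
import json


def collect_key(node, key):
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                found.extend(collect_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_key(item, key))
    return found


def summarize_timing(path):
    """Average the recorded per-frame encoding/decoding times in an output JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        results = json.load(f)
    summary = {}
    for key in ("avg_frame_encoding_time", "avg_frame_decoding_time"):
        values = collect_key(results, key)
        # None signals that the key was absent, e.g. when --verbose was 0.
        summary[key] = sum(values) / len(values) if values else None
    return summary
```

For the command above, `summarize_timing("output.json")` returns one averaged value per key, or `None` for a key that was not recorded.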

Note that test_time is the total testing time for the entire sequence, which includes I/O time, encoding time, decoding time, and distortion calculation time. The overhead from I/O and distortion calculation is much larger than the encoding/decoding time itself, so we exclude these overheads to measure the precise coding time.
Additionally, please make sure time.time() provides sufficient precision on the tested platform. For instance, our experience is that the precision is adequate on our Ubuntu device, but insufficient on our Windows device.
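
One way to check timer precision on a given platform is to measure the smallest step a clock actually takes. The sketch below uses only the standard library; `smallest_observed_tick` is a hypothetical helper, and `time.perf_counter()` is shown purely as a reference clock for comparison, not as what the test script uses.

```python
import time


def smallest_observed_tick(clock, samples=50):
    """Smallest positive difference between consecutive readings of `clock`."""
    smallest = float("inf")
    previous = clock()
    observed = 0
    while observed < samples:
        current = clock()
        if current > previous:
            smallest = min(smallest, current - previous)
            observed += 1
        # Always move forward, even if the clock stalled or stepped backwards.
        previous = current
    return smallest


for name, clock in (("time", time.time), ("perf_counter", time.perf_counter)):
    reported = time.get_clock_info(name).resolution
    measured = smallest_observed_tick(clock)
    print(f"{name}: reported resolution {reported:.3e} s, smallest observed tick {measured:.3e} s")
```

If the smallest observed tick of `time.time()` is close to the per-frame coding time, frame-level measurements on that platform should be treated with caution.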

On the comparison

Please note that different methods may use different configurations to test different models, such as:

Source video may be different, e.g., cropped or padded to the desired resolution.
Intra period may be different, e.g., 96, 32, 12, or 10.
Number of encoded frames may be different.

It therefore does not make sense to compare numbers from different methods directly unless they use the same test conditions.

Please find more details on the test conditions.

📋 DCVC-family

DCVC-RT builds on the success of the DCVC family of models. The details of DCVC family models can be found in DCVC-family.

Model	Paper	Code	Checkpoint
DCVC	Paper (NeurIPS 2021) & Paper (arXiv)	Code	Checkpoints
DCVC-TCM	Paper (IEEE TMM) & Paper (arXiv)	Code	Checkpoints
DCVC-HEM	Paper (ACM MM 2022) & Paper (arXiv)	Code	Checkpoints
DCVC-DC	Paper (CVPR 2023) & Paper (arXiv)	Code	Checkpoints
DCVC-FM	Paper (CVPR 2024) & Paper (arXiv)	Code	Checkpoints
DCVC-RT	Paper (arXiv)	Code	Checkpoints
EVC	Paper (ICLR 2023) & Paper (arXiv)	Code	Checkpoints

As a backup, all the pretrained models can be found here.

📄 Citation

If you find this work useful for your research, please cite:

BibTeX

@article{li2021deep,
  title={Deep Contextual Video Compression},
  author={Li, Jiahao and Li, Bin and Lu, Yan},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@article{sheng2022temporal,
  title={Temporal context mining for learned video compression},
  author={Sheng, Xihua and Li, Jiahao and Li, Bin and Li, Li and Liu, Dong and Lu, Yan},
  journal={IEEE Transactions on Multimedia},
  year={2022},
  publisher={IEEE}
}

@inproceedings{li2022hybrid,
  title={Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression},
  author={Li, Jiahao and Li, Bin and Lu, Yan},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

@inproceedings{li2023neural,
  title={Neural Video Compression with Diverse Contexts},
  author={Li, Jiahao and Li, Bin and Lu, Yan},
  booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
             {CVPR} 2023, Vancouver, Canada, June 18-22, 2023},
  year={2023}
}

@inproceedings{li2024neural,
  title={Neural Video Compression with Feature Modulation},
  author={Li, Jiahao and Li, Bin and Lu, Yan},
  booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
             {CVPR} 2024, Seattle, WA, USA, June 17-21, 2024},
  year={2024}
}

@inproceedings{jia2025towards,
  title={Towards Practical Real-Time Neural Video Compression},
  author={Jia, Zhaoyang and Li, Bin and Li, Jiahao and Xie, Wenxuan and Qi, Linfeng and Li, Houqiang and Lu, Yan},
  booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
             {CVPR} 2025, Nashville, TN, USA, June 11-15, 2025},
  year={2025}
}

@inproceedings{wang2023EVC,
  title={EVC: Towards Real-Time Neural Image Compression with Mask Decay},
  author={Wang, Guo-Hua and Li, Jiahao and Li, Bin and Lu, Yan},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

Acknowledgement

The implementation of DCVC-RT is based on CompressAI.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
