RealESRGANv2

RealESRGANv2 is a practical image super-resolution neural network that removes common image artifacts.

Link:

(stable) https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/RealESRGANv2_v1.7z

Models

Includes small models optimized for anime videos:

RealESRGANv2: requires RGB input
- animevideo-xsx2, animevideo-xsx4 (2x / 4x upscaling)
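For planning purposes, the output resolution and raw frame size for each scale factor can be worked out as follows. This is a minimal sketch: the helper names are hypothetical, and the frame-size figure assumes 32-bit float RGB (`vs.RGBS`, the format used in the wrapper example) with three planes and no row padding.

```python
def upscaled_size(width: int, height: int, scale: int) -> tuple[int, int]:
    """Output resolution of an xsx2 (scale=2) or xsx4 (scale=4) model."""
    if scale not in (2, 4):
        raise ValueError("the animevideo-xs models upscale by 2x or 4x")
    return width * scale, height * scale


def rgbs_frame_bytes(width: int, height: int) -> int:
    """Raw size of one RGBS frame: 3 planes of 4-byte float samples (padding ignored)."""
    return width * height * 3 * 4


for scale in (2, 4):
    out_w, out_h = upscaled_size(1920, 1080, scale)
    size_mib = rgbs_frame_bytes(out_w, out_h) / 2**20
    print(f"{scale}x: {out_w}x{out_h}, {size_mib:.1f} MiB per frame")
```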

`vsmlrt.py` wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt:

import vapoursynth as vs
from vapoursynth import core
from vsmlrt import RealESRGANv2, RealESRGANv2Model, Backend

src = core.std.BlankClip(format=vs.RGBS)

# backend can be one of:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort CPU backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select the device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at the cost of slightly lower throughput.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NVIDIA GPU runtime.
flt = RealESRGANv2(src, model=RealESRGANv2Model.animevideo_xsx2, backend=Backend.ORT_CUDA())
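Because the models require RGB input, a typical YUV source has to be converted before inference and back afterwards. The sketch below shows one way this might look; `upscale_yuv` is a hypothetical helper, and the BT.709 matrix and the `YUV420P16` output format are assumptions that should be adjusted to the actual source.

```python
def upscale_yuv(clip, matrix="709"):
    """Upscale a YUV clip 2x with RealESRGANv2 (sketch; the matrix is an assumption)."""
    # Imported lazily so that this file can be loaded without VapourSynth installed.
    import vapoursynth as vs
    from vapoursynth import core
    from vsmlrt import RealESRGANv2, RealESRGANv2Model, Backend

    # The model expects RGB, so convert to 32-bit float RGB first.
    rgb = core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s=matrix)

    # Same wrapper call as in the example above.
    flt = RealESRGANv2(rgb, model=RealESRGANv2Model.animevideo_xsx2, backend=Backend.ORT_CUDA())

    # Convert back to YUV for encoding; the output format is an arbitrary choice.
    return core.resize.Bicubic(flt, format=vs.YUV420P16, matrix_s=matrix)
```

In a script this could be used as `upscale_yuv(src).set_output()`, where `src` is a YUV clip loaded by any source filter.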

Raw Model Usage

import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=1920, height=1080, format=vs.RGBS)
flt = core.ov.Model(src, "RealESRGANv2/RealESRGANv2-animevideo-xsx2")

Benchmarking

Each table cell reports FPS / device memory (MB).

Device memory is measured as:

- CPU: private memory, including VapourSynth
- GPU: device memory, including context

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

1. vs-mlrt v6
2. vs-realesrgan v2.0.0, PyTorch 1.10.1+cu113
3. vs-mlrt v8 (driver 511.79)

Performance

FP32

Model	[1] ort-cuda	[1] trt	[2] cuda	[3] ort-cuda	[3] trt	[3] trt (no tf32)
xsx2	5.12 / 2594	5.35 / 2953	3.34 / 7251	5.92 / 2786	6.70 / 1760	6.65 / 1759

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[3] ort-cuda	[3] trt	[3] trt (2 streams)
xsx2	10.6 / 2042	11.7 / 2041	21.6 / 2926	3.43 / 4753	8.12 / 3552	13.1 / 1624	22.3 / 2451
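Each cell above pairs throughput with memory use, so comparisons between backends come down to simple ratios. A minimal sketch, with hypothetical helper names, using the `[1] trt` cells from the two RTX 3090 tables:

```python
def parse_cell(cell: str) -> tuple[float, int]:
    """Split an 'FPS / memory (MB)' table cell into its two numbers."""
    fps, mem = cell.split("/")
    return float(fps), int(mem)


fp32_fps, fp32_mem = parse_cell("5.35 / 2953")  # RTX 3090, FP32, [1] trt
fp16_fps, fp16_mem = parse_cell("11.7 / 2041")  # RTX 3090, FP16, [1] trt

speedup = fp16_fps / fp32_fps
mem_saved = fp32_mem - fp16_mem
print(f"FP16 is {speedup:.2f}x as fast and uses {mem_saved} MB less device memory")
```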

RTX 2080 Ti

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

1. vs-mlrt v6
2. vs-realesrgan v2.0.0, PyTorch 1.10.1+cu113
3. vs-mlrt v8 (driver 511.79)

Performance

FP32

Model	[1] ort-cuda	[1] trt	[2] cuda	[3] ort-cuda	[3] trt
xsx2	3.24 / 2302	3.91 / 1816	2.63 / 6804	3.08 / 2215	3.89 / 1603

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[3] ort-cuda	[3] trt	[3] trt (2 streams)
xsx2	4.84 / 2558	7.50 / 1299	11.8 / 1947	3.16 / 4296	5.12 / 2983	7.65 / 1450	14.0 / 2265

Tesla V100

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

1. vs-mlrt v6
2. vs-realesrgan v2.0.0, PyTorch 1.10.1+cu113

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda
xsx2	5.27 / 2213	6.07 / 1835	6.86 / 2999	3.64 / 6799

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda
xsx2	8.90 / 2981	11.8 / 1637	15.5 / 2539	4.72 / 4291

Tesla A10

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, GPU clocks locked at max frequency.

Input size: 1920x1080

Backends

1. vs-mlrt v6
2. vs-realesrgan v2.0.0, PyTorch 1.10.1+cu113

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda
xsx2	4.15 / 2817	4.97 / 1965	5.26 / 3125	3.47 / 7075

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda
xsx2	7.16 / 3585	11.5 / 1881	12.1 / 2769	4.90 / 4585

Tesla A10G

Software: VapourSynth R58, Windows Server 2022, Graphics Driver 511.65, GPU clocks locked at max frequency.

Input size: 1920x1080

Backends

1. vs-mlrt v8

Performance

FP32

Model	[1] trt
xsx2	5.88 / 1584

FP16

Model	[1] trt	[1] trt (2 streams)
xsx2	13.0 / 1525	14.8 / 2401

Tesla A100 (PCIe, 40 GB)

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

1. vs-mlrt v6
2. vs-realesrgan v2.0.0, PyTorch 1.10.1+cu113

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[1] trt (3 streams)	[2] cuda
xsx2	11.4 / 3903	16.7 / 2167	20.3 / 3315	21.5 / 4463	7.00 / 7229

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[1] trt (3 streams)	[2] cuda
xsx2	16.1 / 3647	28.8 / 1655	41.0 / 2295	46.0 / 2953	6.89 / 4701
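The multi-stream columns show how throughput scales when several inference streams run concurrently. A rough way to quantify this, using the FP16 `[1] trt` figures from the table above (the efficiency metric is an illustrative choice, not one the benchmark itself defines):

```python
# FP16 [1] trt results on the A100 (PCIe, 40 GB): streams -> (FPS, memory in MB)
results = {1: (28.8, 1655), 2: (41.0, 2295), 3: (46.0, 2953)}

base_fps, base_mem = results[1]
efficiency = {}
for streams, (fps, mem) in results.items():
    # 1.0 would mean throughput grows in proportion to the stream count.
    efficiency[streams] = fps / (streams * base_fps)
    print(f"{streams} stream(s): {fps / base_fps:.2f}x FPS, "
          f"{mem / base_mem:.2f}x memory, efficiency {efficiency[streams]:.2f}")
```

In these figures each extra stream buys less throughput while memory keeps growing, so the best stream count depends on how much device memory is available.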

Tesla A100 (SXM4, 80 GB)

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

1. vs-mlrt v9

Performance

FP16

Model	[1] trt	[1] trt (2 streams)	[1] trt (3 streams)
xsx2	25.9 / 1987	34.9 / 1915	44.4 / 2530

Icelake Server

Hardware: Xeon Icelake Server 32C64T @2.90 GHz

Software: VapourSynth R57, Windows Server 2019

Input size: 1920x1080

Backends

1. vs-mlrt v6

Performance

FP32

Model	[1] ov-cpu
xsx2	1.49 / 16751

EPYC Milan

Hardware: EPYC Milan 32C64T @2.55 GHz

Software: VapourSynth R57, Windows Server 2019

Input size: 1920x1080

Backends

1. vs-mlrt v6

Performance

FP32

Model	[1] ov-cpu
xsx2	0.53 / 16750
