MiDaS 3.1
- New models based on 5 different types of transformers (BEiT, Swin2, Swin, Next-ViT, LeViT)
- Training datasets extended from 10 to 12, now also including KITTI and NYU Depth V2 (using the BTS split)
- The best model, BEiTLarge 512 (resolution 512x512), is on average about 28% more accurate than MiDaS v3.0
- Integrated live depth estimation from a camera feed