Cheng, Bowen, Alex Schwing, and Alexander Kirillov. "Per-pixel classification is not all you need for semantic segmentation." Advances in Neural Information Processing Systems 34 (2021): 17864-17875.

| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
|---|---|---|---|---|---|---|---|
| Maskformer-tiny | SwinTransformer | 512x512 | 160000 | 47.93 | - | - | model \| log \| vdl |
| Maskformer-small | SwinTransformer | 512x512 | 160000 | 50.4 | - | - | model \| log \| vdl |

- Maskformer supports four network settings: tiny, small, base, and large. Training results for base and large are not provided here, but they should be consistent with the paper.
- Maskformer-Base and Maskformer-Large are evaluated with multi-scale and flip augmentation, as in the original codebase.
- Please use CUDA 11.2 rather than CUDA 10.2 to avoid computation bugs.
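
The multi-scale and flip evaluation mentioned above for the Base and Large variants can be illustrated with a minimal NumPy sketch. It is not the code used by this repository or by the original codebase: `predict_fn`, `resize_nearest`, `ms_flip_predict`, and the scale values are hypothetical names and settings chosen for illustration, and nearest-neighbour resizing stands in for the bilinear interpolation that real evaluation pipelines typically use.

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array (illustrative helper)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]


def ms_flip_predict(predict_fn, image, scales=(0.5, 1.0, 1.5), flip=True):
    """Average per-class scores over several scales and horizontal flips.

    predict_fn is a hypothetical callable mapping a (C, H, W) image to
    (num_classes, H, W) scores. The default scales are an assumption.
    """
    _, h, w = image.shape
    total = None
    count = 0
    for s in scales:
        sh = max(1, int(round(h * s)))
        sw = max(1, int(round(w * s)))
        scaled = resize_nearest(image, sh, sw)
        views = [(scaled, False)]
        if flip:
            views.append((scaled[:, :, ::-1], True))
        for view, flipped in views:
            scores = predict_fn(view)
            if flipped:
                # Undo the horizontal flip so scores align with the input.
                scores = scores[:, :, ::-1]
            # Bring every prediction back to the original resolution.
            scores = resize_nearest(scores, h, w)
            total = scores if total is None else total + scores
            count += 1
    # Per-pixel label from the averaged scores.
    return (total / count).argmax(axis=0)
```

With the defaults, each image is scored six times (three scales, each with and without a flip), so evaluation in this mode costs several times more than single-scale inference.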