Scene Parsing

Accurate Models

Method	Backbone	ADE20K (mIoU)	Cityscapes (mIoU)	COCO-Stuff (mIoU)	Params (M)	GFLOPs (512x512)	GFLOPs (1024x1024)	Weights
SegFormer	MiT-B1	42.2	78.5	40.2	14	16	244	ade
SegFormer	MiT-B2	46.5	81.0	44.6	28	62	717	ade
SegFormer	MiT-B3	49.4	81.7	45.5	47	79	963	ade

Light-Ham	VAN-S	45.7	-	-	15	21	-	-
Light-Ham	VAN-B	49.6	-	-	27	34	-	-
Light-Ham	VAN-L	51.0	-	-	46	55	-	-

Lawin	MiT-B1	42.1	79.0	40.5	14	13	218	-
Lawin	MiT-B2	47.8	81.7	45.2	30	45	563	-
Lawin	MiT-B3	50.3	82.5	46.6	50	62	809	-

TopFormer	TopFormer-T	34.6	-	-	1.4	0.6	-	-
TopFormer	TopFormer-S	37.0	-	-	3.1	1.2	-	-
TopFormer	TopFormer-B	39.2	-	-	5.1	1.8	-	-

Real-time Models

Method	Backbone	Cityscapes-val (mIoU)	CamVid (mIoU)	Params (M)	GFLOPs (1024x2048)	Weights
BiSeNetv1	ResNet-18	74.8	68.7	14	49	-
BiSeNetv2	-	73.4	72.4	18	21	-
SFNet	ResNetD-18	79.0	-	13	-	-
DDRNet	DDRNet-23slim	77.8	74.7	6	36	city

Face Parsing

Method	Backbone	HELEN-val (mIoU)	Params (M)	GFLOPs (512x512)	FPS (GTX 1660 Ti)	Weights
BiSeNetv1	ResNet-18	58.50	14	13	263	HELEN
BiSeNetv2	-	58.58	18	15	195	HELEN
DDRNet	DDRNet-23slim	61.11	6	5	180	HELEN
SFNet	ResNetD-18	61.00	14	31	56	HELEN