
Low evaluation scores on pre-trained models #1322

Closed · A-guridi opened this issue Feb 24, 2022 · 2 comments

@A-guridi

Describe the issue
I am trying to replicate the evaluation results of different models on different datasets (all supported by MMSegmentation), but I always get really low mIoU scores (~9.5 mIoU), even when the plotted results look good.

I have implemented some custom wrappers around MMSegmentation, but I left its functionality untouched and used all the recommended APIs and classes as shown in the tutorials.

Reproduction

  1. What command or script did you run?

I am running the custom script below; some of the variables are stored in a general-purpose class for convenience. The val_dataset is in fact the
test_dataset, and self.cfg is the Config object for the corresponding dataset. The config path and the weights are resolved from the model's YAML file (e.g. configs/segformer/segformer.yaml): the weights are downloaded, and the .py config file is taken directly from the entry in that YAML file.

The model is created directly with the init_segmentor() function, using the same config and the checkpoint path.
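For context, a minimal sketch of that step (config_path and checkpoint_path are placeholder names, not from the original script):

from mmseg.apis import init_segmentor

# Build the segmentor from the dataset config and the downloaded checkpoint.
self.model = init_segmentor(config_path, checkpoint_path, device='cuda:0')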

from mmcv.parallel import MMDataParallel
from mmseg.apis import single_gpu_test
from mmseg.datasets import build_dataloader

# Note: the config key is samples_per_gpu, not samplers_per_gpu.
data_loader = build_dataloader(self.val_dataset[0],
                               samples_per_gpu=self.cfg.data.samples_per_gpu,
                               workers_per_gpu=self.cfg.data.workers_per_gpu,
                               dist=self.multiple_gpu)
model = MMDataParallel(self.model, device_ids=self.cfg.gpu_ids)
results = single_gpu_test(model, data_loader=data_loader, pre_eval=True)
eval_results = self.val_dataset[0].evaluate(results)
print("Final Evaluation Results", eval_results)

No errors or warnings appear during dataset or model building, or during testing.

  2. What config did you run?

Different configs, such as:

segformer_mit-b1_8x1_1024x1024_160k_cityscapes
fcn_hr18_512x1024_40k_cityscapes
fcn_hr48_512x512_80k_potsdam
  3. Did you make any modifications to the code or config? Did you understand what you have modified?

I have not modified the config files beyond samples_per_gpu, workers_per_gpu, and the data_root paths; nothing else.

  4. What dataset did you use?

Cityscapes and Potsdam mostly

Environment

{'sys.platform': 'linux', 'Python': '3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]', 'CUDA available': True, 'GPU 0': 'Quadro K2200', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.3.r11.3/compiler.29745058_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.10.2', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.2\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.11.3', 'OpenCV': '4.5.5', 'MMCV': '1.4.4', 'MMCV Compiler': 'GCC 7.3', 'MMCV CUDA Compiler': '11.3', 'MMSegmentation': '0.21.1+bf80039'}

Results

The weird thing is that when I plot the results from the networks, their outputs look almost identical to the ground truths, which leads me to think that the models I am loading are indeed running inference correctly on the inputs (images also loaded from the same data_loader I am using for the evaluation). The error must then be somewhere in the evaluation of said models, but from the few lines I wrote I don't see where the mistake could be.

I have also tried training these models, and during validation their scores are also really low, so perhaps I am loading the models correctly for inference but the evaluation itself is not working.

Note: the mIoU is printed as 9.5 as a percentage and then printed again as an absolute value (0.095).

@MengzhangLI
Contributor

Hi, I think it is very probably caused by the different number of GPUs you used, i.e., the total batch size is different.

As for SegFormer, the default GPU number is 8, so if you only use one GPU you should make the batch size 8 times larger than the default setting (if you have enough GPU memory).
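For example, a minimal sketch of the suggested adjustment (the config path is the SegFormer config mentioned above; cfg is assumed to be a loaded mmcv Config):

from mmcv import Config

cfg = Config.fromfile('configs/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes.py')
# The default schedule assumes 8 GPUs x 1 sample per GPU (total batch size 8).
# On a single GPU, raise samples_per_gpu so the total batch size stays at 8.
cfg.data.samples_per_gpu = 8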

MengzhangLI self-assigned this Feb 26, 2022
@A-guridi
Author

Hello,

I found out that the build_dataloader function automatically sets shuffle=True; that is why my evaluation code was failing. After changing that, the evaluation produces results close to the expected ones.
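For anyone hitting the same issue, this is roughly the fix in the script above (a sketch; shuffle is an explicit keyword argument of mmseg's build_dataloader):

# Keep the test set in its original order; build_dataloader defaults to
# shuffle=True, which can break the pairing between predictions and
# ground truths when evaluate() is called.
data_loader = build_dataloader(self.val_dataset[0],
                               samples_per_gpu=self.cfg.data.samples_per_gpu,
                               workers_per_gpu=self.cfg.data.workers_per_gpu,
                               dist=self.multiple_gpu,
                               shuffle=False)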

The variation due to batch size and GPU number does have a performance impact, but not one as big as I was seeing in my case.

Anyway, thank you for your help and support!
