
Low evaluation scores on pre-trained models #1322

Closed · A-guridi opened this issue Feb 24, 2022 · 2 comments

@A-guridi

Describe the issue
I am trying to replicate the evaluation results of different models on different datasets (all supported by MMSegmentation), but I always get really low mIoU scores (~9.5 mIoU), even when the plotted results look good.

I have implemented some custom wrappers around MMSegmentation, but I left its functionality untouched and used all the recommended APIs and classes as shown in the tutorials.

Reproduction

  1. What command or script did you run?

I am running the custom script below; some of the variables are stored in a general-purpose class for convenience. The val_dataset is in fact the
test_dataset, and self.cfg is the Config object for the corresponding dataset. The config path and the weights are resolved from the model's YAML file (e.g. configs/segformer/segformer.yaml): the weights are downloaded, and the .py config file is taken directly from the entry in that YAML file.

The model is created directly with the init_segmentor() function, using the same config and the checkpoint path.
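For context, a minimal sketch of that step (config_path and checkpoint_path are placeholder names, not from the original script):

from mmseg.apis import init_segmentor

# Build the segmentor from the dataset config and the downloaded checkpoint.
self.model = init_segmentor(config_path, checkpoint_path, device='cuda:0')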

from mmcv.parallel import MMDataParallel
from mmseg.apis import single_gpu_test
from mmseg.datasets import build_dataloader

# Note: the config key is samples_per_gpu, not samplers_per_gpu.
data_loader = build_dataloader(self.val_dataset[0],
                               samples_per_gpu=self.cfg.data.samples_per_gpu,
                               workers_per_gpu=self.cfg.data.workers_per_gpu,
                               dist=self.multiple_gpu)
model = MMDataParallel(self.model, device_ids=self.cfg.gpu_ids)
results = single_gpu_test(model, data_loader=data_loader, pre_eval=True)
eval_results = self.val_dataset[0].evaluate(results)
print("Final Evaluation Results", eval_results)

No errors or warnings appear during dataset or model building, or during testing.

  2. What config did you run?

Different configs, such as:

segformer_mit-b1_8x1_1024x1024_160k_cityscapes
fcn_hr18_512x1024_40k_cityscapes
fcn_hr48_512x512_80k_potsdam
  3. Did you make any modifications to the code or config? Did you understand what you have modified?

I have not modified the config files beyond samples_per_gpu, workers_per_gpu, and the data_root paths; nothing else.

  4. What dataset did you use?

Cityscapes and Potsdam mostly

Environment

{'sys.platform': 'linux', 'Python': '3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]', 'CUDA available': True, 'GPU 0': 'Quadro K2200', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.3.r11.3/compiler.29745058_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.10.2', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.2\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.11.3', 'OpenCV': '4.5.5', 'MMCV': '1.4.4', 'MMCV Compiler': 'GCC 7.3', 'MMCV CUDA Compiler': '11.3', 'MMSegmentation': '0.21.1+bf80039'}

Results

The weird thing is that when I plot the results from the networks, their outputs look almost identical to the ground truths, which leads me to think that the models I am loading are indeed running inference correctly on the inputs (images also loaded from the same data_loader I am using for the evaluation). The error must then be somewhere in the evaluation of said models, but from the few lines I wrote I don't see where the mistake could be.

I have also tried training these models, and during validation their scores are also really low, so perhaps I am loading the models correctly for inference but the evaluation itself is not working.

Note: the mIoU is printed as 9.5 as a percentage and then printed again as an absolute value (0.095).

@MengzhangLI
Contributor

Hi, I think it is very probably caused by the different number of GPUs you used, i.e., the total batch size is different.

As for SegFormer, the default GPU number is 8, so if you only use one GPU you should make the batch size 8 times larger than the default setting (if you have enough GPU memory).
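For example, a minimal sketch of the suggested adjustment (the config path is the SegFormer config mentioned above; cfg is assumed to be a loaded mmcv Config):

from mmcv import Config

cfg = Config.fromfile('configs/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes.py')
# The default schedule assumes 8 GPUs x 1 sample per GPU (total batch size 8).
# On a single GPU, raise samples_per_gpu so the total batch size stays at 8.
cfg.data.samples_per_gpu = 8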

MengzhangLI self-assigned this Feb 26, 2022
@A-guridi
Author

Hello,

I found out that the build_dataloader function automatically sets shuffle=True; that is why my evaluation code was failing. After changing that, the evaluation produces results close to the expected ones.
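For anyone hitting the same issue, this is roughly the fix in the script above (a sketch; shuffle is an explicit keyword argument of mmseg's build_dataloader):

# Keep the test set in its original order; build_dataloader defaults to
# shuffle=True, which can break the pairing between predictions and
# ground truths when evaluate() is called.
data_loader = build_dataloader(self.val_dataset[0],
                               samples_per_gpu=self.cfg.data.samples_per_gpu,
                               workers_per_gpu=self.cfg.data.workers_per_gpu,
                               dist=self.multiple_gpu,
                               shuffle=False)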

The variation due to batch size and GPU number does have a performance impact, but not one as big as I was seeing in my case.

Anyway, thank you for your help and support!
