Memory Leak when single GPU Test on 2.7+ images #287

Closed
pavanteja295 opened this issue Dec 1, 2020 · 12 comments

Comments

@pavanteja295

pavanteja295 commented Dec 1, 2020

Describe the bug
Testing a pre-trained Cityscapes model on the training images with a single GPU exhausts the available RAM.

Reproduction

  1. What command or script did you run?

     python tools/test.py configs/fcn/fcn_r50-d8_512x1024_80k_cityscapes.py {checkpoint} --data_path {path of the data} --eval mIoU (without distributed data parallel parameters)
    
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    Added a data_path argument for passing the path to the Cityscapes dataset. Changed the test paths to the train paths so that the model is evaluated on training images.

  3. What dataset did you use?
    Cityscapes; I also observed the same behaviour with a custom dataset.
    Environment

  4. Please run python mmseg/utils/collect_env.py to collect necessary environment information and paste it here.
    sys.platform: linux
    Python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
    CUDA available: False
    GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
    PyTorch: 1.3.0
    PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.4.1a0+d94043a
OpenCV: 4.4.0
MMCV: 1.2.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMSegmentation: 0.8.0+0d10921

  5. You may add additional information that may be helpful for locating the problem, such as:
    • I tried switching to PyTorch 1.6.0 with the corresponding mmcv version, but the problem persists. I also tried the latest master; it is still the same.
    • I tried using memory_profiler to locate the memory leak, but this did not help.
    • I tried setting num_workers to 0 and LRU_CACHE_CAPACITY=1 to avoid excessive memory usage.
    • I also observed memory exhaustion during training on Cityscapes and my custom dataset. For example, 60 GB of RAM is exhausted after 20k epochs for Cityscapes.
    • Testing the model on the Cityscapes validation set also leads to a continuous increase in memory usage, but since there are only 500 val images and I have 60 GB of RAM allocated, it does not crash.

Error traceback

I'm running my code on a single node of a headless SLURM cluster, so I cannot do any interactive debugging. I have not made any changes to the source code except those mentioned above. I have been trying to debug this for a week with no luck. Please let me know if you can find a solution to my problem.

A placeholder for the traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

@pavanteja295 pavanteja295 changed the title Memory Leak when single GPU Test on 5k+ images Memory Leak when single GPU Test on 2.7+ images Dec 1, 2020
@ndcuong91

I have the same problem when testing with the PubLayNet dataset. Hope someone can help.

@xvjiarui
Collaborator

Hi @pavanteja295
We haven't observed any memory leak on our side, but we are planning a more efficient inference and evaluation process.

@pavanteja295
Author

Hi @xvjiarui. I have tried sticking to your exact code base and the problem still persists when evaluating on large datasets (around 2.5k+ images). I think one reason you are not observing it is that you probably have a large amount of RAM available (around 75 GB). Could you please do me a favour and run it with less RAM, say around 30 GB? Please note that this leak won't be visible if your evaluation uses only 500 images. I can paste my memory usage here; I see a consistent increase in memory as images are loaded.

@tetsu-kikuchi

tetsu-kikuchi commented Dec 26, 2020

Hello @pavanteja295
I do not fully understand the problem, but does memory swapping not help you?


Below is my opinion on the memory increase.

I also observed that RAM usage increases during validation (I used tools/train.py; I have not yet used tools/test.py). I also observed, though I do not know why, that RAM usage decreased at certain points during training. So the amount of used memory fluctuates over the whole training and validation cycle (roughly between 15 GB and 40 GB for my custom dataset).

I am not familiar with semantic segmentation (mmseg is the first repository I have used). But in my opinion, the increase in RAM usage during validation is not an error but is to be expected: semantic segmentation requires pixel-wise predictions, which carry far more information than, for example, box detection, where only five numbers (box coordinates and a score) are predicted per detection.

During test or validation, the whole set of model predictions is kept in order to evaluate the score. I have not done a precise estimate myself, but if your test or validation dataset is large, the predictions can grow to a few tens of GB, and they are held in RAM for the duration of the test or validation.
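
For a rough sense of scale, a back-of-envelope estimate (my numbers, assuming full-resolution 1024x2048 Cityscapes predictions stored as np.int64, one per training image):

# Rough estimate of the RAM needed to hold every raw prediction at once.
height, width = 1024, 2048     # Cityscapes output resolution (assumed)
bytes_per_pixel = 8            # np.int64
num_images = 2975              # Cityscapes train split
total_gib = height * width * bytes_per_pixel * num_images / 1024**3
print(f"~{total_gib:.1f} GiB just for the raw predictions")  # ~46.5 GiB

That is already in the same ballpark as the 60 GB of RAM mentioned above, before counting the model, the dataloader workers, and everything else.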

@pavanteja295
Author

pavanteja295 commented Dec 26, 2020

Hello @tetsu-kikuchi. Thanks a lot for your opinion. Summary of the issue I'm facing: even with 60-75 GB of RAM, when I use the script to validate around 3000 samples (tried with both multi-GPU (nn.DistributedDataParallel) and single-GPU (nn.DataParallel) testing), the entire RAM gets consumed. Basically, what you are suggesting is that with the current version of the evaluation it is not possible to validate more than 3000 samples given only 60-75 GB of RAM? Isn't this undesirable, and can we better optimize the evaluation scripts? Also, do you have a quick solution I can try? Otherwise I'm forced to switch to other code bases (which seem to handle this problem), and I really want to stick with this highly organised one. One solution I can think of is that mIoU does not actually need all the predictions to be available at once to compute the score, so one could compute the score incrementally and discard previous predictions.

My follow-up question: this happens even during training (using tools/train.py), where training sometimes stops after n epochs due to consumption of the entire memory.

@tetsu-kikuchi

tetsu-kikuchi commented Dec 26, 2020

@pavanteja295 Thanks for your reply. I'm a novice and do not have a full understanding of the problem, but I will try to give my opinion on some of your questions.

My validation dataset is around 600 samples, and RAM usage sometimes reaches 40 GB.

Isn't this undesirable, and can we better optimize the evaluation scripts?

I agree that this is undesirable, and we can possibly optimize the evaluation scripts. Due to my limited experience, I do not know whether this problem is specific to the (possibly problematic) implementation in mmsegmentation or a general problem in semantic segmentation. One suggestion is to try other implementations and see whether the same problem occurs.

do you have a quick solution I can try
one could compute the score incrementally and discard previous predictions

As you wrote, it would be good to avoid keeping all the model predictions during test or validation. One solution would be a kind of running accumulation, so that you can discard the prediction for the previous sample when you move to the next. Another solution would be to split the test or validation dataset, save the model predictions for each split to disk (not RAM), and finally unify the predictions and evaluate the score yourself; a rough sketch of this is below.
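
A minimal sketch of the second idea (the helper names here are just illustrative, not part of mmsegmentation's API): write each prediction to disk as it is produced, then stream the files back one at a time when computing the score.

import os
import numpy as np

def dump_predictions(model_outputs, out_dir):
    # model_outputs: iterable of 2-D numpy label maps, one per image
    os.makedirs(out_dir, exist_ok=True)
    for idx, pred in enumerate(model_outputs):
        # uint16 keeps the files small; int64 would be 4x larger
        np.save(os.path.join(out_dir, f"{idx:06d}.npy"), pred.astype(np.uint16))

def iter_predictions(out_dir):
    # yield one prediction at a time so only one lives in RAM
    for name in sorted(os.listdir(out_dir)):
        yield np.load(os.path.join(out_dir, name))

The evaluation loop would then consume iter_predictions(out_dir) instead of an in-memory list of results.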

@tetsu-kikuchi

tetsu-kikuchi commented Dec 29, 2020

I see that in the current implementation:
1. Model prediction is first done for all test samples.
https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/apis/test.py#L31
2. Then the overall predictions are split back into per-sample predictions, and IoU is accumulated sample by sample.
https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/core/evaluation/metrics.py#L62

So the current implementation is simple but does not seem RAM-efficient. You can rearrange the code so that steps 1 and 2 are unified in a RAM-efficient way, roughly as sketched below.
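
A minimal sketch of what such a rearrangement could look like (plain NumPy, not mmsegmentation's actual API): run inference and immediately fold each prediction into running per-class intersection and union counts, so nothing is kept once a sample has been scored.

import numpy as np

def evaluate_streaming(predict_fn, samples, num_classes, ignore_index=255):
    # predict_fn(image) -> 2-D label map; samples yields (image, gt) pairs
    total_intersect = np.zeros(num_classes, dtype=np.int64)
    total_union = np.zeros(num_classes, dtype=np.int64)
    bins = np.arange(num_classes + 1)
    for image, gt in samples:
        pred = predict_fn(image)
        mask = gt != ignore_index
        pred, gt = pred[mask], gt[mask]
        intersect = np.histogram(pred[pred == gt], bins=bins)[0]
        area_pred = np.histogram(pred, bins=bins)[0]
        area_gt = np.histogram(gt, bins=bins)[0]
        total_intersect += intersect
        total_union += area_pred + area_gt - intersect
        # pred is discarded here; only two small per-class arrays persist
    # classes that never appear get IoU 0 here (mmseg reports NaN instead)
    iou = total_intersect / np.maximum(total_union, 1)
    return float(iou.mean())  # mIoU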

@pavanteja295
Author

Yes, true. I made the corresponding modifications in my local repository and it works now! Thanks.

@tetsu-kikuchi

@pavanteja295
I'm glad it helped.
It seems I pasted the wrong link for the first URL in my comment. I have edited it.

@tetsu-kikuchi

tetsu-kikuchi commented Jan 13, 2021

I do not know the details, but it seems that mmseg/apis/test.py has recently been modified to a more memory-efficient style.
#330
https://github.com/open-mmlab/mmsegmentation/commits/ce46d70d2080e00d40d739b1033e4a07ed016388/mmseg/apis/test.py

@xvjiarui
Collaborator

Yep. We have supported memory-efficient test in #330.

@MELSunny

MELSunny commented Mar 8, 2022

The reason for the memory blow-up is this line:

result = model(return_loss=False, **data)

result is a list containing one prediction label map of type np.int64.
if pre_eval:
    # TODO: adapt samples_per_gpu > 1.
    # only samples_per_gpu=1 valid now
    result = dataset.pre_eval(result, indices=batch_indices)
    results.extend(result)
else:
    results.extend(result)

It is appended to results on every iteration, which causes a memory explosion when testing on a large dataset.
To solve it, just add a line before line 125:
result = [_.astype(np.uint16) for _ in result]

This reduces RAM usage by 4x. If the dataset's num_classes is less than 254, np.uint8 can be used instead, which reduces RAM usage by 8x.
#189
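
To put rough numbers on the dtype suggestion (illustrative, assuming full-resolution 1024x2048 label maps):

pixels = 1024 * 2048
for dtype_name, nbytes in [("int64", 8), ("uint16", 2), ("uint8", 1)]:
    print(f"{dtype_name}: {pixels * nbytes / 2**20:.0f} MiB per prediction")
# int64: 16 MiB, uint16: 4 MiB, uint8: 2 MiB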
