
evaluation process issue (died with <signals.SIGKILL>) #135

Closed
MacavityT opened this issue Sep 17, 2020 · 3 comments

@MacavityT

When I run the test command, it works until all of the test data has been processed, but "result.pkl" cannot be saved, and I get an error message like this:

"subprocess.CalledProcessError: Command '['/home/taiyan/anaconda3/envs/mmseg/bin/python', '-u', './tools/test.py', '--local_rank=3', './configs/highway/deeplabv3plus.py', '/home/taiyan/highway-0915/work_dir_0916/iter_40000.pth', '--launcher', 'pytorch', '--out', '/home/taiyan/highway-0915/result.pkl', '--eval', 'mIoU']' died with <Signals.SIGKILL: 9>."

I then have to run the command "kill process-id" to end the process, because the GPUs are still occupied.

The same issue also occurs during training, so I disabled the evaluation process by setting "evaluation = dict(interval=0, metric='mIoU')".
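Concretely, the workaround looks like this in my config (only the evaluation setting is shown; the rest of "./configs/highway/deeplabv3plus.py" is as before):

```python
# Workaround: interval=0 skips the periodic evaluation during training,
# which avoids the SIGKILL, but also means no mIoU is reported while training.
evaluation = dict(interval=0, metric='mIoU')
```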

Please tell me how to solve it.

@xvjiarui
Collaborator

Hi @MacavityT

  1. It may be due to insufficient memory. If you are using Docker, please enlarge the shared memory.
  2. You may disable validation by passing --no-validate.

@MacavityT
Author

Hi @xvjiarui
Thank you for the help. I also found some mistakes.

1. In the file "mmseg/apis/test.py", line 53 is "out_file = osp.join(out_dir, img_meta['ori_filename'])". Since "img_meta['ori_filename']" returns an absolute path, the following "join" does not behave as "given_path + image_name"; line 53 effectively does nothing, so the original training image gets overwritten by the result image (a short demonstration is included after this list).

2. In the file "mmseg/apis/test.py", both "single_gpu_test" and "multi_gpu_test" share the same problem: each "result" is appended to "results" inside the loop, and "results" is only dumped at the end of the test process. So if the test set has 2k images, each with shape (1024, 512), the process runs out of memory (a rough sketch of a possible workaround is at the end of this comment).
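To illustrate point 1 (the paths below are made up for the example): when the second argument to osp.join is already an absolute path, everything before it is discarded, so out_dir has no effect and out_file points straight back at the original image.

```python
import os.path as osp

# Hypothetical paths, only to demonstrate the behaviour of osp.join:
out_dir = '/home/taiyan/highway-0915/show_dir'        # directory intended for the rendered results
ori_filename = '/data/highway/images/val/000001.png'  # absolute path from img_meta['ori_filename']

# Because the second argument is absolute, osp.join throws out_dir away.
out_file = osp.join(out_dir, ori_filename)
print(out_file)  # /data/highway/images/val/000001.png -> the original image gets overwritten

# One possible fix: join only the basename (or a path relative to the dataset root).
out_file = osp.join(out_dir, osp.basename(ori_filename))
print(out_file)  # /home/taiyan/highway-0915/show_dir/000001.png
```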

I hope the two problems described above are helpful.
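For point 2, here is a rough sketch of one possible workaround (not how the current code works): write each prediction to disk as soon as it is produced instead of keeping the whole "results" list in memory until the final dump. The function name, the tmp_dir argument and the file naming are made up for the sketch.

```python
import os.path as osp

import mmcv
import numpy as np
import torch


def single_gpu_test_streaming(model, data_loader, tmp_dir):
    """Memory-friendly variant of the test loop: every segmentation map is
    written to disk immediately instead of being appended to a growing list."""
    model.eval()
    mmcv.mkdir_or_exist(tmp_dir)
    result_files = []
    prog_bar = mmcv.ProgressBar(len(data_loader.dataset))
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            # Same call convention as the existing single_gpu_test in mmseg/apis/test.py.
            result = model(return_loss=False, **data)
        for j, seg_map in enumerate(result):
            out_file = osp.join(tmp_dir, f'{i}_{j}.npy')
            np.save(out_file, np.asarray(seg_map))
            result_files.append(out_file)
            prog_bar.update()
    # Evaluation can later np.load() these files one at a time instead of
    # holding ~2k full-resolution maps in memory at once.
    return result_files
```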

@xvjiarui
Collaborator

Hi @MacavityT

  1. is fixed by "Use img_prefix and seg_prefix for loading" (#153).
  2. You may run a single-GPU, non-distributed test instead.
