I often run into the following error when starting multi-GPU training.
```
Traceback (most recent call last):
  File "train.py", line 116, in <module>
    main(opt)
  File "train.py", line 44, in main
    p.join()
  File "/home/xfbai/anaconda3/envs/torch1.0/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/home/xfbai/anaconda3/envs/torch1.0/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/xfbai/anaconda3/envs/torch1.0/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
TypeError: signal_handler() takes 1 positional argument but 3 were given
```
I got this error on Ubuntu 16.04, Python 3.6, PyTorch 1.0.1. Can someone help me understand the cause? I would really appreciate your help, thank you!
Ah, I see. This is a bug in our multi-GPU code path. There is a `def signal_handler()` function in `train.py` that you need to change to `def signal_handler(self, signalnum, stackframe)`. We normally train on a single GPU, which is why this went unnoticed.
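For context on why the signature matters: Python always invokes a registered signal handler with two arguments, the signal number and the current stack frame. A plain function therefore needs exactly those two parameters; a bound method additionally receives `self`, which is how a one-parameter definition ends up "given 3" arguments. A minimal sketch with a hypothetical handler (not the repo's actual `train.py` code):

```python
import os
import signal

received = []

# Python calls the handler as handler(signalnum, stackframe), so a
# module-level handler must accept exactly these two parameters.
# (A method would need a third, leading parameter: self.)
def signal_handler(signalnum, stackframe):
    received.append(signalnum)

# Register the handler and deliver a signal to our own process to trigger it.
signal.signal(signal.SIGUSR1, signal_handler)
os.kill(os.getpid(), signal.SIGUSR1)
```

After the `os.kill` call, `received` holds the delivered signal number, confirming the handler ran with the expected two-argument call.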