Evolve does not close its resources and leaves many files open. By the time evolve.txt has 348 entries (lines), the run aborts with the error below.
The code is the latest checkout from the repo.
Image sizes 256 train, 256 test
Using 4 dataloader workers
Logging results to runs/train/evolve
Starting training for 8 epochs...
Epoch gpu_mem box obj cls total targets img_size
0%| | 0/282 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 142, in _serve
File "/usr/lib/python3.7/multiprocessing/connection.py", line 453, in accept
File "/usr/lib/python3.7/multiprocessing/connection.py", line 599, in accept
File "/usr/lib/python3.7/socket.py", line 212, in accept
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 322, in reduce_storage
File "/usr/lib/python3.7/multiprocessing/reduction.py", line 194, in DupFd
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 142, in _serve
with self._listener.accept() as conn:
File "/usr/lib/python3.7/multiprocessing/connection.py", line 453, in accept
c = self._listener.accept()
File "/usr/lib/python3.7/multiprocessing/connection.py", line 599, in accept
s, self._last_accepted = self._socket.accept()
File "/usr/lib/python3.7/socket.py", line 212, in accept
fd, addr = self._accept()
OSError: [Errno 24] Too many open files
Environment
If applicable, add screenshots to help explain your problem.
OS: Ubuntu
GPU: Nvidia 1070
Looking at the list of open files, I can see many complete Python environments being kept active. It looks to me as if train.py spawns a new environment for each evaluation cycle and does not close it after storing the new generation of hyperparameters.
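One way to check whether descriptors really accumulate from one generation to the next is to log the process's open file descriptor count against its limit. The snippet below is only a minimal sketch, assuming Linux (it reads /proc/self/fd, which matches the Ubuntu setup reported here). The helper names `open_fd_count` and `fd_report` are hypothetical and are not part of train.py.

```python
import os
import resource


def open_fd_count() -> int:
    """Number of file descriptors currently open in this process (Linux only).

    Listing /proc/self/fd briefly opens one descriptor itself, so the value is
    one higher than the "true" count, but consistently so across calls.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_report(tag: str) -> str:
    """One-line summary of descriptor usage against the limits (ulimit -n)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return f"{tag}: {open_fd_count()} open fds (soft limit {soft}, hard limit {hard})"


if __name__ == "__main__":
    print(fd_report("before"))
    with open(os.devnull) as handle:
        # One extra file is open here, so the count should be one higher.
        print(fd_report("with one extra file open"))
    print(fd_report("after"))
```

Printing `fd_report(...)` once per generation (where exactly that belongs in the evolve loop is an assumption on my side) would show whether the count keeps climbing toward the soft limit. A steadily growing number would be consistent with the leak described above, while a flat number would point elsewhere.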
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🐛 Bug
To Reproduce (REQUIRED)
Evolve is called like this:
python ./train.py --img 256 --batch 4 --epochs 8 --data $DATA_BASE/data.yaml --cfg $DATA_BASE/yolov5s.yaml --weights '' --cache --evolve
Output: