This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
While running the baseline model (CNN+LSTM+SA) with Python 3.6 and PyTorch 0.2.0, I got the following EOFError.
It seems to happen after accuracy has been checked several times.
I found this by setting checkpoint_every to 1.
Has anybody else run into this error?
Checking training accuracy ...
Traceback (most recent call last):
  File "scripts/train_model.py", line 498, in <module>
    main(args)
  File "scripts/train_model.py", line 152, in main
    train_loop(args, train_loader, val_loader)
  File "scripts/train_model.py", line 276, in train_loop
    baseline_model, train_loader)
  File "scripts/train_model.py", line 454, in check_accuracy
    for batch in loader:
  File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
    idx, batch = self.data_queue.get()
  File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
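For anyone reading the traceback: the last frames are in the standard library's `multiprocessing.connection`, where a receive raises `EOFError` when the other end of the connection has been closed and nothing is left to read. The sketch below reproduces only that behaviour with a plain `Pipe`. It does not use PyTorch and is not a fix. The helper name `recv_after_peer_closed` is made up for illustration, and reading the traceback as "the process on the other side of the connection went away" is my assumption, not something the log confirms.

```python
from multiprocessing import Pipe


def recv_after_peer_closed():
    """Return True if receiving from a closed peer raises EOFError.

    Hypothetical helper for illustration only; it mimics the final
    recv_bytes call in the traceback, not the DataLoader itself.
    """
    parent_end, child_end = Pipe()
    # Simulate the other side of the connection disappearing.
    child_end.close()
    try:
        parent_end.recv_bytes()
    except EOFError:
        return True
    finally:
        parent_end.close()
    return False


if __name__ == "__main__":
    print("EOFError raised:", recv_after_peer_closed())
```

If that reading is right, the thing to look for would be why the sending side is closed or gone by the time the batch is unpickled, rather than anything in `check_accuracy` itself.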