This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

EOFError #24

Open
kkjh0723 opened this issue Oct 6, 2017 · 2 comments

@kkjh0723

kkjh0723 commented Oct 6, 2017

While running the baseline model (CNN+LSTM+SA) with Python 3.6 and PyTorch 0.2.0, I got the following EOFError.
It seems to occur after accuracy has been checked several times.
I noticed this by setting checkpoint_every to 1.
Has anybody else run into this error?

Checking training accuracy ... 
Traceback (most recent call last):
  File "scripts/train_model.py", line 498, in <module>
    main(args)
  File "scripts/train_model.py", line 152, in main
    train_loop(args, train_loader, val_loader)
  File "scripts/train_model.py", line 276, in train_loop
    baseline_model, train_loader)
  File "scripts/train_model.py", line 454, in check_accuracy
    for batch in loader:
  File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
    idx, batch = self.data_queue.get()
  File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
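
The last frames of the traceback are in the standard library's multiprocessing/connection.py, where recv_bytes raises EOFError once the other end of the connection has been closed and there is nothing left to read. The sketch below reproduces only that mechanism with a plain Pipe, using no PyTorch and no DataLoader. The function name recv_after_peer_closed is hypothetical, and it is an assumption that the peer closing its end is what happened in the run above; the traceback alone does not show why.

```python
import multiprocessing as mp


def recv_after_peer_closed():
    """Return the name of the exception raised when the peer has closed."""
    # Pipe() returns two connected Connection objects.
    reader, writer = mp.Pipe()
    # Closing one end without sending anything stands in for a peer
    # that went away before replying (an assumption for illustration).
    writer.close()
    try:
        # Same call as in the traceback's answer_challenge frame.
        reader.recv_bytes(256)
    except EOFError as exc:
        return type(exc).__name__
    finally:
        reader.close()
    return None


if __name__ == "__main__":
    print(recv_after_peer_closed())
```

If the same thing is happening in the training run, the EOFError would be a symptom of the sending side of the shared file descriptor having closed its connection, not a problem in check_accuracy itself.
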
@wenwei202

Did you figure out how to fix it? Thanks!

@anianruoss

+1
