Version of Pytorch and Cuda #136

yuxiangwei0808 · 2022-09-26T07:11:30Z

Hi, I am reproducing the experiment in the "osdi21-artifact" branch. However, multiple jobs have failed with the following error:

Traceback (most recent call last):
  File "run_glue.py", line 750, in <module>
    main()
  File "run_glue.py", line 476, in main
    model = adaptdl.torch.AdaptiveDataParallel(model, optimizer, lr_scheduler)
  File "/root/adaptdl/adaptdl/torch/parallel.py", line 68, in __init__
    adaptdl.checkpoint.load_state(self._state)
  File "/root/adaptdl/adaptdl/checkpoint.py", line 137, in load_state
    state.load(f)
  File "/root/adaptdl/adaptdl/torch/parallel.py", line 194, in load
    state_dicts, self.gain = torch.load(fileobj)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

This usually happens when a job is rescaled. I suspect it is caused by an environment mismatch. Could you please provide the versions of PyTorch, CUDA, Python, and any other required modules?
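
For anyone hitting the same traceback: recent PyTorch versions write checkpoints as zip archives by default, and "failed finding central directory" generally means the zip end-of-archive record is missing, which is what a truncated or partially written file looks like. The following is a minimal sketch, using only the Python standard library, for checking whether a checkpoint file is a complete zip archive before it is loaded. The function name `checkpoint_zip_is_intact` and the example path are hypothetical, and the sketch assumes the checkpoint was saved in the zip-based format; it is not part of AdaptDL.

```python
import zipfile


def checkpoint_zip_is_intact(path):
    """Return True if `path` is a readable zip archive with no corrupt members.

    Hypothetical helper for illustration; assumes a zip-based checkpoint file.
    """
    # is_zipfile() looks for the end-of-central-directory record, which is
    # missing when the file was truncated or only partially written.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

For example, `checkpoint_zip_is_intact("/path/to/checkpoint/file")` (a placeholder path) returning `False` would point to an incomplete checkpoint write rather than a version mismatch.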
