ModuleNotFoundError: No module named 'torch.distributed._shard' #6

Open
mitchawitt1 opened this issue Sep 20, 2023 · 2 comments

@mitchawitt1

I'm running into the module-not-found error in the title of this issue. My conda environment was built from the environment.yml file in this repository. Running the following command on the provided examples produces the error:

python ./inference.py -c ./examples/6AL0/ -w ./examples/work/ -r ./examples/result

Loading config file <local directory>/DProQ/configs/Lab_saw_mulitase_gate_af2_decoy_knn10_seed2222.json
Clean pdb files generated.
DIST files generated
DGL files generated
Lightning automatically upgraded your loaded checkpoint from v1.5.10 to v2.0.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ckpt/pre_train_seed_222.ckpt`
Traceback (most recent call last):
  File "inference.py", line 139, in <module>
    model = DPROQLi.load_from_checkpoint(ckpt_file)
  File "<miniconda directory>/envs/DProQ/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1532, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "<miniconda directory>/envs/DProQ/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 90, in _load_from_checkpoint
    return _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "<miniconda directory>/envs/DProQ/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 136, in _load_state
    obj = cls(**_cls_kwargs)
  File "<local directory>/DProQ/src/run_DPROQ_li_multitask_v2_gate.py", line 99, in __init__
    super().__init__()
  File "<miniconda directory>/envs/DProQ/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 128, in __init__
    self._register_sharded_tensor_state_dict_hooks_if_available()
  File "<miniconda directory>/envs/DProQ/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1570, in _register_sharded_tensor_state_dict_hooks_if_available
    from torch.distributed._shard.sharded_tensor import pre_load_state_dict_hook, state_dict_hook
ModuleNotFoundError: No module named 'torch.distributed._shard'

This seems to occur when loading the checkpoint file. Any help would be appreciated, thanks!

@XiaoChen1992
Collaborator

Hi, please use our latest Git repo at: https://github.com/jianlin-cheng/DProQA

That should be fine.
Thank you

@XiaoChen1992
Collaborator

XiaoChen1992 commented Sep 21, 2023

Also, we updated the environment.yml file. Your problem should be solved if you reinstall the conda environment.
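
For anyone checking whether a rebuilt environment is affected, here is a minimal sketch of a diagnostic to run inside the conda environment. It relies only on what the traceback shows: pytorch_lightning tries to import the private module torch.distributed._shard, and the installed torch does not provide it. The helper names (installed_version, can_import, report) are hypothetical and are not part of DProQ, PyTorch, or Lightning.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if the module imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False


def report():
    """Collect the facts relevant to the error in this issue."""
    return {
        # Distribution names as published on PyPI.
        "torch": installed_version("torch"),
        "pytorch-lightning": installed_version("pytorch-lightning"),
        # The private module named in the last line of the traceback.
        "torch.distributed._shard importable": can_import("torch.distributed._shard"),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

If the last line prints False while pytorch-lightning reports a 2.x version, the environment likely has the same torch and Lightning mismatch as in the traceback, and rebuilding it from the updated environment.yml is the suggested route.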
