Having issues retraining the model with the existing data provided #6

Open
mujeebarshad opened this issue Sep 14, 2024 · 3 comments

@mujeebarshad

I am getting the following error when retraining on the USPTO data provided at this link: https://drive.google.com/drive/folders/1lZOLRGyZy18EVow7gyxtKWvs_yuwlIE3?usp=sharing

The atom-count error shouldn't appear at all, since this is the same data the model was already trained on. Any thoughts?

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/bin/unicore-train", line 8, in <module>
[rank0]:     sys.exit(cli_main())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore_cli/train.py", line 418, in cli_main
[rank0]:     distributed_utils.call_main(args, main)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/distributed/utils.py", line 186, in call_main
[rank0]:     distributed_main(int(os.environ["LOCAL_RANK"]), main, args, kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/distributed/utils.py", line 160, in distributed_main
[rank0]:     main(args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore_cli/train.py", line 105, in main
[rank0]:     extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/checkpoint_utils.py", line 223, in load_checkpoint
[rank0]:     extra_state, epoch_itr = trainer.load_checkpoint(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/trainer.py", line 433, in load_checkpoint
[rank0]:     epoch_itr = self.get_train_iterator(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/trainer.py", line 508, in get_train_iterator
[rank0]:     self.reset_dummy_batch(batch_iterator.first_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/iterators.py", line 243, in first_batch
[rank0]:     return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/iterators.py", line 243, in <listcomp>
[rank0]:     return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/base_wrapper_dataset.py", line 18, in __getitem__
[rank0]:     return self.dataset[index]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/nested_dictionary_dataset.py", line 69, in __getitem__
[rank0]:     return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/nested_dictionary_dataset.py", line 69, in <genexpr>
[rank0]:     return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/molecule_dataset.py", line 116, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, idx)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/molecule_dataset.py", line 121, in __getitem_cached__
[rank0]:     data = self.dataset[idx]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/key_dataset.py", line 27, in __getitem__
[rank0]:     return self.__cached_item__(idx, self.epoch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/key_dataset.py", line 24, in __cached_item__
[rank0]:     return self.dataset[idx][self.key]
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 132, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 137, in __getitem_cached__
[rank0]:     reactant = self.reactant_dataset[index]
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 67, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 71, in __getitem_cached__
[rank0]:     smiles = self.dataset[index]
[rank0]:   File "/content/NAG2G/NAG2G/data/random_smiles_dataset.py", line 75, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/random_smiles_dataset.py", line 85, in __getitem_cached__
[rank0]:     nm = Chem.RenumberAtoms(reactant_mol, list_reactant)
[rank0]: ValueError: atomCounts shorter than the number of atoms
@mujeebarshad

@synsis, could you please help me resolve this?

@hyj-real

May I ask whether this issue has been solved? I am experiencing the same problem.

@mujeebarshad

mujeebarshad commented Dec 12, 2024

@hyj-real PR #7 is the fix for this issue. I debugged the code, found the cause, and fixed it there. Let me know if you still face any problems.
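
For anyone who wants to locate the offending records before applying the PR, here is a minimal diagnostic sketch. It is not the change made in the PR, which is not shown in this thread. It assumes that the list passed to `Chem.RenumberAtoms` in the traceback above (`list_reactant`) is meant to be a permutation of the atom indices, and that the error means the list has fewer entries than the molecule has atoms. The helper names `check_atom_order` and `find_bad_records` are hypothetical and do not exist in NAG2G or RDKit.

```python
from typing import Iterable, List, Sequence, Tuple


def check_atom_order(num_atoms: int, order: Sequence[int]) -> None:
    """Raise ValueError unless `order` is a permutation of range(num_atoms).

    Hypothetical helper: meant to be called right before a renumbering step
    so that a bad record fails with a more descriptive message.
    """
    if len(order) != num_atoms:
        raise ValueError(
            f"order has {len(order)} entries but the molecule has {num_atoms} atoms"
        )
    if sorted(order) != list(range(num_atoms)):
        raise ValueError("order is not a permutation of 0..num_atoms-1")


def find_bad_records(records: Iterable[Tuple[int, Sequence[int]]]) -> List[int]:
    """Return the positions of (num_atoms, order) pairs that fail the check."""
    bad = []
    for i, (num_atoms, order) in enumerate(records):
        try:
            check_atom_order(num_atoms, order)
        except ValueError:
            bad.append(i)
    return bad
```

With RDKit, `num_atoms` would come from `reactant_mol.GetNumAtoms()` and `order` would be `list_reactant`, checked just before the `Chem.RenumberAtoms(reactant_mol, list_reactant)` call. Running `find_bad_records` over the dataset once should show whether the mismatch affects a few records or all of them.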
