Error in sampling #7

Rinkumc · 2023-11-23T11:43:41Z

(reinvent4) rinku@admin:~/REINVENT4/configs/toml$ reinvent -l sampling.log sampling.toml
Traceback (most recent call last):
  File "/home/rinku/miniconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 284, in main
    runner(input_config, actual_device, tb_logdir, responder_config)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 101, in run_sampling
    sampled = sampler.sample(input_smilies)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/mol2mol.py", line 50, in sample
    dataset = Dataset(smilies, self.model.get_vocabulary(), tokenizer)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/models/mol2mol/dataset/dataset.py", line 25, in __init__
    enc = self._vocabulary.encode(tokenized)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/models/mol2mol/models/vocabulary.py", line 60, in encode
    ohe_vect[i] = self._tokens[token]
KeyError: '[S@+]'
Error occured during sampling

#############################
Sampling.toml
# REINVENT4 TOML input example for sampling
#


run_type = "sampling"
use_cuda = true  # run on the GPU if true, on the CPU if false
json_out_config = "_sampling.json"  # write this TOML to JSON


[parameters]

# Uncomment one of the comment blocks below.  Each generator needs a model
# file and possibly a SMILES file with seed structures.

## Reinvent: de novo sampling
#model_file = "/home/rinku/REINVENT4/priors/reinvent.prior"

## LibInvent: find R-groups for the given scaffolds
#model_file = "priors/libinvent.prior"
#smiles_file = "scaffolds.smi"  # 1 scaffold per line with attachment points

## LinkInvent: find a linker/scaffold to link two fragments
#model_file = "priors/linkinvent.prior"
#smiles_file = "warheads.smi"  # 2 warheads per line separated with '|'

## Mol2Mol: find molecules similar to the provided molecules
model_file = "/home/rinku/REINVENT4/priors/mol2mol_medium_similarity.prior"
smiles_file = "mol2mol.smi"  # 1 compound per line
sample_strategy = "beamsearch"  # multinomial or beamsearch (deterministic)
temperature = 1.0 # temperature in multinomial sampling
tb_logdir = "tb_logs"  # name of the TensorBoard logging directory

output_file = 'sampling.csv'  # sampled SMILES and NLL in CSV format

num_smiles = 110  # number of SMILES to be sampled, 1 per input SMILES
unique_molecules = true  # if true remove all duplicates and canonicalize SMILES
randomize_smiles = true # if true shuffle atoms in SMILES randomly

################
Input.smi
O=C1O[C@@H](C(=O)N)CN1c2cc(F)c(cc2)[C@H]3CC[S@](=O)CC3


halx · 2023-11-23T21:11:14Z

As far as I can see, the problem is that the first and last stereocentres are not real. You should remove those and try again.

Rinkumc · 2023-11-24T05:20:03Z

Okay

halx · 2023-11-24T07:51:46Z

I am sorry, but I realize that I pasted the wrong SMILES string. I see now that you posted O=C1O[C@@H](C(=O)N)CN1c2cc(F)c(cc2)[C@H]3CC[S@](=O)CC3. All stereochemistry annotations in this molecule are real.

We will have a look into this case and see how to resolve this.

halx · 2023-11-24T13:58:57Z

So, the issue is with the current prior models for Mol2Mol. They were trained on pairs from ChEMBL, keeping only pairs in which both molecules came from the same publication. This was done under the assumption that such molecules belong to the same series and therefore follow chemical intuition. Unfortunately, this leads to more limited chemistry in the models, which affects the sulfoxide you have in your molecule. In the end, those priors are essentially just proof of concept.
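
To see which tokens trip the vocabulary lookup before starting a run, a rough pre-check along the following lines may help. This is only a sketch: the regular expression is a simplified stand-in for the real tokenizer, `unknown_bracket_tokens` is a hypothetical helper, and the `known` set is an illustrative placeholder rather than the actual Mol2Mol vocabulary.

```python
import re

# Bracket atoms such as [C@@H] or [S@+] are treated as single tokens here.
# This is a simplification, not the tokenizer REINVENT4 uses.
BRACKET_ATOM = re.compile(r"\[[^\]]+\]")


def unknown_bracket_tokens(smiles, known_tokens):
    """Return the bracket-atom tokens in `smiles` that are not in `known_tokens`."""
    return [tok for tok in BRACKET_ATOM.findall(smiles) if tok not in known_tokens]


# Hypothetical vocabulary subset, for illustration only.
known = {"[C@@H]", "[C@H]"}
smiles = "O=C1O[C@@H](C(=O)N)CN1c2cc(F)c(cc2)[C@H]3CC[S@](=O)CC3"
print(unknown_bracket_tokens(smiles, known))
```

Note that the traceback names `[S@+]` while the input file contains `[S@]`, so the string that reaches the tokenizer may differ from what was typed. A check like this is most useful on the string that is actually encoded.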

At some point in the future we will release models trained on the larger PubChem dataset without making assumptions about how pairs were or should be constructed. For the time being, you can only try removing the stereochemistry annotation on the sulfur.
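
Removing the annotation on the sulfur can be done by hand, or with a small text substitution such as the sketch below. It uses only the standard library; `strip_sulfur_stereo` is a hypothetical helper, not part of REINVENT4. It assumes the sulfur is written exactly as `[S@]` or `[S@@]` (no charge, hydrogen count, or isotope) and has a standard valence, as in a sulfoxide, so that replacing the bracket atom with a plain `S` does not change the hydrogen count.

```python
import re

# Matches a bracketed sulfur whose only annotation is a chirality mark:
# [S@] or [S@@]. Charged or otherwise decorated sulfurs are left untouched.
SULFUR_STEREO = re.compile(r"\[S@{1,2}\]")


def strip_sulfur_stereo(smiles):
    """Replace [S@] / [S@@] with a plain S, leaving all other stereocentres intact."""
    return SULFUR_STEREO.sub("S", smiles)


smiles = "O=C1O[C@@H](C(=O)N)CN1c2cc(F)c(cc2)[C@H]3CC[S@](=O)CC3"
print(strip_sulfur_stereo(smiles))
# O=C1O[C@@H](C(=O)N)CN1c2cc(F)c(cc2)[C@H]3CCS(=O)CC3
```

The two carbon stereocentres are kept, so only the sulfoxide configuration is lost in the input passed to the model.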

Rinkumc · 2023-11-25T07:11:38Z

Okay Thanks!!

halx closed this as completed Nov 27, 2023

kingljy0818 mentioned this issue May 29, 2024

Error Encountered While Running Reinvent_TLRL.ipynb in REINVENT4 #88

Closed

Apl0x mentioned this issue Jun 4, 2024

Unexpected behaviour with MatchingSubstructure. #90

Closed

Luzuokun mentioned this issue Jun 6, 2024

Error countered when running Dockstream in REINVENT #91

Closed
