How to handle unacceptable SMILEs? #16

Closed
RunshengSong opened this issue Apr 29, 2017 · 1 comment

@RunshengSong

Hi, I used your package to calculate descriptors for thousands of chemicals at once. My code:

mols = [Chem.MolFromSmiles(smi) for smi in all_SMILEs]
df = calc.pandas(mols)

where "all_SMILEs" contains thousands of SMILES strings, but some of them are not accepted by mordred (or rdkit?). For example, this one: O=C(c1cc(Cl)c(Cl)n1)c2c(O)c(cc(Cl)c2O)CCCCCC

It throws this error:

---------------------------------------------------------------------------
ArgumentError                             Traceback (most recent call last)
<ipython-input-15-78193e400eba> in <module>()
     14     mols = [Chem.MolFromSmiles(smi)]
     15 
---> 16     df = calc.pandas(mols)
     17 
     18 mols = [Chem.MolFromSmiles(smi) for smi in all_SMILEs]

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/mordred/_base/calculator.pyc in pandas(self, mols, nproc, nmols, quiet, ipynb, id)
    293         return pandas.DataFrame(
    294             self.map(mols, nproc, nmols, quiet, ipynb, id),
--> 295             columns=[str(d) for d in self.descriptors]
    296         )
    297 

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    298         elif isinstance(data, (list, types.GeneratorType)):
    299             if isinstance(data, types.GeneratorType):
--> 300                 data = list(data)
    301             if len(data) > 0:
    302                 if is_list_like(data[0]) and getattr(data[0], 'ndim', 1) == 1:

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/mordred/_base/parallel.pyc in parallel(self, mols, nproc, nmols, quiet, ipynb, id)
     41     try:
     42         with self._progress(quiet, nmols, ipynb) as bar:
---> 43             for result in [do_task(m) for m in mols]:
     44                 r, err = get_result(result)
     45 

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/mordred/_base/parallel.pyc in do_task(mol)
     36 
     37     def do_task(mol):
---> 38         args = Context.from_calculator(self, mol, id)
     39         return pool.apply_async(worker, (args,))
     40 

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/mordred/_base/context.pyc in from_calculator(cls, calc, mol, id)
     54     @classmethod
     55     def from_calculator(cls, calc, mol, id):
---> 56         return cls.from_query(mol, calc._require_3D, calc._explicit_hydrogens, calc._kekulizes, id)
     57 
     58     def get_coord(self, desc):

/home/runsheng/anaconda2/envs/CLiCC/lib/python2.7/site-packages/mordred/_base/context.pyc in from_query(cls, mol, require_3D, explicit_hydrogens, kekulizes, id)
     24     @classmethod
     25     def from_query(cls, mol, require_3D, explicit_hydrogens, kekulizes, id):
---> 26         n_frags = len(Chem.GetMolFrags(mol))
     27 
     28         if mol.HasProp('_Name'):

ArgumentError: Python argument types in
    rdkit.Chem.rdmolops.GetMolFrags(NoneType)
did not match C++ signature:
    GetMolFrags(RDKit::ROMol mol, bool asMols=False, bool sanitizeFrags=True)

How should I handle SMILES strings that are not accepted? The error message would be more useful if it told me which SMILES string was rejected, so that I could remove it myself.

Thanks.

@philopon
Member

philopon commented Jun 8, 2017

I'm sorry for the late reply.

It seems the SMILES is rejected by rdkit, not by mordred. Chem.MolFromSmiles returns None when it is given an invalid SMILES string. You can handle the error by checking for None.

Example:

from rdkit import Chem

mols = []
for smi in all_SMILEs:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        # MolFromSmiles returns None for a SMILES it cannot parse: report it and skip it
        print("cannot parse SMILES: {}".format(smi))
        continue

    mols.append(mol)
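
Building on the None check above, here is a minimal sketch that also records which inputs were rejected, so the failing SMILES can be reviewed or removed afterwards. The names `split_valid_smiles` and `parse` are hypothetical, introduced only for illustration; they are not part of rdkit or mordred. By default `parse` falls back to `Chem.MolFromSmiles`.

```python
def _rdkit_parse(smiles):
    """Parse with RDKit; Chem.MolFromSmiles returns None for invalid SMILES."""
    # Imported lazily so this helper can be imported even where RDKit is absent.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles)


def split_valid_smiles(smiles_list, parse=_rdkit_parse):
    """Hypothetical helper: separate parseable SMILES from rejected ones.

    Returns a tuple (mols, kept, rejected):
      mols     -- parsed molecules, in input order
      kept     -- the SMILES strings that parsed, aligned with mols
      rejected -- (index, smiles) pairs for which parse() returned None
    """
    mols, kept, rejected = [], [], []
    for index, smi in enumerate(smiles_list):
        mol = parse(smi)
        if mol is None:
            rejected.append((index, smi))
        else:
            mols.append(mol)
            kept.append(smi)
    return mols, kept, rejected


# Example usage (assumes `calc` and `all_SMILEs` are defined as in the question):
#   mols, kept, rejected = split_valid_smiles(all_SMILEs)
#   for index, smi in rejected:
#       print("cannot parse SMILES at index {}: {}".format(index, smi))
#   df = calc.pandas(mols)
```

Keeping `kept` aligned with `mols` makes it possible to label the resulting rows with the SMILES they came from, since the row positions no longer match the original input list once entries are skipped.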

I agree that this error message is rather unhelpful. I plan to fix it.
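
Until the message itself is improved, one possible workaround is to fail early with an error that names the offending input, before anything is passed to calc.pandas. This is only a sketch: `parse_or_raise` and its `parse` parameter are hypothetical names, not part of rdkit or mordred, and the wording of the message is an assumption.

```python
def parse_or_raise(smiles_list, parse=None):
    """Hypothetical helper: parse every SMILES or raise a descriptive ValueError.

    `parse` is any callable that returns None for invalid input. When it is
    omitted, RDKit's Chem.MolFromSmiles is used.
    """
    if parse is None:
        # Imported lazily so the helper can be defined without RDKit installed.
        from rdkit import Chem
        parse = Chem.MolFromSmiles

    mols = []
    for index, smi in enumerate(smiles_list):
        mol = parse(smi)
        if mol is None:
            # Name the position and the string so the bad entry is easy to find.
            raise ValueError(
                "cannot parse SMILES at index {}: {!r}".format(index, smi)
            )
        mols.append(mol)
    return mols


# Example usage (assumes `calc` and `all_SMILEs` are defined as in the question):
#   mols = parse_or_raise(all_SMILEs)
#   df = calc.pandas(mols)
```

Raising on the first failure suits small inputs where every structure is expected to be valid. For thousands of SMILES, collecting the failures and continuing is usually more convenient.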

Thanks.
