Skip to content

The charset #1

Open
Open
@hsiaoyi0504

Description

@hsiaoyi0504

As I proposed in maxhodak/keras-molecules#54. I am interested in why the charset is designed like this. It's not straightforward. From the viewpoint of chemistry, the chlorine "Cl" should not be treated as "C" and "l". Maybe it will be some improvement if we re-design the charset. I used the implementation from keras-molecules, and when I tried to interpolate between 2 chemical structures (CC=C(C(=CC)c1ccc(O)cc1)c1ccc(O)cc1 and CN1C(=O)CCS(=O)(=O)C1c1ccc(Cl)cc1).
). I got something like these invalid structures below, so I guess the charset is the reason for this.
CC(C)(O)CCC1CCC(Cr)So2c1ccc(C)cc1
CCNC(=O)CN(CC1((l)CN1c1ccc(OC)cc1
CN1C(=O)CN(CC1((#)CN1c1ccc(OC)cc1
CN1C(=O)CC(CC**()(=O)C1c1ccc(Cl)cc1
CN
1C(=O)CC(NC()(=O)C1**c1ccc(Cl)cc1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions