Selfies alphabet token changes #58

whitead · 2021-08-08T14:06:11Z

sf.encoder('C#[S-]')

outputs

'[C][#S-expl]'

which uses a token, [#S-expl], that does not appear in the output of get_semantic_robust_alphabet. The closest token I see is [#S+1expl].

This arises when running the following code:

import rdkit.Chem
import selfies as sf

start_selfies = '[C][#S-1expl]'
start_smiles = sf.decoder(start_selfies)
# Canonicalize the SMILES with RDKit, then encode it back to SELFIES
end_smiles = rdkit.Chem.MolToSmiles(rdkit.Chem.MolFromSmiles(start_smiles))
print('smiles change', start_smiles, end_smiles)
end_selfies = sf.encoder(end_smiles)
print('selfies change', start_selfies, end_selfies)

smiles change C#[S-1] C#[S-]
selfies change [C][#S-1expl] [C][#S-expl]

Is this intended behavior? I'm having this happen while running STONED and it's changing my alphabet. Any ideas would be appreciated!
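
A minimal sketch of how this kind of alphabet drift could be detected after a round trip. The helper names split_tokens and unknown_tokens are hypothetical, and the alphabet below is an illustrative stand-in rather than the real output of get_semantic_robust_alphabet. The regex-based splitting is an assumption made to keep the sketch dependency-free; it is not how selfies itself tokenizes.

```python
import re

# Matches one bracketed SELFIES symbol such as "[C]" or "[#S-1expl]".
_TOKEN = re.compile(r"\[[^\[\]]*\]")


def split_tokens(selfies_string):
    """Return the bracketed symbols of a SELFIES string, in order."""
    return _TOKEN.findall(selfies_string)


def unknown_tokens(selfies_string, alphabet):
    """Return symbols missing from `alphabet`, in order, without repeats."""
    missing = []
    for token in split_tokens(selfies_string):
        if token not in alphabet and token not in missing:
            missing.append(token)
    return missing


# Illustrative alphabet only (an assumption); in practice it would be
# built from selfies.get_semantic_robust_alphabet().
example_alphabet = {"[C]", "[#S-1expl]", "[#S+1expl]"}

print(unknown_tokens("[C][#S-1expl]", example_alphabet))  # []
print(unknown_tokens("[C][#S-expl]", example_alphabet))   # ['[#S-expl]']
```

Running a check like this on every re-encoded string would flag a token such as [#S-expl] before it ends up in a downstream vocabulary.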


MarioKrenn6240 · 2021-08-09T17:44:49Z

Thanks for this comment. We will constrain the decoder to output only one version, [X-1expl], in the next release (after the workshop).

alstonlo · 2021-10-23T21:06:59Z

Hi @whitead,

In selfies 2.0.0, we have constrained selfies.encoder to one unique representation for every atom, which also matches the symbols returned by selfies.get_semantic_robust_alphabet. This should prevent alphabet changes! For example,

print(sf.encoder("C#[S-1]"))  # [C][#S-1]
print(sf.encoder("C#[S-]"))   # [C][#S-1]
print(sf.encoder("[CH]#C"))   # [CH1][#C]
print(sf.encoder("[CH1]#C"))  # [CH1][#C]

Thanks for the suggestion!

whitead · 2021-10-25T11:11:03Z

Sounds good to me. Thanks!

MarioKrenn6240 added the enhancement label on Aug 9, 2021

whitead closed this as completed Oct 25, 2021
