YL-1 Add Insect Sex Pheromones to GlobalChem #320

Open
Lyq322 opened this issue Sep 5, 2024 · 8 comments · Fixed by #321

@Lyq322
Collaborator

Lyq322 commented Sep 5, 2024

Add chemicals from Insect Sex Pheromones by Martin Jacobson to GlobalChem

@Lyq322
Collaborator Author

Lyq322 commented Sep 5, 2024

I could not find the R/S configuration for this molecule:
'd-10-acetoxy-cis-7-hexadecen-1-ol': 'OCCCCCC\C=C/CC(OC(=O)C)CCCCCC'
Also, should I change the (+)/(-) and d/l labels in the SMILES list to R/S so the naming is more consistent and easier to understand?
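To make the missing-configuration point concrete, here is a minimal sketch that checks which stereo annotations a SMILES string carries. It is a string-level heuristic only (tetrahedral centres are written with `@`/`@@`, double-bond geometry with `/` and `\`), not a substitute for a cheminformatics toolkit, and the helper name `stereo_markers` is hypothetical. It also uses a raw string, since recent Python versions warn about `\C` in an ordinary string literal.

```python
def stereo_markers(smiles):
    """Report which stereo annotations appear in a SMILES string.

    String-level heuristic only: '@' marks tetrahedral centres,
    '/' and '\\' mark double-bond geometry.
    """
    return {
        "tetrahedral": "@" in smiles,
        "double_bond": "/" in smiles or "\\" in smiles,
    }


# Raw string so the backslash in the SMILES is kept literally.
smiles = r"OCCCCCC\C=C/CC(OC(=O)C)CCCCCC"
markers = stereo_markers(smiles)
print(markers)
```

For the entry above, the cis double bond is encoded but no tetrahedral centre is, so the acetoxy-bearing carbon is left unspecified until its R/S assignment is known.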

@Lyq322
Collaborator Author

Lyq322 commented Sep 13, 2024

I calculated the Tanimoto similarity scores between this list and one of the tranches in the ZINC database (AAAA):
Screenshot 2024-09-12 at 11 02 50 PM
I found that most of the molecules in this list are not similar to any of the molecules in the ZINC database tranche.
The maximum Tanimoto score was 0.3137, between these two molecules:
From the ZINC database:
Screenshot 2024-09-12 at 11 05 47 PM
From SMILES list:
image
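As a rough illustration of the comparison described above, the sketch below finds the best Tanimoto match for one query fingerprint in a reference set. It is an assumption-laden toy, not the script used for the plot: fingerprints are represented as plain Python sets of on-bit indices, the bit values are made up, and the names `tanimoto` and `best_match` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def best_match(query, references):
    """Return (score, index) of the reference most similar to the query."""
    scores = [tanimoto(query, ref) for ref in references]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return scores[best_index], best_index


# Made-up fingerprints, for illustration only.
pheromone_fp = {1, 4, 7, 9}
zinc_fps = [{1, 2, 3}, {4, 7, 10, 11}, {20, 21}]

score, index = best_match(pheromone_fp, zinc_fps)
print(f"best reference: {index}, Tanimoto = {score:.4f}")
```

Repeating `best_match` for every molecule in the list and taking the largest score gives the kind of maximum reported here.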

@Sulstice
Collaborator

Interesting, nice plot! It's probably because their combination algorithms are more drug-design based and less applicable to other chemical spaces. Tanimoto scoring is pretty strict:

    from rdkit import DataStructs

    # Assumed to be defined earlier: `fp` (query fingerprint), `ref_fps` (list of
    # reference fingerprints), `criteria` (score cutoff), `value` (molecule label).
    tanimoto_scores = DataStructs.BulkTanimotoSimilarity(fp, ref_fps)
    dice_scores = DataStructs.BulkDiceSimilarity(fp, ref_fps)
    kulczynski_scores = DataStructs.BulkKulczynskiSimilarity(fp, ref_fps)
    mcconnaughey_scores = DataStructs.BulkMcConnaugheySimilarity(fp, ref_fps)
    onbit_scores = DataStructs.BulkOnBitSimilarity(fp, ref_fps)
    rogot_goldberg_scores = DataStructs.BulkRogotGoldbergSimilarity(fp, ref_fps)
    russel_scores = DataStructs.BulkRusselSimilarity(fp, ref_fps)
    sokal_scores = DataStructs.BulkSokalSimilarity(fp, ref_fps)

    # A metric "accepts" the molecule only if every reference scores above the cutoff.
    if all(x > criteria for x in tanimoto_scores):
        print('Tanimoto Accepted: %s' % value)
    if all(x > criteria for x in dice_scores):
        print('Dice Accepted: %s' % value)
    if all(x > criteria for x in kulczynski_scores):
        print('Kulczynski Accepted: %s' % value)
    if all(x > criteria for x in mcconnaughey_scores):
        print('McConnaughey Accepted: %s' % value)
    if all(x > criteria for x in onbit_scores):
        print('On Bit Accepted: %s' % value)
    if all(x > criteria for x in rogot_goldberg_scores):
        print('Rogot Goldberg Accepted: %s' % value)
    if all(x > criteria for x in russel_scores):
        print('Russel Accepted: %s' % value)
    if all(x > criteria for x in sokal_scores):
        print('Sokal Accepted: %s' % value)
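
To show what "strict" means in practice, here is a small sketch comparing Tanimoto and Dice on the same pair of fingerprints, working directly from bit counts. The counts and the cutoff are made up, and the function names are hypothetical. With `a` and `b` on-bits in each fingerprint and `c` bits in common, Tanimoto is `c / (a + b - c)` and Dice is `2c / (a + b)`, so Dice is never lower than Tanimoto for the same pair.

```python
def tanimoto_from_counts(a, b, c):
    """Tanimoto from on-bit counts a, b and shared on-bit count c."""
    denom = a + b - c
    return c / denom if denom else 0.0


def dice_from_counts(a, b, c):
    """Dice from on-bit counts a, b and shared on-bit count c."""
    denom = a + b
    return 2 * c / denom if denom else 0.0


# Made-up counts: 40 and 50 on-bits, 16 of them shared.
a, b, c = 40, 50, 16
t = tanimoto_from_counts(a, b, c)
d = dice_from_counts(a, b, c)

cutoff = 0.3  # hypothetical acceptance threshold
print(f"Tanimoto = {t:.4f}, accepted: {t > cutoff}")
print(f"Dice     = {d:.4f}, accepted: {d > cutoff}")
```

The two scores are related by Dice = 2T / (1 + T), so a single cutoff applied to both metrics will always pass at least as many pairs under Dice as under Tanimoto.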

There's a bunch of other similarity metrics as well that could be useful, but at first glance they're not great. Can we compare against a fragrance database? The problem is that the data is usually sold rather than available as open source:

https://github.com/Odeuropa

I found this. Is there anything in here we can use?

@Sulstice
Collaborator

@ANUGAMAGE Please review this PR and add it as a node in global-chem. This will bump the version as well, so we can do a new release.

@Sulstice Sulstice linked a pull request Sep 13, 2024 that will close this issue
@Lyq322
Collaborator Author

Lyq322 commented Sep 13, 2024

image
Is this molecule incomplete? The -yl suffix makes me think it's an ester. When I google the molecule, the results also describe the cyclopropyl propanoate ester, not the cyclopropane, as the pheromone of the American cockroach.

@Sulstice
Collaborator

Maybe I wrote it wrong? I will check again on it. There's a little arrow and a star that says "Maybe not work". Idk what I meant there.

@Lyq322
Collaborator Author

Lyq322 commented Sep 25, 2024

These are the pairs of molecules with a Tanimoto similarity greater than 0.8 when this list is compared to the pheromones list from OlfactionBase:
image

Compared to the ZINC database, OlfactionBase definitely had more molecules that were similar to the pheromones.
Screenshot 2024-09-25 at 5 24 38 PM

@Sulstice
Collaborator

Sulstice commented Sep 30, 2024

That's pretty interesting. It seems like the olfaction database is what people should be mining, not the ZINC database. There should be a curated list of chemical data for different regions of chemical space.

@ANUGAMAGE I'm reopening the issue because @Lyq322 and I are still discussing. Usually we would only close an issue once the issue/discussion, or its PR, is fully resolved.
