Resolve inconsistencies in Target2 Compound InChIKeys #80
Comments
Hi @FrenkT, thank you for the detailed report and for bringing this to our attention. I can confirm that the compounds are not different, which means that the metadata provided is not correct. I will look into this, talk to others who know more about how the metadata was generated, and get back to you.
@srijitseal will post more insights later, but for now, it appears that running StandardizeMolecule.py on JUMP-Target-2_compound_metadata.tsv resolves the discrepancies. However, it does add a new discrepancy: we now have 301 unique entries, not 303, in the Target2 set. That's because, in addition to the "duplicates" of BVT-948, dexamethasone, and thiostrepton (noted here: jump-cellpainting/JUMP-Target#9 (comment)), we also noted "duplicates" of ME-0328 and quinidine/quinine: https://chat.openai.com/share/fa71ea33-0bc7-4699-b03a-a0ba41353164
In any case, this should keep you going for now, @FrenkT. Updates Per @srijitseal,
Thank you for looking into this, @shntnu. Running the molecule standardisation script is definitely helpful.
When reconciling the SMILES representations from the two sources, we found that the discrepancies can be resolved by accounting for tautomerization and enantiomerization. Doing so reconciles the differences in SMILES strings between the Target2 metadata (which did not apply this step) and JUMPCP (which tried to show the lowest-energy tautomer).
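To illustrate the idea of reconciling tautomers, here is a minimal sketch using RDKit's tautomer canonicalizer. It is not the StandardizeMolecule.py script mentioned in this thread, and it does not necessarily pick the lowest-energy tautomer; the function name `canonical_tautomer_smiles` and the acetone keto/enol pair are assumptions chosen only for illustration.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical_tautomer_smiles(smiles: str) -> str:
    """Return the canonical SMILES of RDKit's canonical tautomer.

    Hypothetical helper for illustration; RDKit's "canonical" tautomer is
    chosen by a scoring scheme, not by computed energy.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()
    canonical = enumerator.Canonicalize(mol)
    return Chem.MolToSmiles(canonical)


# Keto and enol forms of acetone are written differently but are tautomers.
keto = canonical_tautomer_smiles("CC(C)=O")
enol = canonical_tautomer_smiles("C=C(C)O")
print(keto == enol)
```

If both sources are passed through the same canonicalization before InChIKeys are generated, tautomer-only differences of this kind should no longer show up as mismatches.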
There is a new inconsistency, however: converting SMILES with cheminformatics tools like RDKit can sometimes produce forms that differ from those listed on resources like Wikipedia. This can be due to various reasons, including tautomerization, enantiomerization, loss of stereochemical (E/Z) information, or other subtleties in how different software packages handle chemical structure standardization and normalization.
@FrenkT I believe we have finally resolved everything here. Please see jump-cellpainting/JUMP-Target#32.
Hi all,
As a follow-up to #77, I have been trying to map the compound identifiers in the Target-2 plate map and metadata to the compound identifiers provided for Target-2 plates in the JUMP metadata files.
As a result, I found 36 (out of 384) wells for which the compound in the JUMP metadata doesn't match the Target-2 metadata:
As you can see, the first layer of the InChIKey is different, so the mismatch shouldn't be due to just missing stereochemical information.
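For readers unfamiliar with the layout of an InChIKey: the first 14-character block hashes the molecular skeleton (connectivity), while the second block covers stereochemistry and other layers. A small sketch of the comparison described above follows; the helper names and the placeholder keys are made up for illustration and are not real compounds from the table.

```python
import re

# A standard InChIKey has the shape 14 letters - 10 letters - 1 letter.
_INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def connectivity_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    key = inchikey.strip().upper()
    if not _INCHIKEY_RE.match(key):
        raise ValueError(f"not a 27-character InChIKey: {inchikey!r}")
    return key.split("-")[0]


def classify_mismatch(key_a: str, key_b: str) -> str:
    """Classify a pair of InChIKeys (hypothetical helper)."""
    if key_a.strip().upper() == key_b.strip().upper():
        return "identical"
    if connectivity_block(key_a) == connectivity_block(key_b):
        # Only stereo/isotope/protonation layers differ.
        return "same_skeleton"
    return "different_skeleton"


# Placeholder keys, not real compounds.
print(classify_mismatch("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "AAAAAAAAAAAAAA-UHFFFAOYSA-N"))
print(classify_mismatch("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "CCCCCCCCCCCCCC-BBBBBBBBSA-N"))
```

A "different_skeleton" result rules out missing stereochemistry as the only cause, though it can still arise from two tautomeric drawings of the same compound.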
Note that each row of the table applies to all of the TARGET2 plates described in the metadata files except for those coming from source_9 (I excluded these from my analysis code because they have a 1536-well layout and I wanted to keep things simple, see #77) and plate CP1-SC2-25 from source_7 (similarly, because it seems like the plate has a mirrored layout, see #77). So I ran this check on 131 plates, and for all of them I can find the differences described in the table above.
Any idea on whether the compounds used in the experiments are actually different, as suggested by the InChIKeys, or whether there is some issue in the metadata files provided in this repo?
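For anyone wanting to reproduce this kind of per-plate check, a minimal sketch is shown here. It assumes each source has already been loaded into a dictionary mapping well position to InChIKey; the function name, the well labels, and the placeholder keys are hypothetical and not taken from the actual metadata files.

```python
def mismatched_wells(target2: dict, jump: dict) -> list:
    """Return the sorted wells whose InChIKey differs between two sources.

    Wells present in only one of the two mappings are ignored here.
    """
    shared = target2.keys() & jump.keys()
    return sorted(well for well in shared if target2[well] != jump[well])


# Placeholder data for a single plate (not real compounds).
target2_plate = {"A01": "KEY-ONE", "A02": "KEY-TWO", "B01": "KEY-THREE"}
jump_plate = {"A01": "KEY-ONE", "A02": "KEY-OTHER", "B01": "KEY-THREE"}

print(mismatched_wells(target2_plate, jump_plate))
```

Running the same function over every plate and comparing the resulting lists is one way to confirm that the same wells disagree on all plates.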