BDNF has no gene or protein interactions #150
Comments
This looks to me like a normalization issue: the CHEMBL identifier doesn't merge with the UniProt IDs. The CHEMBL one is considered a chemical entity while the UniProt one is a Protein. I guess it makes sense to merge these? Do we have other examples of CHEMBL identifiers for proteins? @gaurav what do you think?
from TAQA:
Repeated process. https://arax.ncats.io/?r=351c7c8b-9bbf-4818-bff5-22e6e2b09529
Changing to the HGNC CURIE and removing the small molecule category results in: [screenshot] But even trying to get the synonyms of the HGNC CURIE returns the small molecule: [screenshot] How would this impact queries?
from TAQA: ABRINEURIN is not a SmallMolecule? That isn't a SmallMolecule.
@Genomewide in the last screenshot in your most recent post, you showed the ARAX UI mapping I ask because I can't find a good database that maps between gene/protein IDs and chemical/drug IDs in cases where the therapeutic is a protein. If ARAX has found a source for that, I'm guessing the NodeNorm folks would be very interested in that mapping, and it could help solve this issue...
Note that our production maturity is still using an older database and the old Node Synonymizer system, while everything else (testing, staging, dev) is using a newer database and the newer Node Synonymizer. So you will get different answers at these two sites:
Old system (production): https://arax.transltr.io/?term=HGNC:1033
New system (staging): https://arax.ci.transltr.io/?term=HGNC:1033
@andrewsu, @Genomewide - What can we do here for BDNF having no gene or protein interactions? @gaurav - can node normalizer help in any way here?
Two thoughts from me:
I agree with Andrew about the likely group that is affected by this. If there were a way to identify the group of genes that need this, by looking at the list of biologics or some other low-hanging fruit, I think it would be worth it. If there were a somewhat targeted way to get 50-80% of them linked up, it would be worth a little time. I don't know how insulin does not suffer from the same problem. Was a fix done for it before?
It's not easy anymore to be sure, but I think it probably was this record: So based on ChEMBL and UniProtKB giving it alternate names of abrineurin
If name merging is the only way to bridge between the compound/drug identifiers and the gene/protein identifiers, and we've already established that name merging comes with some undesirable properties, then I suspect that there is not "low lying fruit" to be harvested here. And given that this type of drug-protein edge is not directly the subject of one of our MVP queries, my vote is to table this issue until later (post-fall).
I think insulin will have exactly the same problem. The ARAX synonymizer merges the compound/drug IDs and the gene/protein IDs on prod (https://arax.transltr.io/?r=2d241a3c-25f1-498a-9eab-ff618f65b68c), but not on CI (https://arax.ci.transltr.io/?r=2d241a3c-25f1-498a-9eab-ff618f65b68c)
I agree with @andrewsu that this is a tough problem that we will probably not solve by Sept. I think (?) that the correct response might be to allow a conflation between chemical/drug and protein, but it's going to take some work to implement that and test it out. FWIW, I am not a big fan of name merging for the reason that Eric mentions. I think there is plenty of structured equivalence or other relationship information that we should try to take advantage of first.
Since there are two votes in favor of tabling this until the fall (and none opposed), I'm going to adjust the milestone now...
NodeNorm seems to have split abrineurin into multiple cliques: https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=HGNC%3A1033&curie=UniProtKB%3AP23560&curie=CHEMBL.COMPOUND%3ACHEMBL2108230&curie=MESH:C415772&curie=PR:000004716&curie=UNII:A1ED6W905I&curie=UNII:86ZE5V51WT&conflate=false&drug_chemical_conflate=false&description=false
Some of these might be genuine splits, but I think something is going wrong in protein conflation here. I'm tracking this at TranslatorSRI/NodeNormalization#224. Is it correct to assume that this is at the same level of priority as other cliquing issues, or is there something particularly bad about this issue?
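To make the clique split easier to re-check, here is a rough Python sketch that rebuilds the request URL above and groups input CURIEs by the clique they normalize to. The helper names (`build_nodenorm_url`, `group_by_clique`) are made up for illustration, and the response shape (input CURIE mapped to an object with `id.identifier`, or `null` when nothing is found) is my assumption about the NodeNorm output rather than something confirmed in this thread. No network call is made here; fetch the URL with whatever HTTP client you prefer and pass the decoded JSON to `group_by_clique`.

```python
from urllib.parse import urlencode

# Endpoint and CURIEs are taken from the URL quoted in the comment above.
NODENORM_BASE = "https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes"

ABRINEURIN_CURIES = [
    "HGNC:1033",
    "UniProtKB:P23560",
    "CHEMBL.COMPOUND:CHEMBL2108230",
    "MESH:C415772",
    "PR:000004716",
    "UNII:A1ED6W905I",
    "UNII:86ZE5V51WT",
]


def build_nodenorm_url(curies, conflate=False, drug_chemical_conflate=False):
    """Build a get_normalized_nodes URL with one `curie` parameter per input."""
    params = [("curie", curie) for curie in curies]
    params += [
        ("conflate", str(conflate).lower()),
        ("drug_chemical_conflate", str(drug_chemical_conflate).lower()),
        ("description", "false"),
    ]
    return f"{NODENORM_BASE}?{urlencode(params)}"


def group_by_clique(response):
    """Group input CURIEs by preferred identifier.

    ASSUMPTION: `response` maps each input CURIE to a dict containing
    {"id": {"identifier": ...}}, or to None if it was not normalized.
    CURIEs that were not normalized are collected under the key None.
    """
    cliques = {}
    for curie, entry in response.items():
        preferred = entry["id"]["identifier"] if entry else None
        cliques.setdefault(preferred, []).append(curie)
    return cliques


if __name__ == "__main__":
    print(build_nodenorm_url(ABRINEURIN_CURIES))
```

If all seven identifiers were in one clique, `group_by_clique` would return a single key; more than one key means the split described above is still present.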
@Genomewide can you please retest? I'm not clear what we're looking for.
Searching for any edge with BDNF to another gene or protein returns zero results from the ARAX UI. This is a well-studied gene and just looking at SPOKE shows a number of connected proteins.
This is the query that failed to return results:
BTE returns results when I use HGNC:1033. It appears that BTE adjusts the categories from small molecule to small molecule and gene.
ARAX returned results without modifying the categories, so HGNC:1033 still works there. However, the query does not return results when built with the ARAX query builder, which resolves BDNF to CHEMBL.COMPOUND:CHEMBL2108230.
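For anyone retesting, the comparison described here can be set up as two one-hop TRAPI query graphs that differ only in the subject identifier. This is a rough sketch and not the exact query from this report: the helper name `one_hop_query` and the keys `n0`, `n1`, `e0` are my own, and the object categories are an assumption based on "another gene or protein".

```python
import json


def one_hop_query(curie, subject_categories=None):
    """Build a minimal TRAPI message: one pinned subject node, any gene/protein object.

    ASSUMPTION: the object categories below approximate "another gene or
    protein"; the original query may have used different categories or
    added a predicate constraint on the edge.
    """
    subject = {"ids": [curie]}
    if subject_categories:
        subject["categories"] = list(subject_categories)
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": subject,
                    "n1": {"categories": ["biolink:Gene", "biolink:Protein"]},
                },
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


# The identifier the ARAX query builder picks for BDNF, per this report.
chembl_query = one_hop_query(
    "CHEMBL.COMPOUND:CHEMBL2108230", ["biolink:SmallMolecule"]
)
# The gene identifier, with no category pinned on the subject node.
hgnc_query = one_hop_query("HGNC:1033")

if __name__ == "__main__":
    print(json.dumps(chembl_query, indent=2))
    print(json.dumps(hgnc_query, indent=2))
```

Submitting both messages to the same service and comparing result counts should show whether the missing edges come from the identifier choice or from the category on the subject node.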