Tutorial example query is broken #196

Closed
karafecho opened this issue Feb 6, 2024 · 12 comments

@karafecho

A user notified me that the example one-hop query posted in the tutorial is broken, and I verified that it is not behaving as expected. This is a major issue because the query appears not only in the tutorial but also in multiple slide decks that have been shared with users.

  1. The drop-down menu appears to be returning GO terms, e.g., GO:0019341 (also see screenshot). This is true for both 2,3,7,8-tetrachlorodibenzo-P-dioxin and tetrachlorodibenzo-P-dioxin.

image

  2. The CURIE that is posted in the tutorial (UMLS:C003965) no longer returns results. Name Resolver indicates that PUBCHEM.COMPOUND:15625 is the correct CURIE.
  3. The query runs when PUBCHEM.COMPOUND:15625 is entered for n0. However, neoplasm is no longer the top answer. That alone is not necessarily problematic, but (a) I'm no longer seeing the occurs_together_in_literature_with edge and (b) cancers are no longer as richly represented in the answer set, although there are now 10x more answers, so perhaps that's not surprising.
@EvanDietzMorris

EvanDietzMorris commented Feb 6, 2024

There are a few separate issues here:

  1. The custom name resolver hooked up to the UI is not returning the proper ID for "2,3,7,8-tetrachlorodibenzo-P-dioxin". You can see the normal one does here:
    https://name-resolution-sri.renci.org/lookup?string=2%2C3%2C7%2C8-tetrachlorodibenzo-P-dioxin

  2. We have a bad curie in the tutorial. As far as I can tell, that UMLS identifier does not exist. I think UMLS identifiers starting with C are Concept Unique Identifiers and will always have 7 digits. UMLS:C0039651 is Tetracyclines, so maybe that's where it came from?
    https://www.google.com/search?q=UMLS+%22C003965%22
    https://nodenormalization-sri.renci.org/get_normalized_nodes?curie=UMLS%3AC003965

  3. We must be running different queries somehow. When I select that example query through the UI, it does use PUBCHEM.COMPOUND:15625 to populate the query, and I do get neoplasm as the top answer. I don't see occurs_together_in_literature_with edges, but those come from omnicorp and not the robokop graph.
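
For anyone who wants to repeat the checks in items 1 and 2, here is a minimal Python sketch. It only builds the same two service URLs linked above and tests whether a CURIE has the "C plus 7 digits" shape described in item 2. The function names are made up for illustration, and the 7-digit rule is taken from the comment above rather than from any service documentation.

```python
import re
from urllib.parse import urlencode

# Base URLs copied from the links in this thread.
NAME_RESOLVER = "https://name-resolution-sri.renci.org/lookup"
NODE_NORMALIZER = "https://nodenormalization-sri.renci.org/get_normalized_nodes"

# Assumption from the comment above: a UMLS CUI is "C" followed by exactly 7 digits.
CUI_PATTERN = re.compile(r"UMLS:C\d{7}")


def looks_like_umls_cui(curie: str) -> bool:
    """Return True if the CURIE has the shape UMLS:C + 7 digits."""
    return CUI_PATTERN.fullmatch(curie) is not None


def name_lookup_url(name: str) -> str:
    """Build the name resolver lookup URL for a free-text name."""
    return f"{NAME_RESOLVER}?{urlencode({'string': name})}"


def normalize_url(curie: str) -> str:
    """Build the node normalizer URL for a single CURIE."""
    return f"{NODE_NORMALIZER}?{urlencode({'curie': curie})}"


if __name__ == "__main__":
    print(looks_like_umls_cui("UMLS:C003965"))   # the CURIE from the tutorial
    print(name_lookup_url("2,3,7,8-tetrachlorodibenzo-P-dioxin"))
    print(normalize_url("UMLS:C003965"))
```

The shape check only catches malformed identifiers such as a dropped digit. A well-formed CUI can still be absent from the graph, so the node normalizer lookup is still needed.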

@EvanDietzMorris

EvanDietzMorris commented Feb 6, 2024

@karafecho do you have the text from previous TRAPI results for this query? It's going to be hard to track down where the occurs_together_in_literature_with edges went without knowing what they were before.

Edit: I see now that the tutorial has some screen shots of the now-missing omnicorp edge, but in general having the TRAPI queries/results in text is far more helpful for troubleshooting stuff like this.

@karafecho
Author

karafecho commented Feb 6, 2024

Re (2): I'm not understanding the issue with the CURIE. I tested every example that I put into the tutorial and elsewhere, so unless I somehow entered a typo, the CURIE that is currently in the tutorial should work. Plus, I know of at least two users who successfully used the tutorial to get started learning ROBOKOP. However, I guess it's possible that they were unsuccessful and simply didn't feel comfortable letting me know, or they didn't use the CURIE. All that said, see https://github.com/RobokopU24/Use-Cases/issues/1, which I created after I posted the tutorial. Maybe I did just enter the wrong CURIE in the tutorial?

Re (3), do you mean when you enter PUBCHEM.COMPOUND:15625, you get neoplasm as the top answer? But no occurs_together_in_literature_with edges?

Here's a couple of screenshots plus the JSON results. Unfortunately, I don't think I saved the original query/results, although I may be able to dig them up. I generally try to save results, but this one must have slipped through.

image

image

ROBOKOP_message(35).json
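
One way to check a saved result file for a given edge type is to tally the predicates in its knowledge graph. The sketch below assumes the file follows the usual TRAPI layout, where `message.knowledge_graph.edges` is a dictionary of edge records that each carry a `predicate` field. The function name is hypothetical, and nothing here has been verified against the attached file.

```python
import json
from collections import Counter


def count_predicates(message_json: dict) -> Counter:
    """Tally edge predicates in a TRAPI-style response.

    Accepts either the full response ({"message": {...}}) or the bare message.
    """
    message = message_json.get("message") or message_json
    knowledge_graph = message.get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return Counter(edge.get("predicate") for edge in edges.values())


def count_predicates_in_file(path: str) -> Counter:
    """Load a saved JSON result from disk and tally its predicates."""
    with open(path, encoding="utf-8") as handle:
        return count_predicates(json.load(handle))
```

Comparing the tallies from an old and a new result file would show directly whether occurs_together_in_literature_with edges dropped out between runs.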

@karafecho
Author

I'm wondering if the missing OmniCorp edges are related to the CURIE issue?

@EvanDietzMorris

EvanDietzMorris commented Feb 6, 2024

  1. I suspect the wrong curie just got put into the tutorial. I don't think that would've ever worked, given it doesn't seem to be a real UMLS id and I couldn't find it in either the new or the old version of robokopkg. The "sample query" from the drop down in the UI has PUBCHEM.COMPOUND:15625 and so did your use case link, so I suspect the bogus UMLS has only been in the tutorial.

  2. Looks like we are running different queries: the sample query in the UI and the tutorial use associated_with, not related_to. With associated_with, neoplasm is the top result; using related_to explains why you're getting more results than expected.

Re: omnicorp, I think something is just broken with the omnicorp edges at the moment. I don't see them on any queries, even when the logs show they are coming back from the service. I suspect the versions of omnicorp and Aragorn we have on robokop-u24 are out of sync and we need to upgrade to the latest versions. I was planning on doing that soon, but now I'll bump it to high priority and try to get it done tomorrow; hopefully that fixes the issue.
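
To make the predicate difference in item 2 concrete, here is a rough sketch of the two one-hop queries as TRAPI query graphs. The node and edge layout follows the general TRAPI query_graph structure. The target category `biolink:Disease` is a guess for illustration only (the thread does not state what n1 was), and `one_hop_query` is a hypothetical helper.

```python
def one_hop_query(subject_curie: str, predicate: str, object_category: str) -> dict:
    """Build a one-hop TRAPI-style query: n0 (pinned CURIE) -[predicate]-> n1 (category)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_curie]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


# ASSUMPTION: biolink:Disease as the n1 category is illustrative, not taken from the tutorial.
sample_query = one_hop_query("PUBCHEM.COMPOUND:15625", "biolink:associated_with", "biolink:Disease")
broad_query = one_hop_query("PUBCHEM.COMPOUND:15625", "biolink:related_to", "biolink:Disease")
```

The two payloads differ only in `predicates`. Since related_to is the most general Biolink predicate, the second query would be expected to match a superset of the first, which is consistent with the larger answer set reported above.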

@karafecho
Author

karafecho commented Feb 6, 2024

(2) Yeah, that's what it's sounding like. Weird. But at least that's an easy fix.

(3) Ahhh, I didn't realize that I used associated_with. That resolves that issue.

Re OmniCorp: Thanks for troubleshooting. I'll bet you're right.

Assuming the OmniCorp issue is due to the syncing issue, then there are two remaining bugs:

  1. The typo in the tutorial. I can fix that.
  2. The custom name resolver hooked up to the UI is not returning the proper ID for "2,3,7,8-tetrachlorodibenzo-P-dioxin".

@EvanDietzMorris

Agreed. Fingers crossed the omnicorp issue is easily resolved by upgrading versions. David is looking into the name resolver issue.

@EvanDietzMorris

The omnicorp issue persists. The correct edges are being returned by Aragorn but something is preventing the UI from showing them. David is looking into this as well.

@karafecho
Author

Update re incorrect CURIE: RobokopU24/qgraph#273.

@Woozl
Member

Woozl commented Mar 4, 2024

Sorry for the delay. There are a couple of issues here. I think the missing node is actually missing from the underlying babel files. I need to see which version of the synonym files https://name-resolution-sri.renci.org/ is using, but the robokop nameres is using the latest set: https://stars.renci.org/var/babel_outputs/2024jan9/synonyms/2024jan5/

However, 2,3,7,8-tetrachlorodibenzo-P-dioxin (PUBCHEM.COMPOUND:15625) doesn't seem to actually exist in that set, based on this grep:

zgrep "curie\": \"PUBCHEM.COMPOUND:15625\"" /projects/stars/var/babel_outputs/2024jan9/synonyms/2024jan5/*.txt.gz

I need to ask Gaurav what versions we're running on the translator instance, which would hopefully clear this up.
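
The same check can be sketched in Python for anyone who prefers an exact field comparison over a regex. This assumes, based only on the grep pattern above, that each line of the synonym files is a JSON object with a `curie` field. The function name is hypothetical.

```python
import glob
import gzip
import json


def find_curie(pattern: str, curie: str):
    """Yield (path, record) for each line whose "curie" field equals `curie`.

    ASSUMPTION: every line in the matched .gz files is one JSON object.
    """
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                if curie not in line:  # cheap substring pre-filter before parsing
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that are not valid JSON
                if isinstance(record, dict) and record.get("curie") == curie:
                    yield path, record
```

For example, `list(find_curie("/projects/stars/var/babel_outputs/2024jan9/synonyms/2024jan5/*.txt.gz", "PUBCHEM.COMPOUND:15625"))` would come back empty if the node is missing from that set. Unlike the grep, the comparison treats the `.` in the CURIE literally.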

@EvanDietzMorris

In our devops repo it looks like 2023nov5 is the version currently deployed on https://name-resolution-sri.renci.org/, but we need to check with Gaurav to confirm. If 2,3,7,8-tetrachlorodibenzo-P-dioxin doesn't exist in the new files, and we just haven't deployed them to name resolution and node normalizer yet, that would explain this "bug". Either way, in the future we need to make sure this instance of name resolution uses the same babel data that was used to normalize robokopkg. ORION saves the version of node normalization (e.g. "2.3.5") from https://nodenormalization-sri.renci.org/openapi.json but doesn't know which version of babel that entails. Let's discuss with Gaurav.
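
As a rough illustration of the gap described here: an OpenAPI document carries the service version under `info.version`, but the babel data version has to come from somewhere else. The sketch below simply stores the two side by side. It is not ORION's actual metadata format, and both the function name and the record keys are made up.

```python
def normalization_provenance(openapi_doc: dict, babel_version: str) -> dict:
    """Pair the node normalization service version with the babel data version.

    ASSUMPTION: babel_version is supplied by hand (e.g. from deployment config),
    because the thread notes the service version alone does not reveal it.
    """
    info = openapi_doc.get("info") or {}
    return {
        "node_normalization_version": info.get("version"),
        "babel_version": babel_version,
    }


# Values taken from this thread; the pairing itself is unconfirmed.
record = normalization_provenance({"info": {"version": "2.3.5"}}, "2023nov5")
```

Saving a record like this with each graph build would make it possible to tell later whether a name resolution instance is running on the same babel data as the graph.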

@Woozl
Member

Woozl commented Mar 13, 2024

The newest nameres deployment is using the 2023nov5 set, which has the 2,3,7,8-tetrachlorodibenzo-P-dioxin node, so this specific issue should be resolved. Gaurav has opened an issue for the 2024jan5 set at TranslatorSRI/Babel#242

@Woozl Woozl closed this as completed Mar 13, 2024
3 participants