Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to change google distance edge #1925

Closed
edeutsch opened this issue Oct 21, 2022 · 12 comments
Closed

Need to change google distance edge #1925

edeutsch opened this issue Oct 21, 2022 · 12 comments
Assignees

Comments

@edeutsch
Copy link
Collaborator

The semantic validator that Richard is putting together is almost ready to deploy. He and I have been doing some testing and there are still a few things for him to fix, but I've successfully run some ARAX Responses through the validator and for the most part is works and our output is good. A few of the flagged problems are things that he needs to fix. But one notable error is:

    {
      "context": "Query Graph n00--['biolink:has_normalized_google_distance_with']->n01 Predicate",
      "name": "biolink:has_normalized_google_distance_with",
      "code": "error.unknown"
    }

flagging the fact that we just made up biolink:has_normalized_google_distance_with and it doesn't exist. But apparently there is now a predicate biolink:occurs_together_in_literature_with
https://biolink.github.io/biolink-model/docs/occurs_together_in_literature_with.html#relation-occurs_together_in_literature_with

So I'm thinking we should switch all occurrences in our code to that. Agreed?

But the we also need a proper way to encode the metric. We currently do:

  "message": {
    "knowledge_graph": {
      "edges": {
        "N1_0": {
          "attributes": [
            {
              "attribute_source": null,
              "attribute_type_id": "EDAM:data_2526",
              "attributes": null,
              "description": "\n        Normalized google distance is a metric based on edge subject/object node co-occurrence in abstracts of all [PubMed](https://pubmed.ncbi.nlm.nih.gov/) articles. \n        The formula can be found here on [wikipedia.](https://en.wikipedia.org/wiki/Normalized_Google_distance) \n        Where in this case f(x,y) is the number of PubMed abstracts both concepts apear in, f(x)/f(y) are the number of abstracts individual concepts apear in, and N is the number of pubmed articles times the average numbver of search terms per article (27 million * 20).\n        ",
              "original_attribute_name": "normalized_google_distance",
              "value": "0.775434002709891",
              "value_type_id": null,
              "value_url": "https://arax.ncats.io/api/rtx/v1/ui/#/PubmedMeshNgd"
            }
          ],
          "object": "CHEMBL.COMPOUND:CHEMBL2109588",
          "predicate": "biolink:has_normalized_google_distance_with",

          "subject": "CHEMBL.COMPOUND:CHEMBL112"
        },

I'm thinking that we need to change:

  1. The predicate should change as described above
  2. The attribute_type_id is EDAM:data_2526 which is just "text data": https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_2526&lang=en&viewMode=All&siblings=false . I'm thinking we can do better than that.
  3. is the description accurate for what we're actually doing?
chunyuma added a commit that referenced this issue Oct 28, 2022
@chunyuma
Copy link
Collaborator

The predicate name has been changed from biolink:has_normalized_google_distance_with to biolink:occurs_together_in_literature_with used in all associated scripts in a new branch issue1925. All related tests have been passed. Now we need to wait the feedback for getting a better EDAM code to replace the original "EDAM:data_2526".

@edeutsch
Copy link
Collaborator Author

Great, thanks, @chunyuma !
If a replacement for EDAM:data_2526 seems like a long way off based on initial contact with Matt, Sierra, I would be fine to move ahead with just the predicate change. That will quiet an egregious error and only leave a minor warning that might take a time to resolve. @saramsey let us know whether you think Matt or Sierra can suggest a quick remedy or not.

@saramsey
Copy link
Member

I've posted this question on the EPC GitHub project area:
NCATSTranslator/Evidence-Provenance-Confidence-Working-Group#8

and DM'd Matt and Sierra about it.

@saramsey
Copy link
Member

saramsey commented Nov 4, 2022

Matt and Sierra have responded. Here is what Matt wrote:

Screen Shot 2022-11-04 at 9 38 47 AM

Here was the hyperlink that Matt provided: here

Does this help clarify our path forward?

@saramsey
Copy link
Member

saramsey commented Nov 4, 2022

Sierra asked me to explain how ARAX computes NGD. Here is what I wrote (hope I got it right?)

Screen Shot 2022-11-04 at 9 49 54 AM

@dkoslicki
Copy link
Member

@saramsey Can we have the link to the "s/s" google doc mentioned by Matt?

@edeutsch
Copy link
Collaborator Author

edeutsch commented Mar 4, 2023

So what's the state of sunsetting has_normalized_google_distance_with? Tyler is turning up the heat on non-compliant TRAPI. This still appears to be a problem. Perusing the issue here suggests that maybe the fix has been sitting stale in a branch for months? Are we waiting for something to happen first before merging this branch? Is the ranker prepared to handle this change?

@chunyuma or anyone else: what stands in the way of merging this into master?

@dkoslicki
Copy link
Member

@chunyuma , this is a simple name change, right? Or is there more to it than that? I don't see any reason why not to make it biolink compliant

@chunyuma
Copy link
Collaborator

chunyuma commented Mar 4, 2023

Sorry @edeutsch @dkoslicki, I previously thought that we might need to wait for the final decision about whether we need to change the EDAM code. It seems like our final decision is to keep it as is, right?

I have just merged the branch issue1925 to the master branch. I'm so sorry for the delay.

@dkoslicki
Copy link
Member

No worries at all @chunyuma ! I think this just fell through the cracks: nothing to be sorry about!

@edeutsch
Copy link
Collaborator Author

edeutsch commented Mar 4, 2023

yes, no problem, no one has really been checking, but now they're going to start checking so it is good that we roll this out now. thank you!

I wonder if there are other branches that are similarly sitting stale that could be merged? Maybe we should do a branch and issue cleaning party at a mini-hackathon soon?

@edeutsch
Copy link
Collaborator Author

edeutsch commented Mar 4, 2023

This is now rolled out in all the usual places where master is rolled out and seems to work. closing, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants