AGR API Yaml #127

Merged: 4 commits, Oct 30, 2023
Conversation

mnarayan1
Contributor

AGR API yaml file, for gene-disease relationships. Addresses this issue.

Notes:

  • There are multiple possible gene ID types for the _id field (MGI, AGR, HGNC, etc.). I wasn't exactly sure how to tackle this, so I wrote an x-bte operation for each ID type (mgi-gene-disease, rgd-gene-disease, etc.).
  • I only included the ID types in x-bte-response-mapping. Are there any other relevant fields I should include?

Problems:
Based on this API record, I expect that querying the gene FB:FBgn0038376 should return the disease DOID:9970 (dyschromatosis universalis hereditaria). This is the query I ran:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"],
                    "ids": ["FB:FBgn0038376"]
                },
                "n1": {
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

However, BTE is retrieving 0 successful results. My local installation of BTE is working fine, so I'm assuming that something is wrong with the annotations themselves. How can I fix this?
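
For anyone reproducing this outside a TRAPI client, here is a minimal Python sketch of the same one-hop query. The helper names `build_gene_disease_query` and `post_query`, and the local endpoint `http://localhost:3000/v1/query`, are assumptions for illustration (adjust the URL to wherever your BTE instance listens). Nothing is sent unless `post_query` is called.

```python
import json
import urllib.request


def build_gene_disease_query(gene_curie):
    """Build the one-hop Gene -> Disease TRAPI query shown above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"categories": ["biolink:Gene"], "ids": [gene_curie]},
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {"e01": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(query, url="http://localhost:3000/v1/query"):
    """POST the query to a BTE instance; the default URL is an assumption."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) the query for the gene discussed in this thread.
query = build_gene_disease_query("FB:FBgn0038376")
print(json.dumps(query, indent=2))
```

Swapping in another CURIE (for example an MGI or HGNC ID) gives a quick way to check whether the zero-result behaviour depends on the namespace.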

@andrewsu
Contributor

@mnarayan1 at a quick glance, your TRAPI query and your SmartAPI annotation look good to me. When you say "My local installation of BTE is working fine", I assume you've gotten local overrides working on your local instance? And do you see zero results for other gene identifiers (e.g., HGNC, WormBase, Xenbase, etc.)?

@mnarayan1
Contributor Author

@andrewsu The other gene identifiers are not working either. I have local overrides working on my local instance, and BTE was able to successfully load AGR into smartapi_specs.

Here is the message I get when I try to run the above query.
{
    "description": "Query processed successfully, retrieved 0 results.",
    "schema_version": "1.4.0",
    "biolink_version": "3.5.0",
    "workflow": [
        {
            "id": "lookup"
        }
    ],
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": [
                        "MGI:1096330"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:Disease"
                    ]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        },
        "knowledge_graph": {
            "nodes": {},
            "edges": {}
        },
        "results": []
    },
    "logs": [
        {
            "timestamp": "2023-07-31T17:29:29.435Z",
            "level": "INFO",
            "message": "Expanded ids for node n0: (1 ids -> 1 ids)",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.416Z",
            "level": "INFO",
            "message": "Node n0 with id [MGI:1096330] assigned category [biolink:Gene] inferred from id.",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.417Z",
            "level": "DEBUG",
            "message": "BTE identified 2 qNodes from your query graph",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.417Z",
            "level": "DEBUG",
            "message": "BTE identified 1 qEdges from your query graph",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.426Z",
            "level": "DEBUG",
            "message": "Edge manager is managing 1 qEdges.",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.426Z",
            "level": "DEBUG",
            "message": "Edge manager is sending next qEdge 'e01' for execution.",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.426Z",
            "level": "INFO",
            "message": "Executing e01: n0 --> n1",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.801Z",
            "level": "DEBUG",
            "message": "REDIS cache is not enabled.",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.802Z",
            "level": "DEBUG",
            "message": "BTE is trying to find metaKG edges (smartAPI registry, x-bte annotation) connecting from Gene to Disease with predicate undefined",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.817Z",
            "level": "DEBUG",
            "message": "BTE found 9 metaKG edges corresponding to e01. These metaKG edges comes from 1 unique APIs. They are BioThings AGR API",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.820Z",
            "level": "DEBUG",
            "message": "BTE found 1 metaKG for this batch.",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.820Z",
            "level": "DEBUG",
            "message": "Resolving ID feature is turned on",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:32.820Z",
            "level": "DEBUG",
            "message": "call-apis: 1 planned queries for edge e01",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:33.827Z",
            "level": "DEBUG",
            "message": "Successful POST https://biothings.ncats.io/agr (1 ID): Gene > gene_associated_with_condition > Disease (obtained 0 records, took 967ms)",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:33.827Z",
            "level": "DEBUG",
            "message": "call-apis: Total number of records returned for this query is 0",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:33.827Z",
            "level": "DEBUG",
            "message": "call-apis: qEdge queries complete in 1s",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:33.828Z",
            "level": "INFO",
            "message": "e01 execution: 1 queries (1 success/0 fail) and (0) cached qEdges return (0) records",
            "code": null
        },
        {
            "timestamp": "2023-07-31T17:29:33.829Z",
            "level": "WARNING",
            "message": "qEdge (e01) got 0 records. Your query terminates.",
            "code": null
        }
    ]
}
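
The key line in these logs is the "Successful POST ... (obtained 0 records ...)" entry: the API call itself succeeded but returned nothing. A small sketch for pulling such lines out of a TRAPI response follows. The function name `summarize_api_calls` is hypothetical, and the regular expression assumes the "obtained N records" wording seen in the log above.

```python
import re

# Matches the record count in log messages such as "(obtained 0 records, took 967ms)".
RECORDS_PATTERN = re.compile(r"obtained (\d+) records")


def summarize_api_calls(trapi_response):
    """Return (message, record_count) for each log line reporting an API call."""
    calls = []
    for entry in trapi_response.get("logs", []):
        message = entry.get("message", "")
        match = RECORDS_PATTERN.search(message)
        if match:
            calls.append((message, int(match.group(1))))
    return calls


# Two log entries copied from the response above.
sample_response = {
    "logs": [
        {"level": "DEBUG", "message": "call-apis: 1 planned queries for edge e01"},
        {
            "level": "DEBUG",
            "message": "Successful POST https://biothings.ncats.io/agr (1 ID): "
            "Gene > gene_associated_with_condition > Disease "
            "(obtained 0 records, took 967ms)",
        },
    ]
}

calls = summarize_api_calls(sample_response)
empty_calls = [message for message, count in calls if count == 0]
print(f"{len(empty_calls)} of {len(calls)} API calls returned no records")
```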

@colleenXu
Collaborator

@mnarayan1

Sorry for such a belated response. Are you still available to work on this issue? If not, it's not a problem - I'll merge the PR which will preserve the record of work you've done, then add commits...

I've found the reasons why the x-bte annotation wasn't working, and I have a list of proposed fixes (the minimum needed to get the annotation working).

(1) writing separate sets of operations for each data subset/gene-ID-namespace combo

This is necessary for current x-bte annotation because the different data subsets represent different relationships that we can assign different biolink predicates to. Also, the different ID-namespaces need to be handled differently (see next points)...

Notes:

  • That could mean a combinatorial explosion of operations >.<. We can cut this down by only writing operations that cover more than 5 records/documents.
  • There are 4 data subsets that we could annotate (the non-negated ones: agr.biomarker_via_orthology, agr.implicated_via_orthology, agr.is_implicated_in, agr.is_marker_for).
  • Multiple gene ID-namespaces are involved (MGI, RGD, SGD, etc.). Madhumita has already listed them in YAML comments.
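
To make the size of that combination concrete, here is a rough Python sketch that enumerates namespace/data-subset pairs and keeps only those above the record threshold. The namespace list is partial (only the ones named here), the operation-name pattern is hypothetical, and the counts are made up for illustration; real counts would have to come from the API.

```python
from itertools import product

# Data subsets and namespaces named in this thread; the namespace list is partial.
DATA_SUBSETS = [
    "biomarker_via_orthology",
    "implicated_via_orthology",
    "is_implicated_in",
    "is_marker_for",
]
NAMESPACES = ["MGI", "RGD", "SGD"]


def plan_operations(record_counts, min_records=5):
    """Keep (namespace, subset) combos covering more than min_records documents."""
    planned = []
    for namespace, subset in product(NAMESPACES, DATA_SUBSETS):
        if record_counts.get((namespace, subset), 0) > min_records:
            # Hypothetical naming scheme for the resulting operation.
            planned.append(f"{namespace.lower()}-{subset}")
    return planned


# Made-up counts purely for illustration.
example_counts = {
    ("MGI", "is_implicated_in"): 1200,
    ("MGI", "is_marker_for"): 3,
    ("RGD", "biomarker_via_orthology"): 40,
}
planned = plan_operations(example_counts)
print(planned)
```

With only three namespaces there are already 12 candidate combinations, which is why the threshold matters.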

(2) for requestBody.body.q: use replPrefix() so BTE adds the prefixes needed for querying

BTE doesn't always automatically add prefixes to IDs when generating the queries. It looks like for this API, all the IDs (field _id) have prefixes that need adding (gene namespaces and DOID).

Example:

        requestBody:
          body:
            ## API data has prefix
            ## joinSafe is only needed if the delimiter isn't a comma
            q: "{{ queryInputs | replPrefix('MGI') }}"
            scopes: _id
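
As a rough illustration of the intended effect of that template (not BTE's actual implementation, whose filter semantics may differ), the Python sketch below adds or replaces a prefix and builds the resulting POST fields. The helper names `repl_prefix` and `build_post_body` are hypothetical.

```python
def repl_prefix(identifier, prefix, delimiter=":"):
    """Replace an existing prefix, or add one if the identifier has none."""
    local_id = identifier.split(delimiter, 1)[-1]
    return f"{prefix}{delimiter}{local_id}"


def build_post_body(query_inputs, prefix):
    """Build form fields for a batch query scoped to the _id field."""
    terms = [repl_prefix(item, prefix) for item in query_inputs]
    # A comma is the delimiter assumed here, matching the YAML comment above.
    return {"q": ",".join(terms), "scopes": "_id"}


# The MGI ID from the logs above, with and without its prefix.
bare = repl_prefix("1096330", "MGI")
prefixed = repl_prefix("MGI:1096330", "MGI")
body = build_post_body(["1096330"], "MGI")
print(bare, prefixed, body)
```

Either input form ends up as the prefixed ID, which is the form stored in the API's _id field.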

(3) parameters.fields adjustments

  • Fields (besides _id) are missing the root field: they should start with the "agr." prefix.
  • We can add agr.symbol for each operation. This may be useful since Translator's NodeNorm may not support every namespace (could check here or put IDs into the endpoint).
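
A small sketch of the first point, assuming the fields are passed as a comma-separated string: it prepends the root to any field that lacks it and leaves _id alone. The helper name `qualify_fields` is hypothetical.

```python
def qualify_fields(fields, root="agr"):
    """Prefix each field with the root unless it is _id or already qualified."""
    qualified = []
    for field in fields:
        if field == "_id" or field.startswith(root + "."):
            qualified.append(field)
        else:
            qualified.append(f"{root}.{field}")
    return ",".join(qualified)


# Field names taken from this thread; the first two lack the root on purpose.
fields = qualify_fields(["biomarker_via_orthology.doid", "symbol", "_id", "agr.symbol"])
print(fields)
```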

(4) response-mapping adjustments

Right now, it doesn't work because: (1) many references are to x-bte-response-mapping/gene but that doesn't exist (the two objects in response-mapping are drug and disease), and (2) the drug object includes multiple output fields which currently isn't supported in x-bte annotation/BTE...

To fix:

  • 1 response-mapping object per output field (so agr.biomarker_via_orthology.doid and agr.implicated_via_orthology.doid would be in separate objects)
  • and 1 response-mapping object per ID-namespace (so RGD: _id and MGI: _id would be in separate objects)
  • make sure the response_mapping ref for each operation points to an existing object in the x-bte-response-mapping section
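
The last check can be automated. The sketch below walks a parsed annotation (a plain dict here, so no YAML library is needed) and reports operations whose response_mapping ref has no matching object. It assumes operations live under components.x-bte-kgs-operations and mappings under components.x-bte-response-mapping; the function name `find_dangling_refs` and the sample data are hypothetical.

```python
REF_PREFIX = "#/components/x-bte-response-mapping/"


def find_dangling_refs(spec):
    """Return (operation, ref) pairs whose response_mapping target is missing."""
    components = spec.get("components", {})
    mappings = components.get("x-bte-response-mapping", {})
    dangling = []
    for name, operations in components.get("x-bte-kgs-operations", {}).items():
        for operation in operations:
            ref = operation.get("response_mapping", {}).get("$ref", "")
            target = ref[len(REF_PREFIX):] if ref.startswith(REF_PREFIX) else None
            if target not in mappings:
                dangling.append((name, ref))
    return dangling


# Illustrative spec mirroring the problem described: "gene" is referenced but not defined.
sample_spec = {
    "components": {
        "x-bte-kgs-operations": {
            "mgi-gene-disease": [{"response_mapping": {"$ref": REF_PREFIX + "gene"}}],
            "disease-gene": [{"response_mapping": {"$ref": REF_PREFIX + "disease"}}],
        },
        "x-bte-response-mapping": {"drug": {}, "disease": {}},
    }
}

dangling = find_dangling_refs(sample_spec)
print(dangling)
```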

@colleenXu
Collaborator

colleenXu commented Oct 11, 2023

And a note (mostly to my future self): here's the other stuff I noticed. It's not essential now, but it will be for getting the AGR SmartAPI YAML fully ready.

  • version: I'm not sure if this is valid. The metadata endpoint seems to show that the data download is 2021?
  • info.x-translator.infores: this needs to be a separate new one for this api, and registered in the infores registry
  • info.x-translator.biolink-version: this can be updated to 3.5.3
  • servers.url: the Production server URL should (?) be changed to http (right now it's https, which makes it the same as the encrypted one)
  • For operations, we could likely add qualifier for species o_0 since each namespace is species-specific! That's cool!
  • For the operation's source: does infores:agrkb exist in registry? Or is it AGR?

@colleenXu
Collaborator

After discussion with Andrew, we've decided to merge this PR and I'll proceed with updating the yaml to complete biothings/biothings_explorer#260
