phase 1: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) #614

Closed
colleenXu opened this issue Apr 12, 2023 · 7 comments


@colleenXu
Collaborator

colleenXu commented Apr 12, 2023

Background

Overview

I use this sub-query for my example results below:
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0005377"],
                    "categories":["biolink:DiseaseOrPhenotypicFeature"],
                    "name": "noonan"
                },
                "n1": {
                    "categories":["biolink:Gene"]
                }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:caused_by"]
                }
            }
        }
    }
}

Our sub-queries to TRAPI KPs are 1-hop Predict "style" TRAPI queries with batches of IDs sent in each request.

We expect two kinds of results in the TRAPI-1.4 responses (mirrors the two scenarios of #603)...

no ID/node-expansion was involved

  • we expect 1 analysis object in result.analyses, so BTE can take that object's edge_bindings and they should be just like the result.edge_bindings in TRAPI 1.3
  • the edge(s?) there should be "flat", meaning they won't reference an auxiliary graph (aka they won't have an element in the attributes array where the attribute_type_id is "biolink:support_graphs")
example result

This is a "fake" expected result that isn't based on a real response from a TRAPI-1.4 KP...

            {
                "node_bindings": {
                    "n0": [
                        {
                            "id": "MONDO:0005377"
                        }
                    ],
                    "n1": [
                        {
                            "id": "NCBIGene:3315"
                        }
                    ]
                },
                "analyses": [
                    {
                        "resource_id": "infores:automat-biolink",
                        "edge_bindings": {
                            "eA": [
                                {
                                    "id": "54d9ed32bec4d12369592709e20c997f"
                                }
                            ]
                        },
                        "score": 0.8
                    }
                ]
            }
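
A minimal Python sketch of the two checks described above, assuming the result and edge shapes shown in this issue (BTE itself is not written in Python, and the helper names `is_flat_edge` and `single_analysis_edge_ids` are hypothetical, made up for illustration):

```python
SUPPORT_GRAPHS = "biolink:support_graphs"


def is_flat_edge(edge: dict) -> bool:
    """True when the KG edge has no attribute typed biolink:support_graphs."""
    attributes = edge.get("attributes") or []
    return all(a.get("attribute_type_id") != SUPPORT_GRAPHS for a in attributes)


def single_analysis_edge_ids(result: dict) -> dict:
    """Return {qedge_id: [kg_edge_id, ...]} from the only analysis of a result.

    Raises ValueError when the result does not hold exactly one analysis,
    since that is the expectation stated for KP sub-query responses.
    """
    analyses = result.get("analyses") or []
    if len(analyses) != 1:
        raise ValueError(f"expected exactly 1 analysis, got {len(analyses)}")
    bindings = analyses[0].get("edge_bindings") or {}
    return {qedge: [b["id"] for b in bound] for qedge, bound in bindings.items()}
```

The returned mapping has the same shape as the IDs in a TRAPI 1.3 `result.edge_bindings`, so downstream code could keep treating it the same way.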

ID/node-expansion was involved

READ THIS FIRST:

  • This is still being discussed by the TRAPI team / Translator
  • the info below is based on TRAPI 1.4.0-beta3 and the discussions Jackson and I had on this topic

Notes:

  • we still expect 1 analysis object in result.analyses. But the edge(s)? in that object's edge_bindings may reference an auxiliary graph. When this happens, we expect 1 element in the attributes array where the attribute_type_id is "biolink:support_graphs". The value of that Attribute object should be 1 or more keys for auxiliary-graphs...
  • we could decide to drop these results! (if we want data for descendant IDs, we'll include them in the batch of IDs we send and ideally get edges back with no auxiliary graph references)
  • UPDATED 2023-04-26 discussion: if we want to keep the edges, but ignore/remove the edge-attribute with the aux-graph (since we'll "drop" that), we may want to generate a warning-level log so we know this is happening.
  • If we want to process these edges with auxiliary graph references, we'll want to:
    • get the referenced auxiliary-graph objects. We want those in the auxiliary_graphs section of our TRAPI response
      • implementation musing: may need to rename aux-graph key to keep it unique?
    • get the edges listed in those auxiliary-graph objects. We want those in the knowledge_graph.edges section of our TRAPI response
    • check if any edge in this set references auxiliary-graphs (the references will be in the edge's attributes, same as before). If any do...repeat the two steps above (implementation musing: recursive behavior?)
  • when doing the next parts of query-execution, I imagine we'd use the main edge(s) from the result.analyses.edge_bindings. So we'd basically ignore the nested auxiliary-graphs and their edges...
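
The "repeat the two steps" musing above could be sketched roughly like this in Python. This is only an illustration under assumptions: `kg_edges` stands for `knowledge_graph.edges` keyed by edge ID, `aux_graphs` stands for `auxiliary_graphs` keyed by aux-graph ID with an `edges` list on each object, and the function names are hypothetical. It uses an explicit stack instead of recursion and tracks what it has seen so a cyclic reference can't loop forever.

```python
SUPPORT_GRAPHS = "biolink:support_graphs"


def support_graph_ids(edge: dict) -> list:
    """Aux-graph keys listed in the edge's biolink:support_graphs attribute(s)."""
    ids = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == SUPPORT_GRAPHS:
            value = attr.get("value") or []
            # Tolerate a single key given as a bare string.
            ids.extend([value] if isinstance(value, str) else value)
    return ids


def collect_support(edge_id: str, kg_edges: dict, aux_graphs: dict) -> tuple:
    """Return (aux_graph_ids, edge_ids) reachable from edge_id via support graphs."""
    seen_graphs, seen_edges = set(), set()
    stack = [edge_id]
    while stack:
        current = stack.pop()
        for graph_id in support_graph_ids(kg_edges.get(current, {})):
            if graph_id in seen_graphs:
                continue
            seen_graphs.add(graph_id)
            for nested in aux_graphs.get(graph_id, {}).get("edges", []):
                if nested not in seen_edges:
                    seen_edges.add(nested)
                    stack.append(nested)
    return seen_graphs, seen_edges
```

The two returned sets correspond to what would be copied into `auxiliary_graphs` and `knowledge_graph.edges` of our response; renaming aux-graph keys to keep them unique is not handled here.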

Examples:

@colleenXu colleenXu changed the title phase 2: processing TRAPI-1.4 KP sub-query responses phase 2: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) Apr 12, 2023
@colleenXu colleenXu changed the title phase 2: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) phase 1: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) Apr 12, 2023
@colleenXu
Collaborator Author

Specifically interested in @tokebe's view of this idea of dropping results when the analyses.edge_binding edges reference aux-graphs...

we could decide to drop these results! (if we want data for descendant IDs, we'll include them in the batch of IDs we send and ideally get edges back with no auxiliary graph references)

@colleenXu
Collaborator Author

Note that COHD's dev instance seems to be on TRAPI 1.4 (we can access it through the registration we currently use, but they also registered a separate yaml for TRAPI 1.4)

However, I haven't checked their /query responses to see if they are using the aux-graph/result.analyses as we expect, and whether we can use it to develop and test our code for this issue...

From my post here: #597 (comment)

colleenXu added a commit that referenced this issue May 11, 2023
don't deploy this until #614 is addressed
aka BTE can handle TRAPI 1.4 KP responses
other code may be needed as well
colleenXu referenced this issue May 11, 2023
text-mining targeted and multiomics clinicaltrials. to support trapi 1.4 sources data ingest
@colleenXu
Collaborator Author

colleenXu commented May 11, 2023

Deleted the previous comment (oops? should have edited or hidden it instead?).

@tokebe and I agreed to adjust the API_LIST config file rather than use SmartAPI overrides, because the names of the APIs were different between registrations.

The adjustments are in this branch main...trapi1-4-overrides and include...

  • COHD has a second registration for TRAPI 1.4 instances (2023-05-19: created a new one with dev + CI)
  • Automat KPs have second registrations for TRAPI 1.4 instances (only dev right now). However, some tools that we previously used are missing or don't have TRAPI 1.4 instances. Those have been removed from the API_LIST config file for both the main branch (TRAPI 1.3 instances) and this branch
  • 2023-05-19: Connections Hypothesis Provider has a second registration for TRAPI 1.4 instances (only dev right now)

@tokebe
Member

tokebe commented May 11, 2023

Above linked PR currently drops all KP result edges that have support graphs, per 1-on-1 discussion with @colleenXu. This might change if there's a good case in which we'd want to keep support graphs.

Note that support graphs on the result analysis are also not kept, but they aren't used as a criterion for dropping a result edge (these support graphs would typically explain result scoring, which we also don't use from TRAPI KPs).
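
For readers who want the dropping behavior spelled out, here is a rough Python sketch. It is not the code in the linked PR. It assumes one reading of the behavior (a result is dropped when any edge bound in its analyses carries a `biolink:support_graphs` attribute) and adds the warning-level log suggested in the 2023-04-26 note. The logger name and function names are hypothetical.

```python
import logging

logger = logging.getLogger("kp_response_sketch")  # hypothetical logger name
SUPPORT_GRAPHS = "biolink:support_graphs"


def has_support_graphs(edge: dict) -> bool:
    """True when the KG edge has an attribute typed biolink:support_graphs."""
    return any(
        attr.get("attribute_type_id") == SUPPORT_GRAPHS
        for attr in edge.get("attributes") or []
    )


def drop_supported_results(message: dict) -> list:
    """Return the results whose bound edges do not reference support graphs."""
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    kept = []
    for result in message.get("results") or []:
        edge_ids = [
            binding["id"]
            for analysis in result.get("analyses") or []
            for bound in (analysis.get("edge_bindings") or {}).values()
            for binding in bound
        ]
        if any(has_support_graphs(kg_edges.get(eid, {})) for eid in edge_ids):
            logger.warning(
                "Dropping result: bound edge references a support graph (%s)",
                ", ".join(edge_ids),
            )
            continue
        kept.append(result)
    return kept
```

Support graphs attached to the analysis object itself are deliberately ignored here, matching the note above that they aren't a reason to drop anything.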

@colleenXu
Collaborator Author

colleenXu commented May 31, 2023

Note that I'm not sure that CHP's TRAPI 1.4 instance is working (dev only; http://chp.thayer.dartmouth.edu/query). When querying it directly, I'm getting either an empty response or a malformed error response. Looks like BTE is handling this somewhat...but I dunno if it could handle it more gracefully / intelligently?

BTE log example:

        {
            "timestamp": "2023-05-30T20:10:35.710Z",
            "level": "ERROR",
            "message": "call-apis: Failed POST http://chp.thayer.dartmouth.edu (1 ID): Gene > expressed_in > GrossAnatomicalStructure: (TypeError: Cannot read properties of undefined (reading 'id'))",
            "code": null
        },
query 1
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:672"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": [
                        "biolink:GrossAnatomicalStructure"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}
response to query 1: empty KG/results
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:672"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ],
                    "is_set": false,
                    "constraints": []
                },
                "n1": {
                    "ids": null,
                    "categories": [
                        "biolink:GrossAnatomicalStructure"
                    ],
                    "is_set": false,
                    "constraints": []
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "knowledge_type": null,
                    "predicates": [
                        "biolink:expressed_in"
                    ],
                    "attribute_constraints": [],
                    "qualifier_constraints": []
                }
            }
        },
        "knowledge_graph": null,
        "results": [
            {
                "node_bindings": {
                    "n0": [],
                    "n1": []
                },
                "analyses": [
                    {
                        "resource_id": "infores:connections-hypothesis",
                        "edge_bindings": {
                            "e0": []
                        },
                        "score": null,
                        "support_graphs": null,
                        "scoring_method": null,
                        "attributes": null
                    }
                ]
            }
        ],
        "auxiliary_graphs": null
    },
    "logs": [
        {
            "timestamp": "2023-05-30T20:14:12.898845",
            "level": "INFO",
            "message": "Running message.",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.898853",
            "level": "INFO",
            "message": "Getting message templates.",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.898917",
            "level": "INFO",
            "message": "Checking template matches for gene_specificity",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.900685",
            "level": "INFO",
            "message": "Detected 1 matches for gene_specificity",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.900690",
            "level": "INFO",
            "message": "Constructing queries on matching templates",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.900972",
            "level": "INFO",
            "message": "Sending 1 consistent queries",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.905156",
            "level": "INFO",
            "message": "Wildcard detected",
            "code": null
        },
        {
            "timestamp": "2023-05-30T20:14:12.905312",
            "level": "INFO",
            "message": "Received responses from gene_specificity",
            "code": null
        }
    ],
    "trapi_version": "1.4",
    "biolink_version": "3.1.2",
    "status": "Success",
    "id": "2adeb7ba-6b70-429b-97b1-384f8e9c80f1",
    "workflow": [
        {
            "id": "lookup"
        }
    ]
}
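
One possible way to handle a response like this more gracefully is to skip results whose binding lists are empty before anything tries to read `bindings[0].id`. That the `TypeError` in the BTE log comes from exactly such a read is a guess on my part, not something confirmed from the code. A hedged Python sketch with hypothetical function names:

```python
def _all_bound(bindings) -> bool:
    """True when the mapping is non-empty and every key has at least one binding."""
    return bool(bindings) and all(bindings.values())


def usable_results(message: dict) -> list:
    """Keep only results where every node and edge binding list is non-empty."""
    usable = []
    for result in message.get("results") or []:
        analyses = result.get("analyses") or []
        nodes_ok = _all_bound(result.get("node_bindings"))
        edges_ok = bool(analyses) and all(
            _all_bound(analysis.get("edge_bindings")) for analysis in analyses
        )
        if nodes_ok and edges_ok:
            usable.append(result)
    return usable
```

With the response above, the single result has empty lists for `n0`, `n1`, and `e0`, so the function would return an empty list and the sub-query could be treated as "no records" instead of an error.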
query 2

uses an ID they list in the example response of the /curies endpoint

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["ENSEMBL:ENSG00000106665"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": [
                        "biolink:GrossAnatomicalStructure"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}
query 2 response: malformed error

very long, only including snippets that seem useful

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/chp_api/web/dispatcher/views.py", line 45, in post
    return dispatcher.get_response(message)
  File "/home/chp_api/web/dispatcher/base.py", line 167, in get_response
    responses = get_app_response_fn(consistent_app_queries, self.logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/app_interface.py", line 25, in get_response
    response = interface.get_response(consistent_query, logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/trapi_interface.py", line 142, in get_response
    self._add_results(message, subject_mapping, qg_subject_id, [curie], subject_category, predicate, qg_edge_id, object_mapping, qg_object_id, object_curies, object_category, vals)

Exception Type: TypeError at /query
Exception Value: _add_results() missing 2 required positional arguments: 'object_category' and 'vals'
	<div id="explanation">
		<p>
			You’re seeing this error because you have <code>DEBUG = True</code> in your
			Django settings file. Change that to <code>False</code>, and Django will
			display a standard page generated by the handler for this status code.
		</p>
	</div>

</body>

</html>

EDIT: I found a query that works. However, (a) BTE wouldn't send a sub-query like this (where the ID is the object) and (b) BTE may not be able to process the response (only 1 result that contains all 30 "answers", as if is_set: true was on the Gene QNode...)

query that works

This is the example given for their /query endpoint

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "ids": ["UBERON:0009835"],
                    "categories": ["biolink:GrossAnatomicalStructure"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}

query response:
response2.txt

@tokebe
Member

tokebe commented Aug 3, 2023

Marking this one as done -- we'll treat the above as a new issue (tracked in #685)

@tokebe tokebe closed this as completed Aug 3, 2023
@colleenXu
Collaborator Author

Note that other tools in Translator aren't doing subclassing w/ aux-graphs right now (it's an after-Sept goal). So...we'll open a new issue if we notice any issues processing their KP responses or we want to change our behavior...
