Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_set and KGEdgeID #88

Merged
merged 31 commits into from
Feb 2, 2022
Merged

is_set and KGEdgeID #88

merged 31 commits into from
Feb 2, 2022

Conversation

ariutta
Copy link
Collaborator

@ariutta ariutta commented Jan 28, 2022

This is a draft pull request intended to better handle consolidation of results, including these two cases:

  1. when is_set = true: consolidate results based on is_set query parameter biothings_explorer#341.
  2. when there are multiple edges connecting the same two node primaryIDs. Some bugs for this case were fixed by the updated consolidation logic, but to fully handle it properly, it was additionally necessary to incorporate @tokebe's fix: Experimental: Use Predicate when generating KGEdgeID #86. Now potentially two bugs can be closed:

Items remaining to address before this can be merged:

@ariutta
Copy link
Collaborator Author

ariutta commented Jan 29, 2022

For the following query graph, this new code will correctly produce one edge binding with two ids vs. the current production code that produces one edge binding with one id:

query graph w/ one edge w/ multiple predicates...
{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "object": "n1",
          "subject": "n2",
          "predicates": [
            "biolink:response_affected_by",
            "biolink:response_increased_by"
          ]   
        }   
      },  
      "nodes": {
        "n1": {
          "ids": [
            "NCBIGene:23221"
          ],  
          "categories": [
            "biolink:Gene"
          ]   
        },  
        "n2": {
          "categories": [
            "biolink:ChemicalEntity"
          ]   
        }   
      }   
    }   
  }
}

@ariutta
Copy link
Collaborator Author

ariutta commented Jan 29, 2022

As previously discussed, query graphs like the following are considered invalid, so it's OK in such cases that both the current production code and this new code fail to produce results:

query graph w/ two nodes, both connected by two edges, each w/ one predicate...
{
  "message": { 
    "query_graph": { 
      "edges": { 
        "e0": { 
          "object": "n1",
          "subject": "n2",
          "predicates": [ 
            "biolink:response_affected_by"
          ] 
        },
        "e1": { 
          "object": "n1",
          "subject": "n2",
          "predicates": [ 
            "biolink:response_increased_by"
          ] 
        } 
      },
      "nodes": { 
        "n1": { 
          "ids": [ 
            "NCBIGene:23221"
          ],
          "categories": [ 
            "biolink:Gene"
          ] 
        },
        "n2": { 
          "categories": [ 
            "biolink:ChemicalEntity"
          ] 
        } 
      } 
    } 
  } 
}

But even so, .update() probably shouldn't be called in this case (possibly handled by #87):
image

@ariutta
Copy link
Collaborator Author

ariutta commented Jan 29, 2022

This pull request may solve biothings/biothings_explorer#386

@colleenXu
Copy link
Contributor

so far, I think biothings/biothings_explorer#390 is addressed by this PR...

@colleenXu
Copy link
Contributor

Documenting the current automated tests that are failing....

expand this
 FAIL  __test__/integration/ResultAssembly.test.js
  ● Testing QueryResults Module › A2a query

    TypeError: record.$edge_metadata.trapi_qEdge_obj.isReversed is not a function

       8 |
       9 |   _getInputQueryNodeID(record) {
    > 10 |     return record.$edge_metadata.trapi_qEdge_obj.isReversed()
         |                                                  ^
      11 |       ? record.$edge_metadata.trapi_qEdge_obj.getObject().getID()
      12 |       : record.$edge_metadata.trapi_qEdge_obj.getSubject().getID();
      13 |   }

      at QueryGraphHelper._getInputQueryNodeID (src/helper.js:10:50)
      at src/query_results.js:234:39
          at Array.forEach (<anonymous>)
      at QueryResult.update (src/query_results.js:232:25)
      at Object.<anonymous> (__test__/integration/ResultAssembly.test.js:14:19)

  ● Testing QueryResults Module › n0 is-set true, a1_b2_a3_a4

    TypeError: record.$edge_metadata.trapi_qEdge_obj.isReversed is not a function

       8 |
       9 |   _getInputQueryNodeID(record) {
    > 10 |     return record.$edge_metadata.trapi_qEdge_obj.isReversed()
         |                                                  ^
      11 |       ? record.$edge_metadata.trapi_qEdge_obj.getObject().getID()
      12 |       : record.$edge_metadata.trapi_qEdge_obj.getSubject().getID();
      13 |   }

      at QueryGraphHelper._getInputQueryNodeID (src/helper.js:10:50)
      at src/query_results.js:234:39
          at Array.forEach (<anonymous>)
      at QueryResult.update (src/query_results.js:232:25)
      at Object.<anonymous> (__test__/integration/ResultAssembly.test.js:34:21)

  ● Testing QueryResults Module › n0 is-set true, a1_b1_a3_a4

    TypeError: record.$edge_metadata.trapi_qEdge_obj.isReversed is not a function

       8 |
       9 |   _getInputQueryNodeID(record) {
    > 10 |     return record.$edge_metadata.trapi_qEdge_obj.isReversed()
         |                                                  ^
      11 |       ? record.$edge_metadata.trapi_qEdge_obj.getObject().getID()
      12 |       : record.$edge_metadata.trapi_qEdge_obj.getSubject().getID();
      13 |   }

      at QueryGraphHelper._getInputQueryNodeID (src/helper.js:10:50)
      at src/query_results.js:234:39
          at Array.forEach (<anonymous>)
      at QueryResult.update (src/query_results.js:232:25)
      at Object.<anonymous> (__test__/integration/ResultAssembly.test.js:54:21)

  ● Testing QueryResults Module › is-set false

    TypeError: record.$edge_metadata.trapi_qEdge_obj.isReversed is not a function

       8 |
       9 |   _getInputQueryNodeID(record) {
    > 10 |     return record.$edge_metadata.trapi_qEdge_obj.isReversed()
         |                                                  ^
      11 |       ? record.$edge_metadata.trapi_qEdge_obj.getObject().getID()
      12 |       : record.$edge_metadata.trapi_qEdge_obj.getSubject().getID();
      13 |   }

      at QueryGraphHelper._getInputQueryNodeID (src/helper.js:10:50)
      at src/query_results.js:234:39
          at Array.forEach (<anonymous>)
      at QueryResult.update (src/query_results.js:232:25)
      at Object.<anonymous> (__test__/integration/ResultAssembly.test.js:100:21)

 PASS  __test__/unittest/QueryEdge.test.js
 PASS  __test__/integration/QEdge2BTEEdgeHandler.test.js
 PASS  __test__/unittest/redisClient.test.js
 PASS  __test__/integration/integrity.test.js
 PASS  __test__/integration/TRAPIQueryHandler.test.js (5.422 s)

Test Suites: 1 failed, 1 skipped, 16 passed, 17 of 18 total
Tests:       4 failed, 3 skipped, 148 passed, 155 total
Snapshots:   0 total

@ariutta
Copy link
Collaborator Author

ariutta commented Feb 1, 2022

Regarding the failing tests in ResultAssembly.test.js, I'll leave it up to @marcodarko to decide, with feedback welcome from @tokebe, @andrewsu and anyone else interested. That file was an experiment with a different way of writing tests, but then @marcodarko had to revert some changes in helper.js, and now those tests don't work.

@marcodarko
Copy link
Contributor

Those tests need to rewritten, looks like we need to create instances of the classes used, hardcoded results won't work for now.

@ariutta ariutta requested review from tokebe and removed request for andrewsu February 1, 2022 02:08
@ariutta
Copy link
Collaborator Author

ariutta commented Feb 1, 2022

@marcodarko, would you be able to take the lead on updating those tests and removing any unused .DSStore files?

@ariutta
Copy link
Collaborator Author

ariutta commented Feb 1, 2022

@andrewsu and @colleenXu (and anyone else interested), any thoughts on the vocab bullet point in the first comment? We could just leave query_results.js as-is for now and do any renaming later, or we could rename a few variables now if you prefer. The basic question is what to call the items produced at each of these stages:
https://github.com/biothings/bte_trapi_query_graph_handler/blob/is_set2_pr86/src/query_results.js#L204

Maybe these names are easiest to understand?

  1. records: what the external APIs return
  2. unconsolidatedResults: sets of records, each with one record per QEdge
  3. unconsolidatedResultsByResultID: grouped in preparation for consolidation
  4. consolidatedResults: one per result
  5. results: formatted according to the standard

The name unconsolidatedResults could potentially be a little confusing, because they don't always correspond 1:1 with trapi results. If any consolidation happens, then there will be more unconsolidatedResults than trapi results. Maybe unconsolidatedResults could instead be termed something like queryGraphSolutions, because each one is a combination of records that solves (is consistent with) the query graph.

Potential alternative terms:
- preresult -> unconsolidatedResult, atomicResult, recordCombination, recordSolution, queryGraphSolution or tripleCombination
- consolidatedPreresult -> consolidatedResult or mergedResult
- uniqueNodeID -> nodeID
- uniqueResultID -> resultID
- result -> trapiResult

@ariutta
Copy link
Collaborator Author

ariutta commented Feb 1, 2022

@tokebe, I added you to "reviewers", but I mostly just wanted to get your OK to close PR #86 in favor of this one. I wasn't able to actually merge your fix-390 branch into this one, but I manually incorporated it.

@marcodarko
Copy link
Contributor

@ariutta just confirming, yes I can do that!

@colleenXu
Copy link
Contributor

I'm thinking of deferring to @ariutta and @marcodarko on whether to do the vocab change in this PR vs a separate PR....I think it depends on how easy it is to do here (and whether it breaks anything)...

@ariutta
Copy link
Collaborator Author

ariutta commented Feb 2, 2022

As soon as the tests are updated and the .DSStore files are gone, I'm voting that this pull request be merged because it fixes multiple issues.

Regarding vocabulary, I think we should make some changes at some point, but we can do this in a subsequent PR.

Regarding memory usage, my tests don't indicate any significant difference when compared to main. If this is wrong, we could look into it further, but if it's no worse than what we have now, then it shouldn't be a blocker to merging this PR.

Copy link
Member

@tokebe tokebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding integration of fix-390 everything seems fine. I'll continue to test memory usage but I tend to agree with the current suspicion that it may be caching that's responsible rather than this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants