refactoring "records" to help test results-assembly #402

colleenXu · 2022-02-02T18:10:54Z

This involves the query-handler module. We're thinking of creating a Record class with methods that do what we want, and/or building a way to get records in a "readable JSON" format that can be exported and imported for tests.
This would make records easier to serialize and cache (with less memory usage).
It would also make it easier to write tests for the results-assembly code.

We've tried working on this with biothings/bte_trapi_query_graph_handler#88, but put it aside in order to get its fixes deployed

related to #379 and a long-term effort to get "power users" like Colleen able to write tests for module-behavior and for overall-behavior.

colleenXu · 2022-02-02T18:23:55Z

Right now, there isn't a separate issue for the "testing" goal. This is because we're still figuring out what this effort involves: would we need separate processes for writing tests for each module and for overall-behavior?

Also, Andrew drew this diagram when trying to understand the "test writing process" for results assembly

colleenXu · 2022-02-02T18:37:42Z

Also regarding testing results-assembly, here is Jackson's code for running queries and checking the output: https://suwulab.slack.com/archives/CC218TEKC/p1643749685831709

check that every returned edge is used at least once in at least one result's edge_bindings
check that every returned node is used at least once in at least one result's node_bindings
check that every query node and query edge appears at least once in at least one result
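
The three checks above could be sketched roughly as follows. This is not Jackson's code (which is only linked, not quoted here). It assumes a TRAPI-style message in which `knowledge_graph.nodes` and `knowledge_graph.edges` are objects keyed by ID and each result has `node_bindings` / `edge_bindings` keyed by query-graph ID. The names `findUnusedElements`, `Message`, and `UnusedReport` are made up for illustration.

```typescript
// Minimal TRAPI-like shapes; only the fields the checks need (assumed layout).
interface Binding { id: string }
interface Result {
  node_bindings: Record<string, Binding[]>;
  edge_bindings: Record<string, Binding[]>;
}
interface Message {
  query_graph: { nodes: Record<string, unknown>; edges: Record<string, unknown> };
  knowledge_graph: { nodes: Record<string, unknown>; edges: Record<string, unknown> };
  results: Result[];
}
interface UnusedReport {
  unusedKgNodes: string[];  // returned nodes never bound in any result
  unusedKgEdges: string[];  // returned edges never bound in any result
  unboundQNodes: string[];  // query nodes missing from every result
  unboundQEdges: string[];  // query edges missing from every result
}

function findUnusedElements(message: Message): UnusedReport {
  const boundKgNodes = new Set<string>();
  const boundKgEdges = new Set<string>();
  const boundQNodes = new Set<string>();
  const boundQEdges = new Set<string>();

  // Collect every ID that is referenced by at least one result.
  for (const result of message.results) {
    for (const [qNodeID, bindings] of Object.entries(result.node_bindings)) {
      if (bindings.length > 0) boundQNodes.add(qNodeID);
      for (const binding of bindings) boundKgNodes.add(binding.id);
    }
    for (const [qEdgeID, bindings] of Object.entries(result.edge_bindings)) {
      if (bindings.length > 0) boundQEdges.add(qEdgeID);
      for (const binding of bindings) boundKgEdges.add(binding.id);
    }
  }

  // IDs present in the graph but absent from the collected set.
  const missing = (all: Record<string, unknown>, seen: Set<string>): string[] =>
    Object.keys(all).filter((id) => !seen.has(id));

  return {
    unusedKgNodes: missing(message.knowledge_graph.nodes, boundKgNodes),
    unusedKgEdges: missing(message.knowledge_graph.edges, boundKgEdges),
    unboundQNodes: missing(message.query_graph.nodes, boundQNodes),
    unboundQEdges: missing(message.query_graph.edges, boundQEdges),
  };
}
```

A test would then assert that all four arrays in the report are empty for a given query response.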

tokebe · 2022-03-07T18:27:28Z

Just want to note that once the vocab refactor is done we'll be in a pretty good spot to take a look at this. Records basically come into existence from api-response-transform and are lightly changed with annotation, etc. in the call-apis package.

There may be some work to do in the api-response-transform package or it may be better to create a new Record class in the query_graph_handler that handles the newer implementation and leaves the transformer largely unchanged. This will become more clear after the refactor is done and some discussion is had.

colleenXu · 2022-03-08T07:15:21Z

I agree that the vocab refactor will help us understand what's going on.

I believe @marcodarko and @ariutta have previously been thinking about this and trying things out. But their attempt didn't fully work...

ariutta · 2022-03-23T16:42:57Z

QueryGraphHelper is already almost a Record class. Every method except _generateHash takes the input record:
https://github.com/biothings/bte_trapi_query_graph_handler/blob/main/src/helper.js

If the methods with input record were pulled out and refactored, we'd have a Record class. Then we could add a toString method for serialization. We could additionally create a method to "re-hydrate" the serialized output into a new instance of Record.

tokebe · 2022-03-23T17:05:59Z

This is definitely the general idea -- the main addition to this I would like to consider is possibly refactoring the internal data structure of a Record -- both to conform to new vocab standards and to avoid the current cyclic references in parts of the record object (or at least mitigate this as a problem during serialization/"rehydration").

ariutta · 2022-03-23T17:26:06Z

That sounds great

ariutta · 2022-03-23T17:35:59Z

Would toJSON be more JS-appropriate than toString?

tokebe · 2022-03-23T18:10:11Z

(edit: clarification)

I think, given the cyclic references, we'll need a modified structure to output to that can circumvent these references (which can later be rebuilt). So I agree that toJSON or perhaps a more specific/accurate verb might be the better choice. Anything we output to can easily be converted to a string afterwards.

ariutta · 2022-03-23T20:52:38Z

Yeah, toJSON sounds good. One benefit: JSON.stringify automatically looks for a toJSON method.

The cyclic references are tricky. That's where @marcodarko and I ran into trouble earlier.
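
To make the toJSON point concrete, here is a minimal sketch of how a toJSON method can sidestep a cyclic reference and how the output can be "re-hydrated". The classes and fields (`RecordSketch`, `QEdgeSketch`, `FrozenSketch`, `qEdgeID`) are invented for illustration and are not the real Record structure. The only built-in behavior relied on is that `JSON.stringify` calls an object's `toJSON` method when one exists.

```typescript
// Hypothetical query edge that keeps a back-reference to its records,
// which is what makes the object graph cyclic.
interface QEdgeSketch {
  id: string;
  records: RecordSketch[];
}

// Plain, cycle-free form produced by toJSON().
interface FrozenSketch {
  subject: string;
  predicate: string;
  object: string;
  qEdgeID: string;
}

class RecordSketch {
  subject: string;
  predicate: string;
  object: string;
  qEdge: QEdgeSketch;

  constructor(subject: string, predicate: string, object: string, qEdge: QEdgeSketch) {
    this.subject = subject;
    this.predicate = predicate;
    this.object = object;
    this.qEdge = qEdge;
    qEdge.records.push(this); // cycle: record -> qEdge -> record
  }

  // JSON.stringify serializes this return value instead of the instance,
  // so the qEdge back-reference is never walked.
  toJSON(): FrozenSketch {
    return {
      subject: this.subject,
      predicate: this.predicate,
      object: this.object,
      qEdgeID: this.qEdge.id,
    };
  }

  // "Re-hydrate": rebuild a working instance, including a fresh qEdge.
  static fromJSON(frozen: FrozenSketch): RecordSketch {
    const qEdge: QEdgeSketch = { id: frozen.qEdgeID, records: [] };
    return new RecordSketch(frozen.subject, frozen.predicate, frozen.object, qEdge);
  }
}

const sketchQEdge: QEdgeSketch = { id: "e01", records: [] };
const sketchRecord = new RecordSketch("NCBIGene:1017", "biolink:related_to", "MONDO:0005015", sketchQEdge);
const sketchSerialized = JSON.stringify(sketchRecord);
const sketchRestored = RecordSketch.fromJSON(JSON.parse(sketchSerialized) as FrozenSketch);
```

Storing only the edge ID rather than the edge itself is one possible way to cut the cycle; which references are safe to drop in the real Record is a separate question.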

tokebe · 2022-04-08T17:32:18Z

Below is the general interface for a FrozenRecord -- what you would expect from .toJSON().

interface FrozenRecord {
  subject: FrozenNode;
  object: FrozenNode;
  predicate: string;
  edgeAttributes: EdgeAttribute[];
  publications: string[];
  recordHash?: string; // always supplied by Record, not required from user
  api: string;
  apiInforesCurie: string;
  metaEdgeSource: string;
}

interface FrozenNode {
  original: string;
  obj?: NodeNormalizerResultObj; // always supplied by Record, not required from user
  qNodeID: string;
  isSet: boolean;
  curie: string;
  UMLS: string;
  semanticType: string;
  label: string;
  equivalentCuries?: string[]; // always supplied by Record, not necessarily required from user
  names: string[];
  attributes: any;
}

Most of these values are actually pulled from the apiEdge or the qXEdge. The idea here is that when making a test record, you no longer have to interact with these more "under the hood" parts and can instead use this interface to deal with the data you care about. When initializing a Record instance, the instance will generate a "fake" apiEdge and qXEdge to store the information so that the instance will behave normally and be compatible with the rest of BTE.

Meanwhile, in internal cases like caching, a much more minimal set can be frozen, with often-repeated parts like the apiEdge being pulled out to lower redundant information. When pulled from cache, this can be re-assembled properly into fully functional Record instances.
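
One way the "pull out often-repeated parts" idea for caching might look is sketched below. It is an illustration only, not the implemented cache format: `ApiEdgeSketch`, `CacheableRecord`, `packForCache`, and `unpackFromCache` are hypothetical names, and the real apiEdge carries far more than three fields.

```typescript
// Hypothetical, heavily reduced shapes.
interface ApiEdgeSketch { api: string; predicate: string; source: string }
interface CacheableRecord { subject: string; object: string; apiEdge: ApiEdgeSketch }

interface PackedRecords {
  apiEdges: ApiEdgeSketch[]; // each distinct apiEdge is stored once
  records: { subject: string; object: string; apiEdgeIndex: number }[];
}

function packForCache(records: CacheableRecord[]): PackedRecords {
  const apiEdges: ApiEdgeSketch[] = [];
  const indexByKey = new Map<string, number>();
  const packed = records.map((rec) => {
    // Content-based key; assumes equal apiEdges list their keys in the same order.
    const key = JSON.stringify(rec.apiEdge);
    let index = indexByKey.get(key);
    if (index === undefined) {
      index = apiEdges.length;
      apiEdges.push(rec.apiEdge);
      indexByKey.set(key, index);
    }
    return { subject: rec.subject, object: rec.object, apiEdgeIndex: index };
  });
  return { apiEdges, records: packed };
}

function unpackFromCache(packed: PackedRecords): CacheableRecord[] {
  return packed.records.map((rec) => {
    const apiEdge = packed.apiEdges[rec.apiEdgeIndex];
    if (apiEdge === undefined) {
      throw new Error(`Missing apiEdge at index ${rec.apiEdgeIndex}`);
    }
    // Records that shared an apiEdge before packing share one object again.
    return { subject: rec.subject, object: rec.object, apiEdge };
  });
}
```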

As it happens, the cyclic reference problem is not really an issue -- we don't need to reference one record through the qXEdge of another record ever, or other similar cases, so the cyclic reference is only there for convenience and can be broken without harming anything (this is also what allows "fake" qXEdges to be made for testing).
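
As a rough illustration of building a test record from the frozen interface, the sketch below takes a small subset of the FrozenRecord fields and stores the edge-level values on stand-in "fake" apiEdge and qXEdge objects. The class name `TestableRecord` and the internal shapes `FakeApiEdge` and `FakeQXEdge` are assumptions made for this example; the real Record constructor and edge classes may look quite different.

```typescript
// Subset of the FrozenNode / FrozenRecord fields from the interface above.
interface FrozenNodeSubset {
  original: string;
  qNodeID: string;
  curie: string;
  label: string;
}
interface FrozenRecordSubset {
  subject: FrozenNodeSubset;
  object: FrozenNodeSubset;
  predicate: string;
  api: string;
}

// Hypothetical internal shapes; the real apiEdge and qXEdge are richer.
interface FakeApiEdge { predicate: string; api: string }
interface FakeQXEdge { subjectQNodeID: string; objectQNodeID: string }

class TestableRecord {
  readonly subject: FrozenNodeSubset;
  readonly object: FrozenNodeSubset;
  readonly apiEdge: FakeApiEdge;
  readonly qXEdge: FakeQXEdge;

  constructor(frozen: FrozenRecordSubset) {
    this.subject = { ...frozen.subject };
    this.object = { ...frozen.object };
    // Edge-level values live on stand-in edges, so code that reads
    // record.apiEdge or record.qXEdge keeps working.
    this.apiEdge = { predicate: frozen.predicate, api: frozen.api };
    // The fake qXEdge holds no reference back to this record, so no cycle exists.
    this.qXEdge = {
      subjectQNodeID: frozen.subject.qNodeID,
      objectQNodeID: frozen.object.qNodeID,
    };
  }

  get predicate(): string { return this.apiEdge.predicate; }
  get api(): string { return this.apiEdge.api; }

  // Returns the same flat shape that was passed in.
  toJSON(): FrozenRecordSubset {
    return {
      subject: { ...this.subject },
      object: { ...this.object },
      predicate: this.predicate,
      api: this.api,
    };
  }
}

const frozenTestInput: FrozenRecordSubset = {
  subject: { original: "NCBIGene:1017", qNodeID: "n0", curie: "NCBIGene:1017", label: "CDK2" },
  object: { original: "MONDO:0005015", qNodeID: "n1", curie: "MONDO:0005015", label: "diabetes mellitus" },
  predicate: "biolink:related_to",
  api: "Example API",
};
const builtTestRecord = new TestableRecord(frozenTestInput);
```

A test author only writes the flat `frozenTestInput` object; the fake edges are an internal detail.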

tokebe · 2022-05-09T15:52:53Z

Closing as this has been addressed by biothings/api-respone-transform.js/pull/29 and related.

colleenXu · 2022-05-11T17:05:24Z

Related to this issue?

An idea: have a flag (like USE_THREADING = false?) that allows the records from a query to be written to a file as JSON (each hop / QEdge execution as a separate file). We could then modify these records and use them as "data" for a test.
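
A sketch of what such a flag could look like, using Node's built-in `fs`, `os`, and `path` modules. The environment variable `DUMP_RECORDS`, the function names, and the one-file-per-QEdge naming scheme are all hypothetical; nothing here exists in BTE as far as this thread shows. It also assumes the records are already JSON-serializable and that the QEdge ID is safe to use as a file name.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical flag, read once from the environment.
const DUMP_RECORDS = process.env.DUMP_RECORDS === "true";

// Write the records from one QEdge execution to <dir>/<qEdgeID>.json.
// Returns the file path, or undefined when dumping is disabled.
function dumpRecordsForQEdge(
  dir: string,
  qEdgeID: string,
  records: unknown[],
  enabled: boolean = DUMP_RECORDS,
): string | undefined {
  if (!enabled) return undefined;
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${qEdgeID}.json`);
  fs.writeFileSync(file, JSON.stringify(records, null, 2));
  return file;
}

// Read a (possibly hand-edited) dump back in for use as test data.
function loadRecordsForQEdge(dir: string, qEdgeID: string): unknown[] {
  const parsed: unknown = JSON.parse(fs.readFileSync(path.join(dir, `${qEdgeID}.json`), "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected an array of records in ${qEdgeID}.json`);
  }
  return parsed;
}

// Usage: dump one hop into a temporary directory, then load it again.
const dumpDir = fs.mkdtempSync(path.join(os.tmpdir(), "bte-records-"));
const dumpedFile = dumpRecordsForQEdge(
  dumpDir,
  "e01",
  [{ subject: "NCBIGene:1017", object: "MONDO:0005015" }],
  true, // force-enable for the example instead of relying on the environment
);
const reloadedRecords = loadRecordsForQEdge(dumpDir, "e01");
```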

colleenXu changed the title ~~refactoring "records"~~ refactoring "records" to help test results-assembly Feb 2, 2022

tokebe closed this as completed Jul 19, 2022

tokebe mentioned this issue Sep 27, 2022

Refactor internal query graph representation #504

Closed
