clear language / documentation of vocab + data structures #379

colleenXu · 2021-12-22T23:19:17Z

I think there's a bit of confusion (maybe this is just on my part?) on vocab that makes it difficult to discuss what data structures BTE modules are actually handling.

There's:

Hit (having an output ID in the field specified by x-bte or TRAPI) vs Record (1 piece of data from a response) vs result (a compiled collection of nodes + edges that matches the query graph)
Restraint (the cut-off code?) vs constraint (the node property filtering?) vs capping vs other-filtering (promiscuous nodes?)
QEdge (from query graph) vs Edge (1 connection between nodes built from data we get from sub-queries) vs MetaEdge (how to do sub-queries)
1 ID/curie vs an entity/Node (that can have multiple corresponding IDs + 1 primary ID) vs the data structures for ID resolution

Other things that are probably confusing to newcomers but are "normal" now:

QGraph and how it relates to knowledge graph and results in the response
node category vs node semantic type (same thing)
primary ID vs other IDs
edge ID vs subject-object-predicate triple
predicate vs relation

ariutta · 2022-02-02T04:52:57Z

Discussion regarding vocabulary in query_results.js for the query graph handler:
biothings/bte_trapi_query_graph_handler#88 (comment)

colleenXu · 2022-02-02T17:43:10Z

This will be discussed further as we discuss "how to write tests for the results-assembly process"

and we want to discuss case...https://betterprogramming.pub/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841

tokebe · 2022-02-02T20:13:52Z

Just adding a few notes here:

We want to stick to camelCase as it's pretty much the standard in javascript, with PascalCase for class names
While we do want descriptive names for each thing, we probably want to keep to shorter variable names for code readability. There's a balance we can probably find between clarity and length, for instance the class QNode is short for QueryNode, which is relatively easy to infer, while keeping the name distinct from others using the word node.
- I bring this up because I suspect that something like unconsolidatedResultsByResultID will tend to blend together a little when scanning a file, especially when mixed with other relatively similar names like unconsolidatedResults and consolidatedResults

tokebe · 2022-02-02T20:38:21Z

I also just want to define the scope for which we're going to try and do this -- along the lines of in-code language, I assume the main areas of focus will be the query_graph_handler and call-apis packages, while overall we're trying to develop a clear vocabulary to be used in just about any discussion. Does this sound acceptable, at least for a start?

It might be best to go file by file and identify specific variable names/concept names that we want to discuss, and build our final list through some basic discussion over these?

ariutta · 2022-02-03T19:08:15Z

Regarding unconsolidatedResultsByResultID, shorter variable names are better, as long as they remain understandable, because variables that are too long can be unwieldy. Here's one opinion:

Give as descriptive a name as possible, within reason. Do not worry about saving horizontal space as it is far more important to make your code immediately understandable by a new reader. Do not use abbreviations that are ambiguous or unfamiliar to readers outside your project, and do not abbreviate by deleting letters within a word.

https://google.github.io/styleguide/jsguide.html#naming-rules-common-to-all-identifiers

Regarding variable names for maps/objects, I've seen conventions like valueByKey, valueForKey, keyToValue and keyToValueMap. There are arguments pro/con for each one, e.g.,

keyToValue matches the left to right order of {"key": value}, but a quick scan of a variable name like idToRecords for data like {'abc': [record0, record1, ...]} could suggest idToRecords is an array because of the final s.
keyToValueMap makes it clear this isn't a function, but naming it Map could be a bit confusing if it's actually just a "normal" object, not an actual JS Map.

I don't have any strong preferences here, as long as it's all consistent and understandable.

colleenXu · 2022-02-08T20:41:14Z

note on CURIE (vs ID) and prefix stuffs: https://cthoyt.com/2021/09/14/curies.html. We use prefixes from biolink-model (either what biolink model has at the top or what they list under semantic types for nodes)

tokebe · 2022-03-01T16:28:16Z

Going to begin posting items for discussion here as I move through the process of renaming.

First up: As far as I can tell, smartAPIEdge and metaEdge refer to the same concept -- should we note that the two can be used interchangeably, or pick only one to use? Or am I mistaken about them being the same?

colleenXu · 2022-03-01T21:20:45Z

@tokebe it might help to link to code lines where the terms are used, so we can see context.

Brainstorming on what I think this first concept is...

I think I've been calling the thing "metaKGEdges" or "operations" when communicating....
historically, I think we called these things "smartAPIEdge" or something like this, because the TRAPI "meta knowledge graph" concept wasn't implemented/didn't exist. So BTE only used the info from the smartAPI registry yaml's x-bte kgs operation annotation...
which brings up a point: do we need different terms to describe the stuff from x-bte annotation VS TRAPI meta_knowledge_graph?

tokebe · 2022-03-01T21:44:47Z

@colleenXu The majority of uses of smartAPIEdge are in qedge2bteedge, while metaEdge is not actually used in-code, but has been used in discussion. I think the best idea will be to use metaEdge instead of smartAPIEdge in basically all cases.

In my exploratory refactoring I'm also using metaXEdge to differentiate when a metaEdge has been paired with a queryExecutionEdge (qXEdge for short).

As far as I can tell, we don't really need differentiation of x-bte annotation vs TRAPI outside of what currently exists, as once the metaKG is retrieved for execution in qedge2bteedge, this doesn't need to be distinguished. The smartapi-kg package differentiates these I think as far as is needed.

colleenXu · 2022-03-01T21:52:34Z

I'll leave it to your discretion then! I'm not clear on the "queryExecutionEdge / pairing with metaEdge". Like what's the diff between queryExecutionEdge and qEdge (from the QGraph / query request body)? And does the metaEdge change once it's paired, enough that we want to use a diff name?

I imagine that these are concepts/constructs in the code that I'm not fully clear on...

tokebe · 2022-03-01T22:01:03Z

A qEdge exists on the query graph, while a qXEdge is an internal representation of a qEdge with additional properties and methods, which seemed sufficiently different to define separately just to keep track of the changing data types in the code.

tokebe · 2022-03-03T15:55:38Z

Another question for vocabulary:

bteEdges are essentially metaXEdges that have been duplicated and had additional information attached so that queries may be generated using them. They have a 1:1 relationship with sub-queries (excepting paginated sub-queries, which are technically made multiple times, but are the same query related to one bteEdge).

However, the name sort of conflicts with the bteGraph, the edges of which by convention would be bteEdges, which could lead to some potential confusion. Should we rename bteEdges to something else? What would be an appropriate name?

Perhaps they could be named subQueryEdge or APIEdge to make their 1:1 relationship with subQueries more apparent?

tokebe · 2022-03-03T17:03:55Z

@ariutta regarding uniqueNodeID vs simply nodeID, Would it be acceptable to stick with uniqueNodeID? I think that name better implies the purpose of the concept and makes it clearly separate from something like qNodeID.

tokebe · 2022-03-03T17:05:42Z

Additionally, would it be acceptable to use ucResult and cResult as short forms for unconsolidatedResult and consolidatedResult, given that function names applicable first occurrences use the long form for clarity?

tokebe · 2022-03-03T17:57:19Z

@colleenXu Another question: Since results and the kg in a response from BTE are supposed to be somewhat separate, what do we want to use for internal language for response kg nodes/edges? Simply kgNode and kgEdge, or is there something more specific that would make sense?

ariutta · 2022-03-03T19:10:58Z

Regarding uniqueNodeID vs nodeID, I'm fine either way. Technically, an ID should be unique, so it sounded a bit redundant, but, as you point out, just nodeID alone could be confusing.

colleenXu · 2022-03-03T19:58:42Z

maybe "edge" isn't the best noun here, and something like an "operation" / "action" / "plan" is better? I agree on using something more descriptive (what it does or what it helps with) is better than "bteEdge" which is very generic (it's an edge thing and this tool is called BTE :P)
I've been okay with using Node/Edge, similar to TRAPI....or picking a longer name that makes it clear that we're working with the "trapi response's KG" (kgNode / kgEdge, trapiNode / trapiEdge)

Maybe this is useful context? ⬇️ @tokebe @marcodarko @ariutta

I think Kevin originally designed the internal representation of BTE's info (after transforming raw sub-query responses) as independent of the TRAPI standard. I remember that there was an internal "table" of info (pandas dataframe) or "graph" representation (networkx graph) in the original Python-based tool. There was then a step to build a TRAPI response with its KG/results format for export. If this design carried on to the current tool, that may help explain some of the internal code...

tokebe · 2022-03-03T20:02:27Z

For 2. In that case, perhaps we should try to keep it somewhat in-line with TRAPI and say that kgNodes/kgEdges, unless otherwise stated, are TRAPI response nodes/edges?

tokebe · 2022-03-04T18:59:42Z

The above mentioned PRs contain all the changes I've currently made. We'll likely also need a pass on api-response-transform and biomedical_id_resolver, and possibly smartapi-kg, however I felt that these should be a second pass after we've adequately established the majority of our standardized vocab through these more 'central' packages.

tokebe · 2022-03-10T16:53:04Z

@colleenXu One final question regarding this pass, some TRAPI logs reference SmartAPI edges when they are really talking about metaEdges -- how should we handle this? We could:

Not change it (BTE is trying to find smartAPI edges...)
Change to metaKG (not quite the convention, but more clear to user than metaEdge) (BTE is trying to find metaKG edges...)
Use metaKG but explain what that means (BTE is trying to find metaKG (smartAPI + x-bte) edges...)
Something else?

colleenXu · 2022-03-14T04:57:40Z

@tokebe
I'm thinking of 3, but something like BTE is trying to find metaKG edges from smartAPI registry (x-bte annotation). Maybe that will make what is happening clearer to the developer?

sorry for the late reply.

tokebe · 2022-03-15T14:33:22Z

Just for ease-of-access here is the finalized list of vocabulary replacements in the first pass:

View Table

New standard term, short-form	Term meaning	Old term (to replace)	Where old term is used
queryEdge, qEdge	Edge in TRAPI query graph	Edge	lots of places
queryEdgeID, qEdgeID	ID of TRAPI query graph edge, e.g. "e01"	edge_id	query_graph_handler.query_graph
queryNodeID, qNodeID	ID of TRAPI query graph node, e.g. "n01"	node_id	query_graph_handler.query_graph
queryEdgeHash, qEdgeHash	A uniquely identifying (across BTE queries) hash of a qEdge	-	-
queryExecutionEdge, qXEdge	Internal UpdatedExeEdge representation of a qEdge to be executed	edge, queryEdge, qEdge	lots of places
queryExecutionEdgeHash, qXEdgeHash	A uniquely identifying (across BTE queries) hash of a qXEdge	hashedEdgeID	query_graph_handler.cache_handler
APIEdge	Internal representation of a smartAPIEdge/qXEdge combo, per subquery	bteEdge	lots of places
metaEdge	An edge in the metaKG	smartapi_edge	query_graph_handler.qedge2bteedge
metaExecutionEdge, metaXEdge	A metaEdge paire with a qXEdge (such as the reasoner_edge property)	smartapi_edge	query_graph_handler.qedge2bteedge
Curie	A "compact URI" -- consisting of a "prefix" and an "identifier" separated by a colon -- used to uniquely identify an entity	ID	lots of places
PrimaryCurie	The 'main' Curie identifying an entity chosen from many valid equivalent CURIEs, usually chosen based on a ranked priority of prefixes	PrimaryID	lots of places
record	A single unit of transformed data from a sub-query response	hit, record, result	lots of places
recordHash	A uniquely identifying hash of a record	edgeID	query_graph_handler.graph
uniqueNodeID	A unique identifier for a node while generating the results graph	uniqueNodeID	query_graph_handler.query_results
queryGraphSolution	A collection of records representing a valid solution to the query graph	preresult	query_graph_handler.query_results
consolidatedSolution	A consolidated set of all records representing a valid solution to the query graph (needs clarification)	consolidatedPreresult	query_graph_handler.query_results
trapiResultID	A unique identifier for a result in the TRAPI response from BTE	uniqueResultID	query_graph_handler.query_results
result	1 item of the array in the TRAPI response (message.results)	-	-

tokebe · 2022-08-24T19:39:20Z

Closing as this issue has been well-addressed and should be considered as a coding style directive moving forward.

colleenXu mentioned this issue Dec 22, 2021

create user-friendly log for the TRAPI output #328

Closed

ariutta mentioned this issue Jan 29, 2022

is_set and KGEdgeID biothings/bte_trapi_query_graph_handler#88

Merged

7 tasks

colleenXu mentioned this issue Feb 2, 2022

refactoring "records" to help test results-assembly #402

Closed

colleenXu mentioned this issue Feb 8, 2022

Create a way to export the query execution plan #278

Closed

This was referenced Mar 4, 2022

Refactor: Conform variable names to standardized vocabulary (pass#1) biothings/bte_trapi_query_graph_handler#93

Merged

Refactor: Conform variable names to standardized vocabulary (pass#1) biothings/call-apis.js#47

Merged

tokebe mentioned this issue Jun 2, 2022

A high-level complete sequence diagram of BTE #456

Closed

colleenXu mentioned this issue Jun 3, 2022

update ICEES APIs / Problem with ICEES Asthma #421

Closed

tokebe closed this as completed Aug 24, 2022

colleenXu mentioned this issue Oct 12, 2023

support JQ for API response transformation #489

Closed

colleenXu mentioned this issue Oct 24, 2024

Pathfinder: templates appear to not be filled out properly #893

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clear language / documentation of vocab + data structures #379

clear language / documentation of vocab + data structures #379

colleenXu commented Dec 22, 2021 •

edited

Loading

ariutta commented Feb 2, 2022

colleenXu commented Feb 2, 2022 •

edited

Loading

tokebe commented Feb 2, 2022

tokebe commented Feb 2, 2022

ariutta commented Feb 3, 2022

colleenXu commented Feb 8, 2022

tokebe commented Mar 1, 2022

colleenXu commented Mar 1, 2022

tokebe commented Mar 1, 2022

colleenXu commented Mar 1, 2022

tokebe commented Mar 1, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

ariutta commented Mar 3, 2022

colleenXu commented Mar 3, 2022 •

edited

Loading

tokebe commented Mar 3, 2022

tokebe commented Mar 4, 2022

tokebe commented Mar 10, 2022

colleenXu commented Mar 14, 2022 •

edited

Loading

tokebe commented Mar 15, 2022 •

edited

Loading

tokebe commented Aug 24, 2022

clear language / documentation of vocab + data structures #379

clear language / documentation of vocab + data structures #379

Comments

colleenXu commented Dec 22, 2021 • edited Loading

ariutta commented Feb 2, 2022

colleenXu commented Feb 2, 2022 • edited Loading

tokebe commented Feb 2, 2022

tokebe commented Feb 2, 2022

ariutta commented Feb 3, 2022

colleenXu commented Feb 8, 2022

tokebe commented Mar 1, 2022

colleenXu commented Mar 1, 2022

tokebe commented Mar 1, 2022

colleenXu commented Mar 1, 2022

tokebe commented Mar 1, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

tokebe commented Mar 3, 2022

ariutta commented Mar 3, 2022

colleenXu commented Mar 3, 2022 • edited Loading

tokebe commented Mar 3, 2022

tokebe commented Mar 4, 2022

tokebe commented Mar 10, 2022

colleenXu commented Mar 14, 2022 • edited Loading

tokebe commented Mar 15, 2022 • edited Loading

tokebe commented Aug 24, 2022

colleenXu commented Dec 22, 2021 •

edited

Loading

colleenXu commented Feb 2, 2022 •

edited

Loading

colleenXu commented Mar 3, 2022 •

edited

Loading

colleenXu commented Mar 14, 2022 •

edited

Loading

tokebe commented Mar 15, 2022 •

edited

Loading