-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: Conform variable names to standardized vocabulary (pass#1) #93
Conversation
src/query_results.js
Outdated
// [ | ||
// {"inputPrimaryID": "NCBIGene:3630", "outputPrimaryID", "MONDO:0005068"}, | ||
// {"inputPrimaryID": "MONDO:0005068", "outputPrimaryID", "PUBCHEM.COMPOUND:43815"} | ||
// {"inputprimaryCurie": "NCBIGene:3630", "outputprimaryCurie", "MONDO:0005068"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for lowercase p
in primary
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, more generally, is it clear to everyone that inputPrimaryCurie
and outputPrimaryCurie
refer to nodes? We specify edge
and node
for QEdge
and QNode
but not for inputPrimaryCurie
. Maybe it's obviously because it says input
and output
? Not saying we should change it, but just wanted to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the lowercase. It felt relatively clear to me that input and output are nodes on an edge, however I'd like to hear @colleenXu and @marcodarko's opinions on that. Part of the goal here is to make variable names more obvious so if anyone thinks this should be made more obvious (e.g. inputNodePrimaryCurie
) then I'm all for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to hear @marcodarko 's view.
In general, I find the use of input/output to be confusing.....
- are they nouns (nodes) or adjectives (descriptive)?
- What's the perspective?
- Users are giving TRAPI qGraphs with directed qEdges, and those qEdges often have to be "reversed" in actual execution....which then makes it confusing to say what is "input" and "output" related to those qEdges...
- I think apiEdges / records are a bit clearer on what is "input" and "output" (it's what ID you give the API and what concept ID you get from the API response).
- Do we want to distinguish the two different things above?
- There may be more ideas related to input/output that I'm not thinking of (biomedical ID resolver's step of giving an "input" to the SRI ID resolver and receiving "output"....maybe that's a thing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of I/O is a bit dynamic from my experience, it depends on the direction it takes based on the edge's subject/object ids. So it's hard to come up with a single name for them. Similarly I was confused by that part when I wanted to refer to them as subject/object instead when I first started working on the edge manager but ended up making sense when you think of it more as a definition that changes based on the direction it takes by default I --> O but can flip to O <-- I
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My questions below might not be helpful....(I'm a bit confused here)
It sounds like you're using subject/object for the qNodes based on the qEdge's direction?
and then you use I/O for the execution of sub-queries / records? where "I" corresponds to subject and "O" to object? Does "I --> O / O <-- I" refer to different ways of executing the qEdges (reversal), or to the directionality of records, or...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to execution, the part I'm more familiar with. I believe however that same check happens with the query results. Anders checks the reversal to get the I/O from the correct node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't see a clear way to keep the naming consistent there, unless after everything is done we "reset" everything back to the original query directionality maybe?? eg. totally hypothetical but maybe after all edges are executed the manager might have to execute A <-- B ---> C <-- D but the original graph was A --> B --> C --> D. we could reset it to that to keep the context of I/O unchanged for subsequent
processes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this may need to be a semi-separate issue to clarify in a later pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this conversation unresolved for ease of reference in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good. I left some comments on specific lines with a few questions and observations. We'll want to update the in-line code comments and any JSDocs to be consistent with the new vocab.
I find variable names like consolidatedResult
more easy to understand than cResult
, but that could just be a personal preference. However, if we do use cResult
, then I'd change all forms of consolidatedResult
, consolidatedResults
, etc. throughout the codebase to be consistent.
@marcodarko has worked with quite a bit of this code and may have feedback too. If @colleenXu and Marco are good with this, then I am as well.
I've responded to some of these comments where further discussion is in order. I'll be working on fixing spots I missed, comments, jsdocs, etc. The main point seems to be I've used this convention in a couple of places in the code (such as |
I've made some comments above, sorry for being late >.< |
I checked some queries and it looks like the code still works as-expected. I think it's helpful to check behavior to make sure we didn't miss something that would create a bug... |
I did some limited testing to make sure no basic execution was broken, however I definitely think that once we declare changes to be 'done' more thorough testing will be in order... |
I've made changes according to our discussions. Please let me know how it looks now. I'll get back to writing up the list of changed vocab, and additionally will be making another pass to just check comments/etc in various places. |
I checked some queries and it looks like the code still works as-expected. FYI I haven't fully reviewed the vocab yet....I've only been chiming in when Anders or Marco bring something up. In general, I'm trusting Jackson's process since I think the vocab depends on a lot on this internal code (and I talk/work with the "higher level" stuff). I'm fine with changing the vocab I use to reflect changes here... |
Not saying we should change this now, but the |
@ariutta Honestly I think if there's a time to do it, it's now in this PR, and I agree that it would be a good idea, so I'll push that along with changes addressing your other comments. |
@ariutta has stated that this PR is ready as far as he is concerned. Next steps are to test more extensively to ensure nothing has been broken and to compile the finalized list of vocab. |
@tokebe noting a bunch of failed automated tests, when I run "npm test"....I have the vocab-refactor branch for call-apis active as well. expand for console output➜ query_graph_handler git:(vocab-refactor) ✗ npm test
PASS test/integration/QueryNode.test.js
● Test graph class › Multiple query results are correctly updated for two edges having same input, predicate and output
● Test graph class › Multiple query results for different edges are correctly updated
PASS test/integration/biolink.test.js
PASS test/unittest/QueryEdge.test.js
● Test helper moduler › Test _getInputID function › If edge is not reversed, should return the node ID of the subject
● Test helper moduler › Test _getOutputID function › If edge is reversed, should return the node ID of the subject
● Test helper moduler › If edge is not reversed, should return the node ID of the object
● Test helper moduler › Test _getKGEdgeID function › encountered a declaration exception
● Test helper moduler › Test _getInputEquivalentIdentifiers function › If edge is reversed, should return the curies of the output
● Test helper moduler › Test _getInputEquivalentIdentifiers function › If error occurred, return null
● Test helper moduler › Test _getInputEquivalentIdentifiers function › If edge is not reversed, should return the curies of the subject
PASS test/unittest/utils.test.js Test Suites: 3 failed, 1 skipped, 13 passed, 16 of 17 total |
This PR is a first-pass attempt to address biothings/biothings_explorer/issues/379.
No actual behavior has been changed. Variable names have been changed to better reflect a set of standardized vocabulary for internal data structures (which should better reflect their purpose and relationship to external data structures).
Due to some data structures passed between modules, this PR requires biothings/call-apis.js/pull/47.
This PR will require additional review and discussion to ensure that there are no additional changes that should be made, etc.