Refactor: Conform variable names to standardized vocabulary (pass#1) #93

Merged (20 commits) on Mar 15, 2022

@tokebe
Member

tokebe commented Mar 4, 2022

This PR is a first-pass attempt to address biothings/biothings_explorer/issues/379.

No actual behavior has been changed. Variable names have been changed to better reflect a set of standardized vocabulary for internal data structures (which should better reflect their purpose and relationship to external data structures).

Due to some data structures passed between modules, this PR requires biothings/call-apis.js/pull/47.

This PR will require additional review and discussion to ensure that there are no additional changes that should be made, etc.

src/query_results.js
// [
// {"inputPrimaryID": "NCBIGene:3630", "outputPrimaryID", "MONDO:0005068"},
// {"inputPrimaryID": "MONDO:0005068", "outputPrimaryID", "PUBCHEM.COMPOUND:43815"}
// {"inputprimaryCurie": "NCBIGene:3630", "outputprimaryCurie", "MONDO:0005068"},
Collaborator

Any reason for lowercase p in primary here?

Collaborator

Also, more generally, is it clear to everyone that inputPrimaryCurie and outputPrimaryCurie refer to nodes? We specify edge and node for QEdge and QNode but not for inputPrimaryCurie. Maybe it's obvious because it says input and output? Not saying we should change it, but just wanted to check.

Member Author

I'll fix the lowercase. It felt relatively clear to me that input and output are nodes on an edge; however, I'd like to hear @colleenXu and @marcodarko's opinions on that. Part of the goal here is to make variable names more obvious, so if anyone thinks this should be made clearer (e.g. inputNodePrimaryCurie), then I'm all for it.

Contributor

I'd like to hear @marcodarko 's view.

In general, I find the use of input/output to be confusing.....

  • are they nouns (nodes) or adjectives (descriptive)?
  • What's the perspective?
    • Users are giving TRAPI qGraphs with directed qEdges, and those qEdges often have to be "reversed" in actual execution....which then makes it confusing to say what is "input" and "output" related to those qEdges...
    • I think apiEdges / records are a bit clearer on what is "input" and "output" (it's what ID you give the API and what concept ID you get from the API response).
    • Do we want to distinguish the two different things above?
    • There may be more ideas related to input/output that I'm not thinking of (biomedical ID resolver's step of giving an "input" to the SRI ID resolver and receiving "output"....maybe that's a thing?)

Contributor

The definition of I/O is a bit dynamic in my experience: it depends on the direction it takes, based on the edge's subject/object IDs. So it's hard to come up with a single name for them. I was similarly confused by that part when I first started working on the edge manager and wanted to refer to them as subject/object instead, but it ended up making sense once you think of it as a definition that changes with the direction it takes: by default I --> O, but it can flip to O <-- I.
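
As a rough illustration of the direction-dependent naming described above, the sketch below resolves "input" and "output" from a record's subject and object. The record shape, the `isReversed` flag, and the function name `getInputOutput` are assumptions made for this example, not the actual query_graph_handler API.

```javascript
// Hypothetical record shape: { subject, object, isReversed }.
// None of these names are taken from the real codebase.
function getInputOutput(record) {
  // By default the edge is executed subject --> object, so the subject is
  // the input. When the edge is executed in reverse, the roles swap.
  return record.isReversed
    ? { input: record.object, output: record.subject }
    : { input: record.subject, output: record.object };
}

const forward = getInputOutput({
  subject: "NCBIGene:3630",
  object: "MONDO:0005068",
  isReversed: false,
});
const reversed = getInputOutput({
  subject: "NCBIGene:3630",
  object: "MONDO:0005068",
  isReversed: true,
});

console.log(forward.input);  // NCBIGene:3630
console.log(reversed.input); // MONDO:0005068
```

Under these assumptions, subject/object stay fixed to the query graph while input/output follow the execution direction, which is one way to read the "I --> O" versus "O <-- I" distinction.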

Contributor

@marcodarko

My questions below might not be helpful....(I'm a bit confused here)

It sounds like you're using subject/object for the qNodes based on the qEdge's direction?

and then you use I/O for the execution of sub-queries / records? where "I" corresponds to subject and "O" to object? Does "I --> O / O <-- I" refer to different ways of executing the qEdges (reversal), or to the directionality of records, or...?

Contributor

It refers to execution, the part I'm more familiar with. However, I believe the same check happens with the query results: Anders checks the reversal to get the I/O from the correct node.

Contributor

@marcodarko Mar 8, 2022

I just don't see a clear way to keep the naming consistent there, unless after everything is done we "reset" everything back to the original query directionality, maybe? E.g. (totally hypothetical), after all edges are executed the manager might have had to execute A <-- B --> C <-- D, but the original graph was A --> B --> C --> D. We could reset it to that to keep the context of I/O unchanged for subsequent processes.
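
The hypothetical "reset" could look something like the sketch below. It assumes each executed edge is a plain object with `from`, `to`, and a `reversed` flag recording whether it was executed against the original query direction; these names and the `resetToQueryDirection` function are illustrative only and do not come from the codebase.

```javascript
// Hypothetical executed-edge shape: { from, to, reversed }.
// `reversed` is true when the edge was executed against the direction
// given in the original query graph.
function resetToQueryDirection(executedEdges) {
  return executedEdges.map((edge) =>
    edge.reversed
      ? { from: edge.to, to: edge.from, reversed: false }
      : { ...edge }
  );
}

// Execution order from the comment above: A <-- B --> C <-- D
const executed = [
  { from: "B", to: "A", reversed: true },
  { from: "B", to: "C", reversed: false },
  { from: "D", to: "C", reversed: true },
];

const restored = resetToQueryDirection(executed);
console.log(restored.map((e) => `${e.from} --> ${e.to}`).join(", "));
// A --> B, B --> C, C --> D
```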

Member Author

I feel that this may need to be a semi-separate issue to clarify in a later pass.

Member Author

Leaving this conversation unresolved for ease of reference in the future.

Collaborator

@ariutta left a comment

Overall, this looks good. I left some comments on specific lines with a few questions and observations. We'll want to update the in-line code comments and any JSDocs to be consistent with the new vocab.

I find variable names like consolidatedResult easier to understand than cResult, but that could just be a personal preference. However, if we do use cResult, then I'd change all forms of consolidatedResult, consolidatedResults, etc. throughout the codebase to be consistent.

@marcodarko has worked with quite a bit of this code and may have feedback too. If @colleenXu and Marco are good with this, then I am as well.

@tokebe requested a review from marcodarko March 4, 2022 19:46
@tokebe
Member Author

tokebe commented Mar 4, 2022

I've responded to some of these comments where further discussion is in order. I'll be working on fixing spots I missed, comments, jsdocs, etc.

The main point seems to be preresult vs unconsolidatedResult and whether to use long-form, short-form, or both -- If the general opinion is to use only one, then I would prefer the long form for clarity. I used a combination so that the function names can adequately hint at what the short-form means, while the short-forms, which appear more often, are slightly quicker to read.

I've used this convention in a couple of places in the code (such as QueryExecutionEdge as a class definition, followed by qXEdge for short-form variable names). I think this works as a convention for both clarity and readability, however I'm open to changing it if people agree that it's probably more confusing than helpful.
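
A small sketch of that convention follows. Only the names `QueryExecutionEdge` and `qXEdge` come from the discussion; the constructor arguments and fields are made up for illustration.

```javascript
// The long-form name on the class definition spells out the concept once...
class QueryExecutionEdge {
  // `qEdgeID` and `reverse` are placeholder fields for this example.
  constructor(qEdgeID, reverse = false) {
    this.qEdgeID = qEdgeID;
    this.reverse = reverse;
  }
}

// ...and the short form is used for the variables that appear most often.
const qXEdges = ["e01", "e02"].map((id) => new QueryExecutionEdge(id));

for (const qXEdge of qXEdges) {
  console.log(qXEdge.qEdgeID, qXEdge.reverse);
}
```

The trade-off is that a reader has to see the class name at least once before the abbreviation is self-explanatory.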

@colleenXu
Contributor

I've made some comments above, sorry for being late >.<

@colleenXu
Contributor

I checked some queries and it looks like the code still works as-expected.

I think it's helpful to check behavior to make sure we didn't miss something that would create a bug...

@tokebe
Member Author

tokebe commented Mar 8, 2022

I did some limited testing to make sure no basic execution was broken, however I definitely think that once we declare changes to be 'done' more thorough testing will be in order...

@tokebe
Member Author

tokebe commented Mar 8, 2022

I've made changes according to our discussions. Please let me know how it looks now. I'll get back to writing up the list of changed vocab, and additionally will be making another pass to just check comments/etc in various places.

@colleenXu
Contributor

I checked some queries and it looks like the code still works as-expected.

FYI I haven't fully reviewed the vocab yet....I've only been chiming in when Anders or Marco bring something up. In general, I'm trusting Jackson's process since I think the vocab depends a lot on this internal code (and I talk/work with the "higher level" stuff). I'm fine with changing the vocab I use to reflect changes here...

@ariutta
Collaborator

ariutta commented Mar 9, 2022

Not saying we should change this now, but the QueryResult class would more accurately be named something like QueryResultsAssembler or TrapiResultsAssembler. I think plural TrapiResultsAssembler makes sense instead of singular TrapiResultAssembler because the output is an array. Maybe we can think about it for a future update?

@tokebe
Member Author

tokebe commented Mar 9, 2022

Not saying we should change this now, but the QueryResult class would more accurately be named something like QueryResultsAssembler or TrapiResultsAssembler. I think plural TrapiResultsAssembler makes sense instead of singular TrapiResultAssembler because the output is an array. Maybe we can think about it for a future update?

@ariutta Honestly I think if there's a time to do it, it's now in this PR, and I agree that it would be a good idea, so I'll push that along with changes addressing your other comments.

@tokebe
Member Author

tokebe commented Mar 9, 2022

@ariutta has stated that this PR is ready as far as he is concerned. Next steps are to test more extensively to ensure nothing has been broken and to compile the finalized list of vocab.

@colleenXu
Contributor

@tokebe noting a bunch of failed automated tests, when I run "npm test"....I have the vocab-refactor branch for call-apis active as well.

expand for console output

➜ query_graph_handler git:(vocab-refactor) ✗ npm test

@biothings-explorer/query_graph_handler@1.18.0 test
jest --env=node

PASS __test__/integration/QueryNode.test.js
PASS __test__/integration/QueryEdge.test.js
FAIL __test__/integration/graph/graph.test.js
● Test graph class › A single query result is correctly updated.

expect(received).toEqual(expected) // deep equality

Expected: "outputPrimaryID"
Received: undefined

   99 |         expect(g.nodes).toHaveProperty("outputPrimaryID-qg2");
  100 |         expect(g.nodes).toHaveProperty("inputPrimaryID-qg1");
> 101 |         expect(g.nodes["outputPrimaryID-qg2"]._primaryID).toEqual("outputPrimaryID");
      |                                                           ^
  102 |         expect(g.nodes["outputPrimaryID-qg2"]._qgID).toEqual("qg2");
  103 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceNodes)).toEqual(['inputPrimaryID-qg1']);
  104 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceQGNodes)).toEqual(['qg1']);

  at Object.<anonymous> (__test__/integration/graph/graph.test.js:101:59)

● Test graph class › Multiple query results are correctly updated for two edges having same input, predicate and output

expect(received).toEqual(expected) // deep equality

Expected: "outputPrimaryID"
Received: undefined

  119 |         expect(g.nodes).toHaveProperty("outputPrimaryID-qg2");
  120 |         expect(g.nodes).toHaveProperty("inputPrimaryID-qg1");
> 121 |         expect(g.nodes["outputPrimaryID-qg2"]._primaryID).toEqual("outputPrimaryID");
      |                                                           ^
  122 |         expect(g.nodes["outputPrimaryID-qg2"]._qgID).toEqual("qg2");
  123 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceNodes)).toEqual(['inputPrimaryID-qg1']);
  124 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceQGNodes)).toEqual(['qg1']);

  at Object.<anonymous> (__test__/integration/graph/graph.test.js:121:59)

● Test graph class › Multiple query results for different edges are correctly updated

expect(received).toEqual(expected) // deep equality

Expected: "outputPrimaryID"
Received: undefined

  146 |         expect(g.nodes).toHaveProperty("outputPrimaryID-qg2");
  147 |         expect(g.nodes).toHaveProperty("inputPrimaryID-qg1");
> 148 |         expect(g.nodes["outputPrimaryID-qg2"]._primaryID).toEqual("outputPrimaryID");
      |                                                           ^
  149 |         expect(g.nodes["outputPrimaryID-qg2"]._qgID).toEqual("qg2");
  150 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceNodes)).toEqual(['inputPrimaryID-qg1']);
  151 |         expect(Array.from(g.nodes["outputPrimaryID-qg2"]._sourceQGNodes)).toEqual(['qg1']);

  at Object.<anonymous> (__test__/integration/graph/graph.test.js:148:59)

PASS __test__/integration/biolink.test.js
FAIL __test__/integration/KnowledgeGraph.test.js
● Testing KnowledgeGraph Module › Testing _createNode function › test creating node

TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))
    at Function.from (<anonymous>)

  46 |         {
  47 |           attribute_type_id: 'source_qg_nodes',
> 48 |           value: Array.from(kgNode._sourceQNodeIDs),
     |                        ^
  49 |           //value_type_id: 'bts:source_qg_nodes',
  50 |         },
  51 |         {

  at KnowledgeGraph._createNode (src/graph/knowledge_graph.js:48:24)
  at Object.<anonymous> (__test__/integration/KnowledgeGraph.test.js:131:28)

PASS __test__/unittest/QueryEdge.test.js
FAIL __test__/unittest/helper.test.js
● Test helper moduler › Test _getInputID function › If edge is reversed, should return the primary ID of the output

TypeError: helper._getInputID is not a function

  136 |                 },
  137 |             }
> 138 |             const res = helper._getInputID(record);
      |                                ^
  139 |             expect(res).toEqual('output')
  140 |         })
  141 |

  at Object.<anonymous> (__test__/unittest/helper.test.js:138:32)

● Test helper moduler › Test _getInputID function › If edge is not reversed, should return the node ID of the subject

TypeError: helper._getInputID is not a function

  162 |                 },
  163 |             }
> 164 |             const res = helper._getInputID(record);
      |                                ^
  165 |             expect(res).toEqual('input')
  166 |         })
  167 |     })

  at Object.<anonymous> (__test__/unittest/helper.test.js:164:32)

● Test helper moduler › Test _getOutputID function › If edge is reversed, should return the node ID of the subject

TypeError: helper._getOutputID is not a function

  191 |                 },
  192 |             }
> 193 |             const res = helper._getOutputID(record);
      |                                ^
  194 |             expect(res).toEqual('input')
  195 |         })
  196 |     })

  at Object.<anonymous> (__test__/unittest/helper.test.js:193:32)

● Test helper moduler › If edge is not reversed, should return the node ID of the object

TypeError: helper._getOutputID is not a function

  218 |             },
  219 |         }
> 220 |         const res = helper._getOutputID(record);
      |                            ^
  221 |         expect(res).toEqual('output')
  222 |     })
  223 |

  at Object.<anonymous> (__test__/unittest/helper.test.js:220:28)

● Test helper moduler › Test _getKGEdgeID function › encountered a declaration exception

TypeError: helper._getKGEdgeID is not a function

  312 |             },
  313 |         }
> 314 |         const res = helper._getKGEdgeID(record);
      |                            ^
  315 |         expect(res).toEqual('b052708d75d94d55916ffce9f0ea3458')
  316 |     })
  317 |

  at Suite.<anonymous> (__test__/unittest/helper.test.js:314:28)
  at Suite.<anonymous> (__test__/unittest/helper.test.js:291:5)
  at Object.<anonymous> (__test__/unittest/helper.test.js:3:1)

● Test helper moduler › Test _getInputEquivalentIdentifiers function › If edge is reversed, should return the curies of the output

TypeError: helper._getInputEquivalentIds is not a function

  618 |                 },
  619 |             }
> 620 |             const res = helper._getInputEquivalentIds(record);
      |                                ^
  621 |             expect(res).toEqual(['789'])
  622 |         })
  623 |

  at Object.<anonymous> (__test__/unittest/helper.test.js:620:32)

● Test helper moduler › Test _getInputEquivalentIdentifiers function › If error occurred, return null

TypeError: helper._getInputEquivalentIds is not a function

  650 |                 },
  651 |             }
> 652 |             const res = helper._getInputEquivalentIds(record);
      |                                ^
  653 |             expect(res).toBeNull;
  654 |         })
  655 |

  at Object.<anonymous> (__test__/unittest/helper.test.js:652:32)

● Test helper moduler › Test _getInputEquivalentIdentifiers function › If edge is not reversed, should return the curies of the subject

TypeError: helper._getInputEquivalentIds is not a function

  682 |                 },
  683 |             }
> 684 |             const res = helper._getInputEquivalentIds(record);
      |                                ^
  685 |             expect(res).toEqual(['123', '456'])
  686 |         })
  687 |     })

  at Object.<anonymous> (__test__/unittest/helper.test.js:684:32)

PASS __test__/unittest/utils.test.js
PASS __test__/unittest/LogEntry.test.js
PASS __test__/integration/QueryGraphHandler.test.js
PASS __test__/integration/QueryResult.test.js
PASS __test__/integration/BatchEdgeQueryHandler.test.js
PASS __test__/integration/QEdge2BTEEdgeHandler.test.js
PASS __test__/unittest/redisClient.test.js
PASS __test__/integration/integrity.test.js (52.412 s)
PASS __test__/integration/TRAPIQueryHandler.test.js (52.806 s)

Test Suites: 3 failed, 1 skipped, 13 passed, 16 of 17 total
Tests: 12 failed, 3 skipped, 137 passed, 152 total
Snapshots: 0 total
Time: 54.561 s
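
The `is not a function` failures above are what one would expect when tests still call method names that no longer exist after a rename. One possible way to keep old call sites working during a transition is a deprecated alias. The sketch below uses a stand-in `Helper` class and a hypothetical new name `_getInputCurie`; neither is confirmed by this thread, and only `_getInputID` appears in the output above.

```javascript
// Stand-in class for illustration only; not the real helper module.
class Helper {
  // Hypothetical post-rename method name.
  _getInputCurie(record) {
    return record.inputCurie;
  }
}

// Add a deprecated alias so callers using the old name keep working
// until they are migrated.
function aliasDeprecated(cls, oldName, newName) {
  cls.prototype[oldName] = function (...args) {
    console.warn(`${oldName} is deprecated; use ${newName} instead`);
    return this[newName](...args);
  };
}

aliasDeprecated(Helper, "_getInputID", "_getInputCurie");

const helper = new Helper();
const record = { inputCurie: "NCBIGene:3630" };
console.log(helper._getInputID(record) === helper._getInputCurie(record)); // true
```

Updating the tests to the new names directly is the simpler fix when no external callers depend on the old ones.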


4 participants