Enables support for multi-indexed DataFrames in the Query Language #76
Merged
slabasan merged 3 commits into llnl:develop on Mar 6, 2023
@slabasan rebasing is now complete. This PR is fully ready for review and merge.
slabasan approved these changes on Mar 6, 2023
Summary
Currently, neither the Object-based dialect nor the String-based dialect of the Query Language can handle GraphFrames whose DataFrame has a MultiIndex (e.g., when the data includes rank and thread information).
This PR adds support for that type of data to both dialects. The support comes in the form of a new multi_index_mode argument to the ObjectQuery constructor, the StringQuery constructor, the parse_string_dialect function, and the GraphFrame.filter function. This argument can have one of three values:
- "off" (default): the query is applied under the assumption that the DataFrame does not have a MultiIndex (i.e., the current behavior of the QL)
- "all": when applying a predicate to a particular node's data in the DataFrame, all rows associated with the node must satisfy the predicate
- "any": when applying a predicate to a particular node's data in the DataFrame, at least one row associated with the node must satisfy the predicate
The three modes are implemented within the ObjectQuery and StringQuery classes. In these classes, the translation of predicates from the dialects to the "base" syntax (represented by the Query class) differs depending on the value of multi_index_mode. Because this functionality lives in ObjectQuery and StringQuery, the multi_index_mode arguments to parse_string_dialect and GraphFrame.filter are simply passed through to the correct class.
Finally, note that this functionality is ONLY implemented for new-style queries (as defined in PR #72). Old-style queries (e.g., using the QueryMatcher class) do not support this behavior.
What's Left to Do?
In short, all that's left in this PR is unit testing. I still need to implement tests in test/query.py and confirm that everything is working correctly.
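For reference while writing those tests, the expected "all" and "any" semantics can be pinned down with plain pandas. The sketch below is not the implementation in this PR: node_matches, the sample data, and the "node"/"rank" index level names are all hypothetical and chosen only for illustration. It assumes a two-level MultiIndex and omits the "off" mode, since that mode assumes there is no MultiIndex at all.

```python
import pandas as pd

# Hypothetical data: two nodes, each with one row per rank.
index = pd.MultiIndex.from_tuples(
    [("main", 0), ("main", 1), ("solve", 0), ("solve", 1)],
    names=["node", "rank"],
)
df = pd.DataFrame({"time": [5.0, 7.0, 1.0, 9.0]}, index=index)


def node_matches(df, node, predicate, multi_index_mode):
    """Illustrative helper only (not part of Hatchet's API).

    Applies `predicate` to every row belonging to `node` and combines
    the per-row results according to `multi_index_mode`.
    """
    rows = df.xs(node, level="node")         # all rows for this node
    results = rows.apply(predicate, axis=1)  # one boolean per row
    if multi_index_mode == "all":
        return bool(results.all())
    if multi_index_mode == "any":
        return bool(results.any())
    raise ValueError("this sketch only covers 'all' and 'any'")


def is_slow(row):
    # Example predicate: the row's time exceeds 4.0
    return row["time"] > 4.0


print(node_matches(df, "main", is_slow, "all"))   # every rank of main is slow
print(node_matches(df, "solve", is_slow, "all"))  # rank 0 of solve is not slow
print(node_matches(df, "solve", is_slow, "any"))  # rank 1 of solve is slow
```

A node whose rows disagree on the predicate (here, "solve") is the case worth covering in tests, because it is the only one where "all" and "any" give different answers.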