Enables support for multi-indexed DataFrames in the Query Language #76

Merged
slabasan merged 3 commits into llnl:develop from TauferLab:multi_index_ql
Mar 6, 2023

@ilumsden
Collaborator

Summary

Currently, the Object-based dialect and String-based dialect of the Query Language cannot handle GraphFrames containing a DataFrame with a multi-index (e.g., when you have rank and thread info).

This PR adds support for that type of data to both the Object-based dialect and the String-based dialect. The support takes the form of a new multi_index_mode argument to the ObjectQuery constructor, the StringQuery constructor, the parse_string_dialect function, and the GraphFrame.filter function. This argument can have one of three values:

  • "off" (default): the query is applied under the assumption that the DataFrame does not have a MultiIndex (i.e., the current behavior of the QL)
  • "all": when applying a predicate to a particular node's data in the DataFrame, all rows associated with the node must satisfy the predicate
  • "any": when applying a predicate to a particular node's data in the DataFrame, at least one row associated with the node must satisfy the predicate
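
The three modes can be illustrated without Hatchet itself. The following is a minimal sketch, not the code from this PR: it models the rows associated with one node (for example, one row per rank) as a list of plain dicts instead of a pandas MultiIndex, and `node_matches`, `rows_for_node`, and `slow` are hypothetical names introduced only for this example. Raising an error in "off" mode when a node has several rows is a choice made for the sketch; the PR does not state what Hatchet does in that case.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def node_matches(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    multi_index_mode: str = "off",
) -> bool:
    """Decide whether one node satisfies a predicate under the given mode."""
    if multi_index_mode == "off":
        # Assumption of this sketch: "off" means exactly one row per node.
        if len(rows) != 1:
            raise ValueError("'off' mode expects exactly one row per node")
        return predicate(rows[0])
    if multi_index_mode == "all":
        # Every row associated with the node must satisfy the predicate.
        return all(predicate(row) for row in rows)
    if multi_index_mode == "any":
        # At least one row associated with the node must satisfy it.
        return any(predicate(row) for row in rows)
    raise ValueError(f"unknown multi_index_mode: {multi_index_mode!r}")


# Hypothetical per-rank rows for a single node.
rows_for_node = [
    {"rank": 0, "time": 4.0},
    {"rank": 1, "time": 9.0},
]


def slow(row: Row) -> bool:
    return row["time"] > 5.0


all_result = node_matches(rows_for_node, slow, "all")  # rank 0 fails the predicate
any_result = node_matches(rows_for_node, slow, "any")  # rank 1 passes the predicate
```

With this data the node is rejected under "all" and accepted under "any", which is the practical difference between the two modes.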

These three modes are implemented within the ObjectQuery and StringQuery classes. In these classes, the translation of predicates from the dialects to the "base" syntax (represented by the Query class) differs depending on the value of multi_index_mode. Because the functionality lives in ObjectQuery and StringQuery, the multi_index_mode arguments to parse_string_dialect and GraphFrame.filter are simply passed through to the correct class.
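
The design described above (the mode is consumed once when a predicate is translated, and the outer entry points only forward it) can be sketched as follows. This is an illustrative toy under stated assumptions, not Hatchet's code: `ToyObjectQuery` and `toy_filter` are hypothetical stand-ins for ObjectQuery and GraphFrame.filter, and the data is a plain dict mapping node names to lists of rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
RowPredicate = Callable[[Row], bool]
VALID_MODES = ("off", "all", "any")


class ToyObjectQuery:
    """Hypothetical stand-in for a dialect class; not Hatchet's ObjectQuery."""

    def __init__(self, predicate: RowPredicate, multi_index_mode: str = "off"):
        if multi_index_mode not in VALID_MODES:
            raise ValueError(f"invalid multi_index_mode: {multi_index_mode!r}")
        # The mode is used once, when the row-level predicate is translated
        # into a node-level predicate.
        self.node_predicate = self._translate(predicate, multi_index_mode)

    @staticmethod
    def _translate(
        predicate: RowPredicate, mode: str
    ) -> Callable[[List[Row]], bool]:
        if mode == "all":
            return lambda rows: all(predicate(r) for r in rows)
        if mode == "any":
            return lambda rows: any(predicate(r) for r in rows)
        # "off": this sketch assumes a single row per node.
        return lambda rows: predicate(rows[0])


def toy_filter(
    data: Dict[str, List[Row]],
    predicate: RowPredicate,
    multi_index_mode: str = "off",
) -> List[str]:
    """Hypothetical filter entry point: it only forwards multi_index_mode."""
    query = ToyObjectQuery(predicate, multi_index_mode=multi_index_mode)
    return [node for node, rows in data.items() if query.node_predicate(rows)]


# Hypothetical data: two nodes, two rows (e.g., ranks) each.
data = {
    "main": [{"time": 10.0}, {"time": 12.0}],
    "solve": [{"time": 2.0}, {"time": 8.0}],
}


def is_slow(row: Row) -> bool:
    return row["time"] > 5.0


kept_all = toy_filter(data, is_slow, multi_index_mode="all")
kept_any = toy_filter(data, is_slow, multi_index_mode="any")
```

Keeping the mode handling inside the query class means the forwarding function needs no mode-specific logic of its own, which mirrors the pass-through described in the PR summary.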

Finally, one important thing to note is that this functionality is ONLY implemented for new-style queries (as defined in PR #72). Old-style queries (e.g., using the QueryMatcher class) do not support this behavior.

What's Left to Do?

In short, all that's left in this PR is unit testing. I still need to implement tests in test/query.py and confirm that everything is working correctly.

@ilumsden added the area-query-lang, priority-normal, status-work-in-progress, and type-feature labels on Dec 21, 2022
@ilumsden requested a review from slabasan on December 21, 2022 15:18
@ilumsden self-assigned this on Dec 21, 2022
@ilumsden marked this pull request as ready for review on February 23, 2023 16:52
@ilumsden
Collaborator Author

@slabasan this PR is now ready for review. I'll rebase and fix formatting once #72 is merged.

@ilumsden added the status-ready-for-review label and removed the status-work-in-progress label on Feb 23, 2023
@ilumsden
Collaborator Author

@slabasan rebasing is now complete. This PR is fully ready for review and merge.

@ilumsden force-pushed the multi_index_ql branch 2 times, most recently from 205d9c5 to aa5d574, on February 27, 2023 22:40