Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Initial support for grouped join pushdown #9032

Merged
merged 26 commits into from
Jan 14, 2025

Conversation

mcheshkov
Copy link
Member

@mcheshkov mcheshkov commented Dec 10, 2024

Check List

  • Tests has been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

With this

  • Load request extended with list of grouped subqueries to join with. Each subquery come with SQL string, alias, condition and join type. SQL is treated as static query, fully porcessed by SQL API. Condition - as member expression, because it can contain references to dimensions
  • BaseQuery will add grouped subqueries to SQL after subquery dimensions and join tree parts
  • Wrapper rewrite rules handle wrapper_pullup_replacer in join position of wrapped_select, but still mostly with empty list inside. Notable exception is aggregation flattening - it is allowed to try to flatten aggregation into WrappedSelect with non-empty joins.
  • Wrapper pull up rules no handle trivial pulling up over single wrapped_select_join and over list WrappedSelectJoins list
  • Wrapper replacers now have grouped_subqueries in a context, it is used to track qualifiers of grouped subqueries, to allow rewriting of column expressions referencing grouped subquery with push-to-cube active
  • Converter from egraph to logical plan reconstructs schema for joins
  • Penalize joins harder than wrappers. For queries ungrouped queries with grouped join, like SELECT * FROM (ungrouped) JOIN (grouped);, this would allow to prefer representation without join, but with wrapper. Without this cost change join representation would have higher ast_size_outside_wrapper component, but it is less important then wrapper_nodes, so wrapper representation would be more expensive.
  • SQL generation for WrappedSelectNode will generate each grouped subquery separately, generate member expressions for join conditions and push all that to Cube
  • Rewrite rules are too relaxed for now: they allow any push_to_cube=false query as grouped, which is incorrect, as we may drop push-to-Cube when wrapping WrappedSelect with another layer, even if inner WrappedSelect is projection. This is covered by check in SQL generation: only CubeScan(grouped=true) are allowed. This require new flag in wrapper replacers to fix.

Supporting changes:

  • Extract CubeScanWrappedSqlNode from CubeScanWrapperNode. Both a logical plan nodes for now, Wrapper is generated during rewrites, and WrappedSql stores SQL, generated generated during evaluate_wrapped_sql
  • Extracted __cubeJoinField join check to function, and make it stricter, only left.__cubeJoinField = right.__cubeJoinField is allowed
  • Avoid clone calls in find_column_by_alias and __cubeJoinField join check
  • Add PushToCubeContext in SQL generation code, used to track everything necessary to generate SQL for push-to-Cube case. For now it contains ungrouped CubeScanNode that whould be pushed into, and qualifiers and generated SQL for all grouped subqueries. Qualifiers are necessary to generate column expressions on top of ungrouped-grouped join.
  • New copy_value! macro, to extend copy_flag!, allows to copy single value between different unary language variants
  • Prettify extracted best plan and cost in logs

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 94.45760% with 117 lines in your changes missing coverage. Please review.

Project coverage is 83.14%. Comparing base (e55beb1) to head (e34feec).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
...t/cubesql/cubesql/src/compile/engine/df/wrapper.rs 84.66% 52 Missing ⚠️
.../src/models/v1_load_request_query_join_subquery.rs 0.00% 14 Missing ⚠️
...c/compile/rewrite/rules/wrapper/scalar_function.rs 78.78% 7 Missing ⚠️
rust/cubesql/cubesql/src/compile/mod.rs 89.47% 6 Missing ⚠️
rust/cubesql/cubesql/src/compile/engine/df/scan.rs 66.66% 4 Missing ⚠️
...ubesql/src/compile/rewrite/rules/wrapper/column.rs 95.12% 4 Missing ⚠️
.../cubesql/src/compile/rewrite/rules/wrapper/join.rs 98.87% 4 Missing ⚠️
...sql/src/compile/rewrite/rules/wrapper/aggregate.rs 97.72% 3 Missing ⚠️
...l/cubesql/src/compile/rewrite/rules/wrapper/mod.rs 95.38% 3 Missing ⚠️
...t/cubesql/cubesql/src/compile/test/test_wrapper.rs 94.23% 3 Missing ⚠️
... and 13 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9032      +/-   ##
==========================================
+ Coverage   82.85%   83.14%   +0.29%     
==========================================
  Files         222      225       +3     
  Lines       78828    80300    +1472     
==========================================
+ Hits        65309    66767    +1458     
- Misses      13519    13533      +14     
Flag Coverage Δ
cubesql 83.14% <94.45%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mcheshkov mcheshkov marked this pull request as ready for review December 10, 2024 20:39
@mcheshkov mcheshkov requested review from a team as code owners December 10, 2024 20:39
@mcheshkov mcheshkov changed the title feast: Initial support for grouped join pushdown feat: Initial support for grouped join pushdown Dec 11, 2024
@mcheshkov mcheshkov force-pushed the initial-grouped-join-pushdown branch 2 times, most recently from 18417aa to 739fdc6 Compare December 18, 2024 22:18
@mcheshkov mcheshkov force-pushed the initial-grouped-join-pushdown branch 2 times, most recently from fe62c0f to 9440876 Compare January 13, 2025 18:57
They would be used later as a way to allow replacers to handle column from grouped subquery in join
Non-flatten rules will gain support for joins with grouped-grouped joins SQL generation support in WrappedSelect
@mcheshkov mcheshkov force-pushed the initial-grouped-join-pushdown branch from 9440876 to e34feec Compare January 14, 2025 18:12
@mcheshkov mcheshkov merged commit 2f11d20 into master Jan 14, 2025
72 checks passed
@mcheshkov mcheshkov deleted the initial-grouped-join-pushdown branch January 14, 2025 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants