Skip to content

Conversation

@venkata91
Copy link
Contributor

@venkata91 venkata91 commented Aug 29, 2023

What is the purpose of the change

Support filter push down on nested fields
For example:
SELECT * FROM users WHERE user.age > 18
In the above SQL, user.age > 18 filter can be pushed to table source to avoid scanning records that don't match the predicate. This is especially useful in analytics with columnar formats as well as in JDBC datasources.

Brief change log

  • Introduce a new ResolvedExpression called NestedFieldReferenceExpression to handle filters on nested fields
  • Changes to RexNodeToExpressionConverter and ExpressionConverter to convert RexFieldAccess <=> NestedFieldReferenceExpression and vice versa.

Verifying this change

Added unit tests in PushFilterIntoTableSourceScanRuleTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no)
  • The runtime per-record code paths (performance sensitive): (yes / no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no)
  • The S3 file system connector: (yes / no)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Aug 29, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@venkata91 venkata91 force-pushed the FLINK-20767 branch 2 times, most recently from 2583185 to bc31ebd Compare August 30, 2023 05:17
@venkata91
Copy link
Contributor Author

@flinkbot run azure

@venkata91 venkata91 marked this pull request as ready for review August 30, 2023 23:56
@venkata91 venkata91 force-pushed the FLINK-20767 branch 2 times, most recently from fe34258 to 74ccf82 Compare September 6, 2023 16:54
@venkata91
Copy link
Contributor Author

@wuchong @swuferhong @JingsongLi @becketqin please review

@venkata91
Copy link
Contributor Author

Also tagging @slinkydeveloper

Copy link
Contributor

@swuferhong swuferhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @venkata91 . Thanks for your contribution. The PR looks good in general. I left some comments.

Comment on lines +168 to +169
util.verifyRelPlan(
"SELECT id FROM NestedTable WHERE `deepNestedWith.`.nested.```name` = 'foo'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question, is this the standard way of adding "`"? I couldn't find the relevant documents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the case where the column name has backtick () in it and should be escaped as the whole nested field expression name itself has to be wrapped inside backticks. File formats like ORC etc requires the entire nested field to be with in backticks ()

Copy link
Contributor

@becketqin becketqin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkata91 Thanks for the patch. LGTM overall. I agree with @swuferhong that we need to add some IT cases for E2E test.

@venkata91
Copy link
Contributor Author

@becketqin @swuferhong Addressed review comments mainly adding ITCases.

Copy link
Contributor

@lsyldliu lsyldliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkata91 Thanks for your contribution, please use git rebase instead of git merge. I can't get your all commits correctly when pull your pr to local.

@venkata91
Copy link
Contributor Author

@venkata91 Thanks for your contribution, please use git rebase instead of git merge. I can't get your all commits correctly when pull your pr to local.

@lsyldliu Do you want me to fix it?

@venkata91
Copy link
Contributor Author

@venkata91 Thanks for your contribution, please use git rebase instead of git merge. I can't get your all commits correctly when pull your pr to local.

@lsyldliu Do you want me to fix it?

I used to use rebase before. But with rebase the commit ids changes and therefore it is hard to only see the requested changes which is why I changed to merge instead of rebase. See here, started doing it after this comment.

- Implement nested fields filter push down

1. See if a new ResolvedExpression called NestedFieldReferenceExpression has to be implemented or not. If implemented, convert the NestedFieldReferenceExpression in to RexFieldAccess
2. Revisit ExpressionConverter to see if the FieldReferenceExpression has to be converted to a RexFieldAccess
3. Add more tests
4. Check other rules like PushFilterInCalcIntoTableSourceScanRule, PushPartitionIntoTableSourceScanRule
@venkata91
Copy link
Contributor Author

@venkata91 Thanks for your contribution, please use git rebase instead of git merge. I can't get your all commits correctly when pull your pr to local.

@lsyldliu Do you want me to fix it?

nvm, fixed it as requested.

@venkata91
Copy link
Contributor Author

@swuferhong @becketqin @lsyldliu Gentle ping for reviews.

Copy link
Contributor

@becketqin becketqin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor comment regarding the usage of Optional.

Ping @swuferhong for a review as well.

Copy link
Contributor

@swuferhong swuferhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@becketqin becketqin merged commit 5be4688 into apache:master Sep 26, 2023
@becketqin
Copy link
Contributor

Merged to master: 5be4688

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants