Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMORO-2417] Support partition filter on optimizing plan #2436

Merged
merged 12 commits into from
Dec 27, 2023

Conversation

HuangFru
Copy link
Contributor

Why are the changes needed?

Close #2417.

Brief change log

  • Use Pair<Integer, StructLike> to represent the partition instead of a String.
  • Add an ExpressionUtil used to convert partition data to a data filter and apply it as a partition filter.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not applicable)

@github-actions github-actions bot added module:core Core module module:ams-dashboard Ams dashboard module labels Dec 15, 2023
Copy link

codecov bot commented Dec 15, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (c3d1969) 52.93% compared to head (5278d44) 52.91%.
Report is 1 commits behind head on master.

Files Patch % Lines
...java/com/netease/arctic/utils/ArcticTableUtil.java 57.14% 2 Missing and 1 partial ⚠️
.../java/com/netease/arctic/utils/ExpressionUtil.java 94.44% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2436      +/-   ##
============================================
- Coverage     52.93%   52.91%   -0.02%     
- Complexity     4420     4423       +3     
============================================
  Files           534      535       +1     
  Lines         30578    30618      +40     
  Branches       2994     2997       +3     
============================================
+ Hits          16185    16202      +17     
- Misses        13097    13132      +35     
+ Partials       1296     1284      -12     
Flag Coverage Δ
core 53.36% <94.44%> (+0.06%) ⬆️
trino 50.44% <ø> (-0.43%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot removed the module:ams-dashboard Ams dashboard module label Dec 25, 2023
Copy link
Contributor

@wangtaohz wangtaohz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me, just some minor details need to be modified.

public void testUnkeyedConvertPartitionStructLikeToDataFilter() throws IOException {
Assume.assumeTrue(getArcticTable().isUnkeyedTable());
UnkeyedTable table = getArcticTable().asUnkeyedTable();
logger.info("Partition Spec:" + getArcticTable().spec());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not very clear about the necessity of printing logs here.

Copy link
Contributor Author

@HuangFru HuangFru Dec 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Without this log, I can't tell which test used which partition spec. For example, if an error occurs in a case, we will not be able to tell which transform case has the error.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing the test parameters rather than printing the log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated:
image

Copy link
Contributor

@wangtaohz wangtaohz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@zhoujinsong zhoujinsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HuangFru I left some comments, please take a look.

public void testUnkeyedConvertPartitionStructLikeToDataFilter() throws IOException {
Assume.assumeTrue(getArcticTable().isUnkeyedTable());
UnkeyedTable table = getArcticTable().asUnkeyedTable();
logger.info("Partition Spec:" + getArcticTable().spec());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not very clear about the necessity of printing logs here.

@HuangFru
Copy link
Contributor Author

@HuangFru I left some comments, please take a look.

Fixed in the new commit.

Copy link
Contributor

@zhoujinsong zhoujinsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zhoujinsong zhoujinsong merged commit 4ccfb4e into apache:master Dec 27, 2023
6 of 7 checks passed
* @param partitions the collection of partition data
* @return data filter converted from partition data
*/
public static Expression convertPartitionDataToDataFilter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both methods are named convertPartitionDataToDataFilter. I think they can be distinguished.

* @param partition the partition data
* @return data filter converted from partition data
*/
public static Expression convertPartitionDataToDataFilter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add @VisibleForTesting? This method is not referenced externally

@HuangFru
Copy link
Contributor Author

HuangFru commented Dec 27, 2023

Sorry, the PR has been merged into the master branch. Thanks for your review. Looks like some minor imperfections that maybe could be ignored?
@huyuanfeng2018

@huyuanfeng2018
Copy link
Contributor

Sorry, the PR has been merged into the master branch. Thanks for your review. Looks like some minor imperfections that maybe could be ignored? @huyuanfeng2018

ok!

ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
* support partition filter on scan helper

* apply spotless

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* change test param

---------

Co-authored-by: wangtaohz <103108928+wangtaohz@users.noreply.github.com>
@zhoujinsong zhoujinsong mentioned this pull request Jun 25, 2024
66 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:core Core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement]: Use partition filter to speed up optimizing plan
4 participants