[ARCTIC-1213] Optimizing of Mixed Format Table supports optimizing a part of partitions at a time #1220

Merged: 2 commits, Mar 14, 2023

wangtaohz
Contributor

Why are the changes needed?

fix #1213

Brief change log

  • Add PartitionWeight to sort partitions when planning.
  • For Minor-Optimizing of Mixed Iceberg Format tables, PartitionWeight is based on the interval and the number of change files.
  • For Major-Optimizing of Mixed Iceberg Format tables, PartitionWeight is based on the interval and the number of small base files.
  • For Full-Optimizing of Mixed Iceberg Format tables, PartitionWeight is based on the interval and the size of delete files.
  • For Major-Optimizing of Mixed Hive Format tables, PartitionWeight is based on the interval and the number of base files not in the Hive location.
  • For Full-Optimizing of Mixed Hive Format tables, PartitionWeight is based on the interval and the size of delete files.
  • Minor-Optimizing is now triggered by the number of change files instead of the number of delete files.
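
To illustrate the idea behind the change log, here is a minimal sketch of weight-ordered partial planning. It is not the code from this patch: the class and method names (`PartitionWeightSketch`, `Weight`, `Partition`, `plan`) and the `maxFileCount` cap are hypothetical, and the choice to compare the interval first and use the file count only as a tie-breaker is an assumption, since the description does not say how the two are combined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only; names and ordering rules are assumptions,
// not the PartitionWeight implementation merged in this pull request.
class PartitionWeightSketch {

  /** Assumed weight: interval first, file count as a tie-breaker. */
  static final class Weight implements Comparable<Weight> {
    final long intervalMillis;
    final int fileCount;

    Weight(long intervalMillis, int fileCount) {
      this.intervalMillis = intervalMillis;
      this.fileCount = fileCount;
    }

    @Override
    public int compareTo(Weight other) {
      int byInterval = Long.compare(intervalMillis, other.intervalMillis);
      return byInterval != 0 ? byInterval : Integer.compare(fileCount, other.fileCount);
    }
  }

  /** A partition together with its planning weight. */
  static final class Partition {
    final String name;
    final Weight weight;

    Partition(String name, long intervalMillis, int fileCount) {
      this.name = name;
      this.weight = new Weight(intervalMillis, fileCount);
    }
  }

  /**
   * Picks partitions for one optimizing round, heaviest first, and stops
   * before the hypothetical file cap would be exceeded. At least one
   * partition is always selected so that planning can make progress.
   */
  static List<String> plan(List<Partition> partitions, int maxFileCount) {
    List<Partition> sorted = new ArrayList<>(partitions);
    sorted.sort(Comparator.comparing((Partition p) -> p.weight).reversed());

    List<String> selected = new ArrayList<>();
    int usedFiles = 0;
    for (Partition p : sorted) {
      if (!selected.isEmpty() && usedFiles + p.weight.fileCount > maxFileCount) {
        break; // remaining partitions wait for a later round
      }
      selected.add(p.name);
      usedFiles += p.weight.fileCount;
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Partition> partitions = new ArrayList<>();
    partitions.add(new Partition("p1", 1000L, 5));
    partitions.add(new Partition("p2", 3000L, 2));
    partitions.add(new Partition("p3", 3000L, 8));
    System.out.println(plan(partitions, 10));
  }
}
```

Under these assumptions, partitions that do not fit in the current round stay pending and are picked up by a later plan, which is what makes optimizing "a part of partitions at a time" possible.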

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

codecov bot commented Mar 13, 2023

Codecov Report

Patch coverage: 88.39% and project coverage change: +1.42 🎉

Comparison is base (34b9894) 27.26% compared to head (2c1a002) 28.69%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1220      +/-   ##
============================================
+ Coverage     27.26%   28.69%   +1.42%     
- Complexity     4552     5088     +536     
============================================
  Files           615      658      +43     
  Lines         65636    69392    +3756     
  Branches       7646     7999     +353     
============================================
+ Hits          17896    19911    +2015     
- Misses        45937    47555    +1618     
- Partials       1803     1926     +123     
| Flag | Coverage Δ |
|---|---|
| core | 27.33% <88.39%> (+0.06%) ⬆️ |
| trino | 52.78% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...etease/arctic/ams/server/model/TableQuotaInfo.java | 0.00% <ø> (ø) |
| ...ms/server/optimize/AbstractArcticOptimizePlan.java | 90.65% <ø> (+2.13%) ⬆️ |
| ...s/server/optimize/AbstractIcebergOptimizePlan.java | 99.01% <ø> (-0.01%) ⬇️ |
| ...s/server/optimize/SupportHiveFullOptimizePlan.java | 52.63% <0.00%> (-1.22%) ⬇️ |
| .../server/optimize/SupportHiveMajorOptimizePlan.java | 73.17% <50.00%> (+0.25%) ⬆️ |
| ...c/ams/server/optimize/IcebergFullOptimizePlan.java | 82.85% <71.42%> (-1.52%) ⬇️ |
| .../ams/server/optimize/IcebergMinorOptimizePlan.java | 91.81% <71.42%> (-1.46%) ⬇️ |
| .../arctic/ams/server/optimize/MinorOptimizePlan.java | 85.00% <91.30%> (+10.00%) ⬆️ |
| ...ctic/ams/server/optimize/AbstractOptimizePlan.java | 87.50% <95.23%> (+0.15%) ⬆️ |
| ...e/arctic/ams/server/optimize/FullOptimizePlan.java | 84.31% <95.65%> (+5.66%) ⬆️ |

... and 1 more

... and 46 files with indirect coverage changes


zhoujinsong (Contributor) left a comment

LGTM.

@zhoujinsong zhoujinsong merged commit 8162c6d into apache:master Mar 14, 2023
wangtaohz added a commit to wangtaohz/amoro that referenced this pull request Mar 14, 2023
…part of partitions at a time (apache#1220)

* support partition ordered by PartitionWeight for OptimizePlan

* if not all partitions are optimized, current change snapshot id should set to -1
@wangtaohz wangtaohz deleted the fix-1213-1 branch March 14, 2023 13:36
zhoujinsong pushed a commit that referenced this pull request Mar 15, 2023
* fix ArcticHadoopFileIO cast error

* overwrite file in trash when move

* [ARCTIC-1213] Optimizing of Mixed Format Table supports optimizing a part of partitions at a time (#1220)

* support partition ordered by PartitionWeight for OptimizePlan

* if not all partitions are optimized, current change snapshot id should set to -1

* fix checkstyle

* TableTrashManager should extends Serializable
baiyangtx pushed a commit that referenced this pull request Mar 22, 2023
…part of partitions at a time (#1220)

zhoujinsong pushed a commit that referenced this pull request Mar 29, 2023
…part of partitions at a time (#1220)

zhoujinsong pushed a commit that referenced this pull request May 31, 2023
…part of partitions at a time (#1220)

zhoujinsong pushed a commit that referenced this pull request May 31, 2023
* fix ArcticHadoopFileIO cast error

* overwrite file in trash when move

* [ARCTIC-1213] Optimizing of Mixed Format Table supports optimizing a part of partitions at a time (#1220)

* support partition ordered by PartitionWeight for OptimizePlan

* if not all partitions are optimized, current change snapshot id should set to -1

* fix checkstyle

* TableTrashManager should extends Serializable
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
…part of partitions at a time (apache#1220)

ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
…he#1223)

Development

Successfully merging this pull request may close these issues.

[Improvement]: Self-Optimizing for Mixed Format tables should limit the file cnt for each Optimizing