[ARCTIC-1093] Self-Optimizing scan files from metadata instead of from file info cache #1100

Merged
merged 11 commits into from
Feb 14, 2023

Conversation

wangtaohz
Contributor

Why are the changes needed?

fix #1093

Brief change log

  • UnkeyedTable scans files using the TableScan API
  • KeyedTable's Major/Full Optimizing scans files using the TableScan API of BaseStore
  • KeyedTable's Minor Optimizing scans files using the ChangeTableIncrementalScan API of ChangeStore
  • Put all insert/delete/pos-delete/base files into partitionFileTree, replacing partitionPosDeleteFiles, partitionNeedMajorOptimizeFiles, and partitionDeleteFiles
  • Extract addBaseFilesIntoFileTree into AbstractArcticOptimizePlan
  • Refactor FileTree and remove unused methods
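The change log replaces several per-partition maps (partitionPosDeleteFiles, partitionNeedMajorOptimizeFiles, partitionDeleteFiles) with a single partitionFileTree. A minimal plain-Java sketch of that idea follows, assuming a much-simplified file model. `PartitionFileTreeSketch`, `ScannedFile`, `ContentType`, and `hasDeletes` are hypothetical names for illustration only; they are not Arctic's actual `FileTree` class or Iceberg's scan API. Files returned by a metadata scan are grouped first by partition and then by content type, so one lookup answers questions that previously needed several maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT Arctic's real FileTree API.
public class PartitionFileTreeSketch {

  // Hypothetical content types mirroring the kinds of files named in the change log.
  enum ContentType { INSERT, EQ_DELETE, POS_DELETE, BASE }

  // Hypothetical minimal description of one file returned by a metadata scan.
  static final class ScannedFile {
    final String partition;
    final ContentType type;
    final String path;

    ScannedFile(String partition, ContentType type, String path) {
      this.partition = partition;
      this.type = type;
      this.path = path;
    }
  }

  // One structure keyed by partition, then by content type,
  // instead of a separate per-partition map for each kind of file.
  static final class PartitionFileTree {
    private final Map<String, Map<ContentType, List<ScannedFile>>> byPartition =
        new LinkedHashMap<>();

    void add(ScannedFile file) {
      byPartition
          .computeIfAbsent(file.partition, p -> new EnumMap<>(ContentType.class))
          .computeIfAbsent(file.type, t -> new ArrayList<>())
          .add(file);
    }

    // Returns the files of one type in one partition, or an empty list.
    List<ScannedFile> files(String partition, ContentType type) {
      Map<ContentType, List<ScannedFile>> byType = byPartition.get(partition);
      if (byType == null) {
        return Collections.emptyList();
      }
      return byType.getOrDefault(type, Collections.emptyList());
    }

    // True if the partition holds any equality or positional delete files.
    boolean hasDeletes(String partition) {
      return !files(partition, ContentType.EQ_DELETE).isEmpty()
          || !files(partition, ContentType.POS_DELETE).isEmpty();
    }
  }

  public static void main(String[] args) {
    PartitionFileTree tree = new PartitionFileTree();
    tree.add(new ScannedFile("dt=2023-02-14", ContentType.BASE, "base-0.parquet"));
    tree.add(new ScannedFile("dt=2023-02-14", ContentType.POS_DELETE, "pos-del-0.parquet"));
    tree.add(new ScannedFile("dt=2023-02-15", ContentType.INSERT, "insert-0.parquet"));

    System.out.println(tree.files("dt=2023-02-14", ContentType.BASE).size()); // prints 1
    System.out.println(tree.hasDeletes("dt=2023-02-14")); // prints true
    System.out.println(tree.hasDeletes("dt=2023-02-15")); // prints false
  }
}
```

Keeping every file kind in one structure lets a planner ask, for example, whether a partition has pending delete files without cross-checking several maps; the real implementation in this PR may organize its tree differently.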

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added module:ams-server Ams server module module:ams-dashboard Ams dashboard module labels Feb 13, 2023
Contributor

@zhoujinsong zhoujinsong left a comment

@wangtaohz I left some comments, please take another look.

@zhoujinsong
Contributor

@wangtaohz Do not forget to give the PR an appropriate title.

@wangtaohz wangtaohz changed the title from "[ARCTIC-1093]" to "[ARCTIC-1093] Self-Optimizing scan files from metadata instead of from file info cache" Feb 14, 2023
@wangtaohz
Contributor Author

@wangtaohz Do not forget to give the PR an appropriate title.

My oversight; I've added it.

1. Remove checking whether any tasks are running during the plan
2. Remove checking whether the table changed during the plan; check it before scanning files instead
3. AbstractIcebergOptimizePlan uses the correct currentSnapshot
4. Introduce OptimizePlanResult to encapsulate the plan result
@zhoujinsong zhoujinsong merged commit 1b33b0e into apache:master Feb 14, 2023
wangtaohz added a commit that referenced this pull request Feb 15, 2023
* [ARCTIC-1062][AMS]Terminal support config spark properties in the local model (#1094)

* terminal support config spark properties in the local model

---------

Co-authored-by: jinsilei <jinsilei@corp.netease.com>

* [AMS][Improvement]: Support set login user and login password in config yaml file (#1086)

* Support set login user and login password in config yaml file

* [ARCTIC-1091] Browser tab does not display Arctic's icon (#1092)

fix-1091

Co-authored-by: shendanfeng01 <shendanfeng01@corp.netease.com>

* [ARCTIC-1090][AMS]:Terminal support add hadoop conf when use native iceberg (#1099)

* terminal support hadoop conf

---------

Co-authored-by: jinsilei <jinsilei@corp.netease.com>

* [ARCTIC-1093] Self-Optimizing scan files from metadata instead of from file info cache (#1100)

* fix-1093 optimize use TableScan

* modify OptimizeIntegrationTest for TestHiveSupport Table

* 1. Remove checking whether any tasks are running during the plan
2. Remove checking whether the table changed during the plan; check it before scanning files instead
3. AbstractIcebergOptimizePlan uses the correct currentSnapshot
4. Introduce OptimizePlanResult to encapsulate the plan result

* [hotfix] Lower the log level in ShuffleSplitAssigner (#1106)

* [ARCTIC-1095][AMS] Add the sequence number for the native iceberg table when the major optimizing commit (#1101)

* fix #1095
Add the sequence number to the plan for major-optimizing commits on native Iceberg tables


---------

Co-authored-by: luting <dylzlt93299@gmail.com>

* [ARCTIC-924][Hive] When AMS runs for a period of time and then cannot connect to HMS (#1054)

---------

Co-authored-by: shendanfeng01 <shendanfeng01@corp.netease.com>

---------

Co-authored-by: PlanetWalker <52364847+hellojinsilei@users.noreply.github.com>
Co-authored-by: jinsilei <jinsilei@corp.netease.com>
Co-authored-by: wangzeyu <hameizi369@gmail.com>
Co-authored-by: shendanfengg <109209550+shendanfengg@users.noreply.github.com>
Co-authored-by: shendanfeng01 <shendanfeng01@corp.netease.com>
Co-authored-by: Xianxun Ye <yxx_cmhd@163.com>
Co-authored-by: luting <1004611953@qq.com>
Co-authored-by: luting <dylzlt93299@gmail.com>
@wangtaohz wangtaohz deleted the fix-1093-1 branch February 15, 2023 02:55
zhoujinsong pushed a commit that referenced this pull request May 31, 2023
…m file info cache (#1100)

* support modifying log4j2.xml dynamically

* fix-1093 optimize use TableScan

* remove partitionPosDeleteFiles from AbstractArcticOptimizePlan

* refactor MinorOptimizePlan

* modify OptimizeIntegrationTest for TestHiveSupport Table

* refactor collectSubTree to splitFileTree

* 1. Remove checking whether any tasks are running during the plan
2. Remove checking whether the table changed during the plan; check it before scanning files instead
3. AbstractIcebergOptimizePlan uses the correct currentSnapshot
4. Introduce OptimizePlanResult to encapsulate the plan result

* refactor to SplitIfNoFileExists

* fix checkstyle

* remove useless comment
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
…m file info cache (apache#1100)

* support modifying log4j2.xml dynamically

* fix-1093 optimize use TableScan

* remove partitionPosDeleteFiles from AbstractArcticOptimizePlan

* refactor MinorOptimizePlan

* modify OptimizeIntegrationTest for TestHiveSupport Table

* refactor collectSubTree to splitFileTree

* 1.remove checking any tasks running during the plan
2.remove checking table changed during the plan and check it before scan files
3.AbstractIcebergOptimizePlan use the correct currentSnapshot
4.import OptimizePlanResult to to encapsulate the plan result

* refactor to SplitIfNoFileExists

* fix checkstyle

* remove useless comment
Labels
module:ams-dashboard Ams dashboard module module:ams-server Ams server module
Development

Successfully merging this pull request may close these issues.

[Feature]: Self-Optimizing scan files from metadata instead of from file info cache
2 participants