[AMORO-1723] Support auto create tag daily for Iceberg Format #2263

Merged
160 commits merged into apache:master on Nov 27, 2023

Conversation

Contributor

@wangtaohz wangtaohz commented Nov 8, 2023

Why are the changes needed?

Brief change log

  • add a new TagsCheckingExecutor to AMS
  • add a new interface method autoCreateTags to TableMaintainer
  • add AutoCreateIcebergTagAction to create daily tags for the Iceberg table
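The daily tagging idea in the change log can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class `DailyTagSketch`, the method `autoCreateTag`, the in-memory maps standing in for table snapshots and tags, and the rule "tag the latest snapshot committed at or before the start of the day, named after that day" are all assumptions made for the example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Amoro code base.
class DailyTagSketch {

    /**
     * Creates a tag for the given day unless one already exists.
     *
     * @param snapshotsByCommitTime snapshot ids keyed by commit time (stand-in for table snapshots)
     * @param tags                  existing tags, tag name to snapshot id (stand-in for table refs)
     * @return the created tag name, or empty if nothing was created
     */
    static Optional<String> autoCreateTag(
            NavigableMap<Instant, Long> snapshotsByCommitTime,
            Map<String, Long> tags,
            LocalDate day,
            ZoneId zone,
            String tagFormat) {
        String tagName = DateTimeFormatter.ofPattern(tagFormat).format(day);
        if (tags.containsKey(tagName)) {
            return Optional.empty(); // the tag for this day already exists
        }
        // Assumed trigger time: the start of the day in the given zone.
        Instant trigger = day.atStartOfDay(zone).toInstant();
        // Latest snapshot committed at or before the trigger time.
        Map.Entry<Instant, Long> latest = snapshotsByCommitTime.floorEntry(trigger);
        if (latest == null) {
            return Optional.empty(); // no snapshot to tag yet
        }
        tags.put(tagName, latest.getValue());
        return Optional.of(tagName);
    }
}
```

Calling the method again for the same day is a no-op, which is what lets a periodic checker (such as the TagsCheckingExecutor mentioned above) run it repeatedly without creating duplicate tags.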

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

@github-actions github-actions bot added the type:docs Improvements or additions to documentation label Nov 13, 2023
Contributor

@huyuanfeng2018 huyuanfeng2018 left a comment

LGTM.

Comment on lines 206 to 208
public static final String AUTO_CREATE_TAG_FORMAT = "tag.auto-create.daily.tag-format";
public static final String AUTO_CREATE_TAG_FORMAT_DEFAULT = "'tag-day-'yyyyMMdd";

Contributor

tag-20220901 is fine. The date 20220901 already marks the tag as a daily tag, so tag-day is not necessary.
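
To make the two formats under discussion concrete, here is a small sketch of how such a pattern expands into a tag name. It assumes the pattern is interpreted with `java.time.format.DateTimeFormatter`; the PR excerpt does not show which formatter the implementation uses, and the class `TagFormatDemo` and method `tagName` are hypothetical. The single quotes mark literal text, so only `yyyyMMdd` is replaced by the date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only.
class TagFormatDemo {

    // Expands a date pattern with a quoted literal prefix into a tag name.
    static String tagName(String pattern, LocalDate day) {
        return DateTimeFormatter.ofPattern(pattern).format(day);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2022, 9, 1);
        System.out.println(tagName("'tag-day-'yyyyMMdd", day)); // prints tag-day-20220901
        System.out.println(tagName("'tag-'yyyyMMdd", day));     // prints tag-20220901
    }
}
```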

docs/user-guides/configurations.md (outdated, resolved)
Contributor

@zhoujinsong zhoujinsong left a comment

@wangtaohz I left some comments, please take another look.

Besides, I am confused about the usage of property tag.auto-create.trigger.max-delay.minutes. I have reread the design document, but I still couldn't find the use case for this property. Would you mind making some additions in the document to clarify it?

@CLAassistant

CLAassistant commented Nov 22, 2023

CLA assistant check
All committers have signed the CLA.

@wangtaohz
Contributor Author

> Besides, I am confused about the usage of property tag.auto-create.trigger.max-delay.minutes. I have reread the design document, but I still couldn't find the use case for this property. Would you mind making some additions in the document to clarify it?

I added some explanations in the document.
https://docs.google.com/document/d/1_56YHZO7XSkkZV7bTFQ7yWNVfiV0TknwOIMt0j8Me6Y/edit#heading=h.m88xjoz356hr

Contributor

@zhoujinsong zhoujinsong left a comment

Hi, I left some small suggestions.

Contributor

@zhoujinsong zhoujinsong left a comment

LGTM.

@zhoujinsong zhoujinsong merged commit a0b98b7 into apache:master Nov 27, 2023
7 checks passed
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
…#2263)

* upgrade iceberg to 1.3.0

* fix flink

* fix flink

* remove useless

* change version from 0.5.0-SNAPSHOT to 0.5.1-SNAPSHOT

* update iceberg version 1.3.x in Flink 1.12 module

* update iceberg version 1.3.x in Flink 1.12 module

* ArcticUpdate support toBranch

* fix ci error

* update iceberg version 1.3.x in Flink 1.14 module

* update iceberg version 1.3.x in Flink 1.15 module

* add PuffinUtil and unit test

* remove legacyPartitionMaxTransactionId before 0.4.1

* store optimized sequence to puffin

* support overwrite puffin when retry

* add table property table.version

* calculate available core

* change max input size per thread from 5GB to 500MB

* add auto create tag properties

* add PuffinUtil and unit test

* remove legacyPartitionMaxTransactionId before 0.4.1

* store optimized sequence to puffin

* support overwrite puffin when retry

* add table property table.version

* implement TagsCheckingExecutor and add unit test

* fix unit test

* fix unit test for hive

* change version from 0.5.1-SNAPSHOT to 0.5.0-SNAPSHOT

* change version from 0.5.0-SNAPSHOT to 0.5.1-SNAPSHOT

* support keyed table scan use ref(tag/branch)

* no need to get optimized sequence from KeyedTableSnapshot

* change version back to 0.5.0-SNAPSHOT

* fix get null sequence

* fix sequence number = -1

* fix optimizing integration test for hive

* create puffin for each snapshot

* for compatibility, if puffin not exist, using table properties

* remove useless table version

* add readWithCompatibility

* add some comments for PuffinUtil

* expire statistics files

* fix check style

* fix compile error

* remove useless deprecate

* fix unit test

* fix compile error and unit test

* refactor PuffinUtil

* fix compile error in ams server

* spotless:apply core

* fix unit test

* support generic type for PartitionDataSerializer

* rename PuffinUtil to StatisticsFileUtil

* rename method to readFromStatisticsFile

* spotless: apply

* spotless: apply

* spotless: apply

* 1.fix comment
2.add writerBuilder for StatisticsFileUtil.Writer
3.store puffin files in the location of data/puffin/
4.search snapshot based on a mark in snapshot summary

* fix snapshot expiring unit test error

* fix compile error

* fix TableConfiguration equals hashcode

* refactor code and only support create tag now

* add unit test for TestAutoCreateIcebergTagAction

* spotless:apply

* remove the support for mixed format

* Update ams/server/src/main/java/com/netease/arctic/server/optimizing/maintainer/AutoCreateIcebergTagAction.java

Co-authored-by: baiyangtx <xiangnebula@163.com>

* Update ams/server/src/main/java/com/netease/arctic/server/optimizing/maintainer/AutoCreateIcebergTagAction.java

Co-authored-by: baiyangtx <xiangnebula@163.com>

* revert useRef when scan

* fix compile error

* Update ams/server/src/main/java/com/netease/arctic/server/table/executor/TagsCheckingExecutor.java

Co-authored-by: baiyangtx <xiangnebula@163.com>

* fix comment

* support more auto create configs

* Update core/src/main/java/com/netease/arctic/table/TagTriggerPeriod.java

Co-authored-by: big face cat <731030576@qq.com>

* fix compile error

* fix unit test

* add auto create tag max delay

* implement auto creating tag max delay

* add docs

* change daily tag format to 'tag-'yyyyMMdd

* change docs

* add docs for TagTriggerPeriod

* revert test assert

* refactor TagConfiguration

* add docs

* spotless:apply

* add docs

* improve logs and remove autoCreateTagEnabled from TableConfiguration

* improve logs

---------

Co-authored-by: lklhdu <lekeleihz@163.com>
Co-authored-by: ZhouJinsong <zhoujinsong0505@163.com>
Co-authored-by: baiyangtx <xiangnebula@163.com>
Co-authored-by: big face cat <731030576@qq.com>
Labels
  • module:ams-dashboard: Ams dashboard module
  • module:core: Core module
  • type:build
  • type:docs: Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

[Subtask]: Automatically create tags on snapshots daily for Iceberg Format Table
6 participants