[Feature]: iceberg v2 table orphan file clean trigger #1925

Closed
2 tasks done
celltobig opened this issue Sep 6, 2023 · 10 comments
Labels
stale type:feature Feature Requests

Comments

@celltobig
Contributor

Description

I would like orphan file cleaning and snapshot expiration for Iceberg tables to be triggered by a table property, rather than by a fixed time-based schedule.

Use case/motivation

(Two screenshots were attached.)

Describe the solution

No response

Subtasks

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@celltobig added the type:feature Feature Requests label Sep 6, 2023
@shidayang
Contributor

Thank you for raising this issue. This logic has already been extracted, and in the future a dedicated class will be responsible for snapshot expiration and orphan file cleaning. Please refer to #1916.

@baiyangtx
Contributor

Would it be more reasonable to trigger it through a web page or a command-line tool? Using a table property doesn't seem like a good approach.

@HuangFru
Contributor

HuangFru commented Sep 7, 2023

I agree with @baiyangtx. Maybe this feature could be linked to #1357; using command-line tools to explicitly trigger these procedures would be a better way.

@Kyofin
Contributor

Kyofin commented Sep 8, 2023

I believe a better choice is to add a trigger button on the Table web page that lets users either run the task manually or set a timer for it.

@wangtaohz
Contributor

For the snapshot expiration and orphan file cleaning services, I think there should be two types of triggers:

  • a manual trigger for a single execution
  • an automatic trigger for periodic executions

The single manual trigger is well suited to a web UI (such as a button) or the command line (#1357), which is already on our roadmap.
The periodic automatic trigger is well suited to being configured as a setting. Of course, we could make this configuration more user-friendly, for example by setting a timer on the web UI, but in the short term I think it is feasible to configure it as a table property, as we have done for the self-optimizing trigger.

This issue focuses on improving the periodic automatic trigger, so it doesn't conflict with the feature in #1357.
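To illustrate how the two trigger types could fit together, here is a minimal Python sketch, assuming a scheduler that reads a table's properties and remembers when cleaning last ran. The property keys orphan-clean.enabled and orphan-clean.interval-minutes, and the functions periodic_clean_due and should_run, are hypothetical names for illustration only; they are not existing Amoro or Iceberg settings.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional

# Hypothetical property keys, for illustration only (not real Amoro/Iceberg settings).
ENABLED_KEY = "orphan-clean.enabled"
INTERVAL_KEY = "orphan-clean.interval-minutes"
DEFAULT_INTERVAL_MINUTES = 24 * 60  # assumed default: once a day


def periodic_clean_due(
    properties: Mapping[str, str],
    last_run: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the periodic (automatic) trigger should fire now."""
    # The periodic trigger is opt-in via a table property.
    if properties.get(ENABLED_KEY, "false").lower() != "true":
        return False
    interval = timedelta(
        minutes=int(properties.get(INTERVAL_KEY, str(DEFAULT_INTERVAL_MINUTES)))
    )
    # A table that has never been cleaned is due immediately.
    if last_run is None:
        return True
    return now - last_run >= interval


def should_run(
    properties: Mapping[str, str],
    last_run: Optional[datetime],
    now: datetime,
    manual_request: bool,
) -> bool:
    """A manual request always runs once; otherwise fall back to the periodic check."""
    return manual_request or periodic_clean_due(properties, last_run, now)
```

In this sketch a manual request (for example from a web UI button or a command-line tool) bypasses the interval check, while the periodic path fires only once the configured interval has elapsed since the last run.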

@baiyangtx
Contributor

How about implementing it in Spark SQL first, so that we can use it from Terminal SQL?

@wangtaohz
Contributor

AFAIK, Iceberg already supports cleaning orphan files through the Spark procedure remove_orphan_files.

@baiyangtx
Contributor

@celltobig Can this approach meet your needs?


This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions bot added the stale label Aug 21, 2024

github-actions bot commented Sep 4, 2024

This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.

github-actions bot closed this as not planned Sep 4, 2024
Projects
None yet
Development

No branches or pull requests

6 participants