Fix RemoveOrphanFiles deleting files unexpectedly #2890
Signed-off-by: Xianyang Liu <xianyangliu@tencent.com>
There have been multiple discussions around this. I'll try to fetch the old thread on Slack later today.
Hi @aokolnychyi, could you help to review this? Thanks a lot.
gentle ping @rdblue @aokolnychyi, could you help to review this? Thanks a lot.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
 * File path: mock://localhost:9000/path will be resolved as mock://localhost:9000/path
 *
 */
public class MockFileSystem extends RawLocalFileSystem {
Can this store data in memory?
This sounds reasonable to me, but there is one caveat.
but someone else should confirm that.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
RemoveOrphanFiles uses actualFileDF left-anti join validFileDF to determine which files should be removed. We list all the files under the provided location, or under the table location, with FileSystem.listStatus to create actualFileDF. validFileDF is created by indexing the metadata files and the files they reference.

However, FileSystem.listStatus qualifies the given path. For example, the path hdfs:/path is qualified as hdfs://host:port/path. If the warehouse is set to hdfs:/path:

validFileDF:
hdfs:/path/file1
hdfs:/path/file2
hdfs:/path/file3
....

actualFileDF:
hdfs://host:port/path/file1
hdfs://host:port/path/file2
hdfs://host:port/path/file3
....

Then all the files in actualFileDF are treated as invalid, because none of them matches an entry in validFileDF.

In this patch, we compare only the pure path (with the scheme and authority removed) when doing the left-anti join.

Updated the existing unit tests to cover this.
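
The mismatch described above, and the path-only comparison, can be sketched with plain java.net.URI standing in for Hadoop paths and Java lists standing in for the two DataFrames. This is an illustration under those assumptions, not the code in this patch: the class name, the method names, and the namenode:8020 authority used in the usage note are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: lists stand in for actualFileDF and validFileDF.
class OrphanPathSketch {

  // Keep only the path component, dropping the scheme and authority.
  static String purePath(String location) {
    return URI.create(location).getPath();
  }

  // Naive left-anti join on the full location string.
  // A qualified location never equals its unqualified form, so every
  // listed file is reported as an orphan.
  static List<String> orphansByFullLocation(List<String> actual, List<String> valid) {
    Set<String> validSet = new HashSet<>(valid);
    List<String> result = new ArrayList<>();
    for (String file : actual) {
      if (!validSet.contains(file)) {
        result.add(file);
      }
    }
    return result;
  }

  // Left-anti join on the pure path only.
  static List<String> orphansByPurePath(List<String> actual, List<String> valid) {
    Set<String> validPaths = new HashSet<>();
    for (String file : valid) {
      validPaths.add(purePath(file));
    }
    List<String> result = new ArrayList<>();
    for (String file : actual) {
      if (!validPaths.contains(purePath(file))) {
        result.add(file);
      }
    }
    return result;
  }
}
```

With valid files hdfs:/path/file1 and hdfs:/path/file2 and a listing that returns hdfs://namenode:8020/path/file1, hdfs://namenode:8020/path/file2 and one extra file, the full-location join flags all three listed files, while the pure-path join flags only the extra one.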