[Bug]: If the optimizing of a Mixed Hive table fails to alter the Hive location, the files will be moved to the old Hive location. #1898

Closed
2 tasks done
Tracked by #1930
wangtaohz opened this issue Aug 29, 2023 · 1 comment · Fixed by #1911
Labels
type:bug Something isn't working

Comments

@wangtaohz
Contributor

What happened?

When the optimizing of a Mixed Hive format table fails to alter the Hive location, the next full optimizing will move files to the old Hive location. This causes two problems:

  • If the asynchronous thread later alters the Hive location successfully, users will see a decrease in the data in Hive, similar to phantom reads.
  • The commit of the next full optimizing will fail, because the old Hive location contains a mixture of files from new and old snapshots. The exception looks like this:
[image: screenshot of the exception]

Affects Versions

0.5.x, 0.4.x, 0.3.x

What engines are you seeing the problem on?

Core

How to reproduce

This issue occurs only when altering the Hive location fails.

To reproduce this scenario:

  1. After a full optimizing commits, deliberately change the Hive location back to its original location.
  2. Insert some data, and wait for the next full optimizing to move files to the old Hive location.
  3. Trigger another full optimizing; it will fail when committing.

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
wangtaohz added the type:bug label on Aug 29, 2023
@wangtaohz
Contributor Author

To fix this issue, we should make sure that the Hive location is correct before committing the optimizing result.
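
A minimal sketch of such a pre-commit check might look like the following. It is only an illustration: `HiveLocationStore`, `InMemoryStore`, and `ensureLocation` are hypothetical names invented here, not the project's or Hive's real API, and it assumes the expected location is known from the table's own metadata. The idea is to compare the location Hive reports with the expected one, retry the alter if they differ, and abort the commit if the location still cannot be corrected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class HiveLocationCheck {

    /** Hypothetical stand-in for the Hive metastore; not the project's real API. */
    public interface HiveLocationStore {
        String getLocation(String table);
        void setLocation(String table, String location);
    }

    /** In-memory store, present only to make the sketch runnable. */
    public static class InMemoryStore implements HiveLocationStore {
        private final Map<String, String> locations = new HashMap<>();

        @Override
        public String getLocation(String table) {
            return locations.get(table);
        }

        @Override
        public void setLocation(String table, String location) {
            locations.put(table, location);
        }
    }

    /**
     * Checks the Hive location before an optimizing commit and repairs it if stale.
     *
     * @return true if the location was already correct, false if it had to be re-altered
     * @throws IllegalStateException if the location is still wrong after the retry
     */
    public static boolean ensureLocation(HiveLocationStore store, String table, String expected) {
        if (Objects.equals(store.getLocation(table), expected)) {
            return true;
        }
        // The earlier alter apparently failed (or was reverted), so retry it.
        store.setLocation(table, expected);
        if (!Objects.equals(store.getLocation(table), expected)) {
            // Refuse to commit rather than move files into a stale location.
            throw new IllegalStateException(
                "Hive location of " + table + " could not be set to " + expected);
        }
        return false;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        // Simulate a table whose Hive location was left at the old path.
        store.setLocation("db.tbl", "/warehouse/tbl/old");
        boolean alreadyCorrect = ensureLocation(store, "db.tbl", "/warehouse/tbl/new");
        System.out.println(alreadyCorrect + " " + store.getLocation("db.tbl"));
    }
}
```

Running the check before the commit, rather than after it, means a stale location is caught before any files are moved, which is the situation that leads to both problems described above.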
