DELETE statement #5146
Comments
Comments are welcome, @dantengsky.
Generally, a common approach is to store an optional extra bitmap inside BlockInfo and apply it as a filter selection during queries.
This is a good approach (and the preferred one); if we add an extra bitmap, it should be bound to a single snapshot.
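A minimal Rust sketch of that idea, with made-up names (`BlockMeta` and `delete_mask` are illustrative, not the actual fuse-engine types): the deletion bitmap lives next to the block metadata of one snapshot and is applied as a row filter while scanning.

```rust
// Hypothetical sketch only: `BlockMeta` and `delete_mask` are illustrative
// names, not the actual fuse-engine types.
use std::collections::HashSet;

/// Per-block metadata kept inside one snapshot.
struct BlockMeta {
    location: String,
    row_count: u64,
    /// Optional set of deleted row ordinals; `None` means nothing was deleted.
    delete_mask: Option<HashSet<u64>>,
}

impl BlockMeta {
    /// A row survives the scan only if it is absent from the deletion mask.
    fn is_visible(&self, row: u64) -> bool {
        self.delete_mask
            .as_ref()
            .map_or(true, |mask| !mask.contains(&row))
    }
}

fn main() {
    let block = BlockMeta {
        location: "block_0001.parquet".to_string(),
        row_count: 4,
        delete_mask: Some([1, 3].into_iter().collect()),
    };
    // Rows 1 and 3 are filtered out at query time; rows 0 and 2 remain visible.
    let visible: Vec<u64> = (0..block.row_count)
        .filter(|r| block.is_visible(*r))
        .collect();
    assert_eq!(visible, vec![0, 2]);
}
```

Binding the mask to one snapshot, as suggested above, keeps it out of the metadata of earlier snapshots.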
Perhaps we could introduce a “delete_file” like Iceberg's. By the way, why not just borrow the Iceberg format instead of creating fuse? Iceberg looks nice.
Let's do copy-on-write first (replace blocks whose rows have been deleted with new blocks). This is the fundamental model we should implement (GDPR). Merge-on-read is interesting; let's implement it later if a high volume of mutation is required that copy-on-write cannot meet.
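A hedged sketch of the copy-on-write model under simplified, made-up types (`Block`, `Snapshot`, and `delete_rows` are not Databend's actual API): blocks that contain deleted rows are rewritten without them and a new snapshot points at the rewritten blocks, while untouched blocks are reused as-is.

```rust
// Simplified, illustrative types only; not Databend's actual table metadata.
#[derive(Clone)]
struct Block {
    location: String,
    rows: Vec<i64>, // stand-in for real columnar data
}

struct Snapshot {
    blocks: Vec<Block>,
}

/// Copy-on-write delete: produce a new snapshot in which rows matching `pred`
/// are gone. Blocks containing such rows are rewritten; the rest are reused.
fn delete_rows(prev: &Snapshot, pred: impl Fn(&i64) -> bool) -> Snapshot {
    let blocks = prev
        .blocks
        .iter()
        .map(|b| {
            if b.rows.iter().any(&pred) {
                // Write a brand-new block without the deleted rows.
                Block {
                    location: format!("{}.rewritten", b.location),
                    rows: b.rows.iter().copied().filter(|r| !pred(r)).collect(),
                }
            } else {
                // Untouched by the delete: carry the old block over unchanged.
                b.clone()
            }
        })
        .filter(|b| !b.rows.is_empty()) // fully deleted blocks simply drop out
        .collect();
    Snapshot { blocks }
}

fn main() {
    let prev = Snapshot {
        blocks: vec![
            Block { location: "b1.parquet".into(), rows: vec![1, 2, 3] },
            Block { location: "b2.parquet".into(), rows: vec![4, 5] },
        ],
    };
    // Delete rows equal to 2: only b1 is rewritten, b2 is reused as-is.
    let next = delete_rows(&prev, |r| *r == 2);
    assert_eq!(next.blocks[0].rows, vec![1, 3]);
    assert_eq!(next.blocks[1].location, "b2.parquet");
}
```

Merge-on-read would instead leave the old blocks in place and record the deletions separately (for example, the bitmap sketched above), deferring the filtering to scan time.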
Summary
Each segment file (parquet) in the underlying storage is immutable; it is not suited for mutable operations (e.g., Delete, Update).
There are many ways to improve that, such as adding a hidden column for each row: `1` marks the row as deleted, `0` means the row is normal. Updating that mark still means reading the old parquet file and writing the new flags out to a new file; parquet is a `compact` format (vs. a `wide` format), so it is not well suited for modification-related operations (a minimal sketch at the end of this issue illustrates the point). For Databend deletion, the first version must work; performance comes after. The initial plan is:
There is already a draft PR in #4228, but it still needs work.
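As a rough illustration of the hidden delete-mark column mentioned in the summary (all names here, such as `RowGroup` and `delete_mark`, are hypothetical): because parquet is a compact columnar format, a mark cannot be flipped in place; the whole row group has to be read and rewritten, and scans then skip marked rows.

```rust
// Hypothetical names (`RowGroup`, `delete_mark`); a sketch, not Databend code.
struct RowGroup {
    values: Vec<i64>,     // stand-in for the real columns
    delete_mark: Vec<u8>, // hidden column: 1 = deleted, 0 = normal
}

/// "Deleting" by flipping marks still rewrites the entire row group, because
/// the compact columnar file cannot be patched in place.
fn rewrite_with_marks(old: &RowGroup, pred: impl Fn(&i64) -> bool) -> RowGroup {
    RowGroup {
        values: old.values.clone(),
        delete_mark: old
            .values
            .iter()
            .zip(&old.delete_mark)
            .map(|(v, m)| if pred(v) { 1 } else { *m })
            .collect(),
    }
}

/// Scans skip any row whose hidden mark is set.
fn visible(rg: &RowGroup) -> Vec<i64> {
    rg.values
        .iter()
        .zip(&rg.delete_mark)
        .filter(|(_, m)| **m == 0)
        .map(|(v, _)| *v)
        .collect()
}

fn main() {
    let old = RowGroup { values: vec![10, 20, 30], delete_mark: vec![0, 0, 0] };
    // Delete the row with value 20: a whole new row group is produced.
    let new = rewrite_with_marks(&old, |v| *v == 20);
    assert_eq!(visible(&new), vec![10, 30]);
}
```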