Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] 使用delete 删除的数据仍然可以查询到 #34551

Closed
2 of 3 tasks
weilai201 opened this issue May 8, 2024 · 4 comments · Fixed by #35917
Closed
2 of 3 tasks

[Bug] 使用delete 删除的数据仍然可以查询到 #34551

weilai201 opened this issue May 8, 2024 · 4 comments · Fixed by #35917

Comments

@weilai201
Copy link

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.0.4

What's Wrong?

--1. 先创建TEST1表
drop table IF EXISTS TEST1;
CREATE TABLE IF NOT EXISTS TEST1 (
ID int NULL COMMENT 'ID',
ID1 int NULL COMMENT 'ID1',
ID2 int NULL COMMENT 'ID2'
) UNIQUE KEY(
ID
) DISTRIBUTED BY HASH(ID) BUCKETS AUTO PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"enable_unique_key_merge_on_write" = "true",
"store_row_column" = "true"
);

-- 2. 表中插入一条记录
insert into TEST1(ID,ID1,ID2) values(1,1,1);
select * from TEST1;
查询结果
image

-- 3. 删除此记录
delete from TEST1 where ID=1;

--4. 设置部分更新
set enable_unique_key_partial_update=true;
set enable_insert_strict=false;

--5. 在此插入相同key值记录
insert into TEST1(ID,ID1) values(1,2);
select * from TEST1;

image

What You Expected?

上面第五步骤中期望结果为
image

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@weilai201
Copy link
Author

我看到2.1.2 版本中修复了类似问题,不知道能不能解决我的问题:#32574

@weilai201
Copy link
Author

使用2.1.2版本,此问题仍然存在

@Yukang-Lian
Copy link
Collaborator

Prerequisite Knowledge: During the process of updating partial columns, existing data may be read. If the key of the newly imported data for partial columns already exists, BE (Backend) will read the rows with the same key and update the data of the partial columns. If the key does not exist, BE will write the newly imported data of the partial columns and fill in the default values for other columns.
Since the storage engine used by Doris is similar to the LSM-tree structure, the delete statement does not actually delete the data but marks it for deletion. The delete statement in Doris writes the delete information into the delete predicate, marking the data for deletion with a predicate.

Reason: In step 3, the data with key=1 was deleted, so key=1 was written into the delete predicate. In step 5, when partially updating the columns with key=1, the old data was read without considering the effect of the delete predicate, resulting in the ID2 column showing the previously deleted value.

Solution: When reading and completing data for partial column updates, consider the delete predicate to solve the issue.

@ycycse
Copy link
Member

ycycse commented May 16, 2024

Hi, I'll try this.

@Yukang-Lian Yukang-Lian assigned ycycse and unassigned zhannngchen and Yukang-Lian May 16, 2024
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 15, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 17, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 18, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 19, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 19, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 20, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 24, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jun 26, 2024
Problem: As shown in the issue above, if a key deleted by a delete statement is written to by updating only certain columns, the data will not display correctly.

Reason: The delete statement deletes the data by writing a delete predicate, which is stored in the rowset meta and applied during data retrieval to filter the data. However, partial column updates do not consider the effect of the delete predicate when reading the original data. The imported key should be considered as a new key (since it has already been deleted), but it is actually treated as an old key. Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates, but this method will result in reading more columns, as shown in apache#35766. Thus, in this PR, we change the delete operation in the mow table from writing a delete predicate to writing a delete sign, which effectively resolves the issue.
dataroaring pushed a commit that referenced this issue Jun 28, 2024
…table (#35917)

## Proposed changes

close #34551 

Problem: As shown in the issue above, if a key deleted by a delete
statement is written to by updating only certain columns, the data will
not display correctly.

Reason: The delete statement deletes the data by writing a delete
predicate, which is stored in the rowset meta and applied during data
retrieval to filter the data. However, partial column updates do not
consider the effect of the delete predicate when reading the original
data. The imported key should be considered as a new key (since it has
already been deleted), but it is actually treated as an old key.
Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates,
but this method will result in reading more columns, as shown in #35766.
Thus, in this PR, we change the delete operation in the mow table from
writing a delete predicate to writing a delete sign, which effectively
resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jul 10, 2024
…table (apache#35917)

close apache#34551

Problem: As shown in the issue above, if a key deleted by a delete
statement is written to by updating only certain columns, the data will
not display correctly.

Reason: The delete statement deletes the data by writing a delete
predicate, which is stored in the rowset meta and applied during data
retrieval to filter the data. However, partial column updates do not
consider the effect of the delete predicate when reading the original
data. The imported key should be considered as a new key (since it has
already been deleted), but it is actually treated as an old key.
Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates,
but this method will result in reading more columns, as shown in apache#35766.
Thus, in this PR, we change the delete operation in the mow table from
writing a delete predicate to writing a delete sign, which effectively
resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jul 15, 2024
…table (apache#35917)

close apache#34551

Problem: As shown in the issue above, if a key deleted by a delete
statement is written to by updating only certain columns, the data will
not display correctly.

Reason: The delete statement deletes the data by writing a delete
predicate, which is stored in the rowset meta and applied during data
retrieval to filter the data. However, partial column updates do not
consider the effect of the delete predicate when reading the original
data. The imported key should be considered as a new key (since it has
already been deleted), but it is actually treated as an old key.
Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates,
but this method will result in reading more columns, as shown in apache#35766.
Thus, in this PR, we change the delete operation in the mow table from
writing a delete predicate to writing a delete sign, which effectively
resolves the issue.
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Jul 29, 2024
…table (apache#35917)

close apache#34551

Problem: As shown in the issue above, if a key deleted by a delete
statement is written to by updating only certain columns, the data will
not display correctly.

Reason: The delete statement deletes the data by writing a delete
predicate, which is stored in the rowset meta and applied during data
retrieval to filter the data. However, partial column updates do not
consider the effect of the delete predicate when reading the original
data. The imported key should be considered as a new key (since it has
already been deleted), but it is actually treated as an old key.
Therefore, only some columns are updated, leading to incorrect results.

Solution: Consider the delete predicate during partial column updates,
but this method will result in reading more columns, as shown in apache#35766.
Thus, in this PR, we change the delete operation in the mow table from
writing a delete predicate to writing a delete sign, which effectively
resolves the issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants