Closed
Description
Search before asking
- I had searched in the issues and found no similar issues.
Version
v2.1.6
What's Wrong?
Iceberg dangling deletes affect the row count: count(*) returns the wrong result.
test case: doris/samples/datalake/iceberg_and_paimon
bash start_all.sh
bash start_doris_client.sh
spark:
> select version();
3.5.1 fd86f85e181fc2dc0f50a096855acf83a6cc5d9c
CREATE TABLE demo.db_iceberg.tb_iceberg (
id BIGINT NOT NULL,
val STRING)
USING iceberg
LOCATION 's3://warehouse/wh/db_iceberg/tb_iceberg'
TBLPROPERTIES (
'current-snapshot-id' = '2047510404873857005',
'format' = 'iceberg/parquet',
'format-version' = '2',
'identifier-fields' = '[id]',
'upsert-enabled' = 'true',
'write.delete.mode' = 'merge-on-read',
'write.parquet.compression-codec' = 'zstd',
'write.update.mode' = 'merge-on-read',
'write.upsert.enabled' = 'true');
insert into demo.db_iceberg.tb_iceberg values(1, 'abd');
update demo.db_iceberg.tb_iceberg set val = 'def' where id = 1;
update demo.db_iceberg.tb_iceberg set val = 'hgk' where id = 1;
call demo.system.rewrite_data_files(table => 'demo.db_iceberg.tb_iceberg', options => map('min-input-files', '1'));
call demo.system.expire_snapshots(table => 'demo.db_iceberg.tb_iceberg', older_than => timestamp'2024-10-22 12:41:00');
insert into demo.db_iceberg.tb_iceberg values(2, 'abd');
~/mc ls minio/warehouse/wh/db_iceberg/tb_iceberg/data/
[2024-10-22 12:38:36 CST] 1.4KiB STANDARD 00000-4-c401aec0-dab0-4476-b99e-c67022be3505-00001-deletes.parquet
[2024-10-22 12:42:41 CST] 637B STANDARD 00000-624-9bb2caa4-0c97-4588-8f6b-68b72f970905-0-00001.parquet
[2024-10-22 12:40:03 CST] 646B STANDARD 00000-7-d78a7a7d-a615-429b-b437-31c66d6a00b0-0-00001.parquet
D select * from read_parquet('s3://warehouse/wh/db_iceberg/tb_iceberg/data/00000-624-9bb2caa4-0c97-4588-8f6b-68b72f970905-0-00001.parquet');
┌───────┬─────────┐
│ id │ val │
│ int64 │ varchar │
├───────┼─────────┤
│ 2 │ abd │
└───────┴─────────┘
D select * from read_parquet('s3://warehouse/wh/db_iceberg/tb_iceberg/data/00000-4-c401aec0-dab0-4476-b99e-c67022be3505-00001-deletes.parquet');
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────┐
│ file_path │ pos │
│ varchar │ int64 │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┤
│ s3://warehouse/wh/db_iceberg/tb_iceberg/data/00000-2-38e5f4da-8e99-43a2-ba15-f648adc6483b-00001.parquet │ 0 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────┘
D select * from read_parquet('s3://warehouse/wh/db_iceberg/tb_iceberg/data/00000-7-d78a7a7d-a615-429b-b437-31c66d6a00b0-0-00001.parquet');
┌───────┬─────────┐
│ id │ val │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ hgk │
└───────┴─────────┘
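The output above shows the dangling delete: the position delete file still points at 00000-2-38e5f4da-8e99-43a2-ba15-f648adc6483b-00001.parquet, which no longer appears in the data directory after rewrite_data_files and expire_snapshots. A minimal Python sketch of that check follows. The helper name `find_dangling_deletes` is hypothetical, and the directory listing stands in for the live data files; a real check would read them from the current snapshot's manifests.

```python
# Hypothetical helper: not part of Doris, Spark, or Iceberg.
def find_dangling_deletes(live_data_files, delete_entries):
    """Return (file_path, pos) delete entries whose target data file is not live."""
    live = set(live_data_files)
    return [(path, pos) for path, pos in delete_entries if path not in live]


PREFIX = "s3://warehouse/wh/db_iceberg/tb_iceberg/data/"

# Data files taken from the `mc ls` listing above (assumed to be the live set).
live_data_files = [
    PREFIX + "00000-624-9bb2caa4-0c97-4588-8f6b-68b72f970905-0-00001.parquet",
    PREFIX + "00000-7-d78a7a7d-a615-429b-b437-31c66d6a00b0-0-00001.parquet",
]

# Rows of the position delete file, as printed by read_parquet above.
delete_entries = [
    (PREFIX + "00000-2-38e5f4da-8e99-43a2-ba15-f648adc6483b-00001.parquet", 0),
]

dangling = find_dangling_deletes(live_data_files, delete_entries)
for path, pos in dangling:
    print(f"dangling delete: pos={pos} in {path}")
```

Here the single delete entry is reported as dangling, because its target file is in neither of the two remaining data files.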
doris:
mysql> select version from backends();
+-----------------------------+
| version |
+-----------------------------+
| doris-2.1.6-rc04-653e315ba5 |
+-----------------------------+
mysql> select count(id) from iceberg.db_iceberg.tb_iceberg;
+-----------+
| count(id) |
+-----------+
| 2 |
+-----------+
1 row in set (0.10 sec)
mysql> select count(*) from iceberg.db_iceberg.tb_iceberg; -- wrong
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.07 sec)
mysql> select * from iceberg.db_iceberg.tb_iceberg;
+------+------+
| id | val |
+------+------+
| 1 | hgk |
| 2 | abd |
+------+------+
2 rows in set (0.06 sec)
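One possible explanation for the mismatch, stated as an assumption and not confirmed anywhere in this report: if count(*) is answered from snapshot summary counters (total records minus total position deletes) instead of by scanning, a dangling delete is subtracted even though it matches no live row. The sketch below only reproduces that arithmetic. The function names are hypothetical, and the counter values are inferred from the files shown above, not read from the actual snapshot summary.

```python
# Hypothetical illustration of the arithmetic; not Doris code.
def count_from_summary(total_records, total_position_deletes):
    """Metadata-only count: subtracts every position delete, dangling or not."""
    return total_records - total_position_deletes


def count_by_scan(rows_per_file, delete_entries):
    """Scan-style count: a delete only removes a row from a live data file."""
    total = 0
    for path, rows in rows_per_file.items():
        deleted = {pos for p, pos in delete_entries if p == path and pos < rows}
        total += rows - len(deleted)
    return total


# Two live data files with one row each (id=1 and id=2), per the output above.
rows_per_file = {
    "00000-624-9bb2caa4-0c97-4588-8f6b-68b72f970905-0-00001.parquet": 1,
    "00000-7-d78a7a7d-a615-429b-b437-31c66d6a00b0-0-00001.parquet": 1,
}
# The one position delete targets a file that is no longer live.
delete_entries = [("00000-2-38e5f4da-8e99-43a2-ba15-f648adc6483b-00001.parquet", 0)]

summary_count = count_from_summary(total_records=2, total_position_deletes=1)
scan_count = count_by_scan(rows_per_file, delete_entries)
print(summary_count, scan_count)
```

Under these assumed counters the metadata-only path yields 1 and the scan path yields 2, which matches the count(*) and count(id) results reported above.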
Clean up the dangling deletes with rewrite_position_delete_files:
spark:
spark-sql ()> CALL demo.system.rewrite_position_delete_files(table => 'db_iceberg.tb_iceberg', options => map('rewrite-all', 'true'));
1 0 1440 0
doris:
mysql> refresh table iceberg.db_iceberg.tb_iceberg;
Query OK, 0 rows affected (0.01 sec)
mysql> select count(*) from iceberg.db_iceberg.tb_iceberg; -- right
+----------+
| count(*) |
+----------+
| 2 |
+----------+
1 row in set (0.10 sec)
mysql> select * from iceberg.db_iceberg.tb_iceberg;
+------+------+
| id | val |
+------+------+
| 2 | abd |
| 1 | hgk |
+------+------+
2 rows in set (0.06 sec)
What You Expected?
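Doris should handle Iceberg dangling deletes correctly, so that count(*) returns 2 without first running rewrite_position_delete_files.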
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct