[Bug]: Data lost after optimizing in Mixed format #2253

Closed
1 of 2 tasks
Tracked by #2448
jefyjiang opened this issue Nov 6, 2023 · 2 comments · Fixed by #2249
Labels
type:bug Something isn't working

Comments

@jefyjiang
Contributor

What happened?

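I used Spark 3.3.3 to insert data into an Iceberg 1.3.1 table managed by Amoro 0.5.1, 11710 rows in total. Right after the insert finished, the row count in the table was correct. After one MINOR optimizing run, querying the table again returned only 7271 rows. I have confirmed that the data IDs are not duplicated. The relevant information is below:

Table creation statement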

CREATE TABLE jid.dwd.pat_main1 (
pid string COMMENT 'ID',
ct timestamp COMMENT 'creation time',
an string,
pn string,
ad date,
pd date,
db_name string,
primary key (pid)
) USING arctic
PARTITIONED BY (db_name,bucket(8,pid))
TBLPROPERTIES (
'format-version'='2',
'write.metadata.previous-versions-max' = '5',
'write.metadata.delete-after-commit.enabled'= 'true',
'write.upsert.enabled' = 'true',
'self-optimizing.enabled' = 'true',
'change.data.ttl.minutes' = '20',
'snapshot.change.keep.minutes' = '20',
'snapshot.base.keep.minutes' = '10',
'table-expire.enabled' = 'true',
'self-optimizing.max-file-count' = '1000000',
'clean-orphan-file.min-existing-time-minutes' = '15',
'self-optimizing.group' = 'amoro_e_flink',
'clean-orphan-file.enabled' = 'true'
);
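
[Image: 531b46a7f0b2d074d2d0320069b5dd9]

Querying the table and its corresponding change table gives the following result:

[Image: c1dae2ed9360dff6ca2ac130efacc07]

Looking at the corresponding HDFS data, two partitions are missing from base:

[Image: c6ce812db36a22a01544c170a8987e5]
[Image: 92fcd304292889f3ab7c445d1eb9492]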
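
For anyone trying to reproduce the HDFS check described above, here is a minimal Python sketch of the comparison. It assumes you already have two lists of data-file paths, one for the change store and one for the base store (for example from `hdfs dfs -ls -R`), and that partitions appear as Hive-style `key=value` directories. The function names, the `/warehouse/...` paths, and the file names are all hypothetical and are not taken from the report.

```python
import re

# Matches Hive-style partition directories such as "db_name=a" or "pid_bucket=3".
_PARTITION_SEGMENT = re.compile(r"^[^=/]+=[^/]*$")


def partition_of(path):
    """Return the partition part of a data-file path, e.g. 'db_name=a/pid_bucket=3'."""
    directories = path.split("/")[:-1]  # drop the file name itself
    return "/".join(s for s in directories if _PARTITION_SEGMENT.match(s))


def missing_partitions(change_files, base_files):
    """Partitions that have files in the change listing but none in the base listing."""
    change = {partition_of(p) for p in change_files}
    base = {partition_of(p) for p in base_files}
    return sorted(p for p in change - base if p)


# Hypothetical listings; the real directory layout may differ.
change_files = [
    "/warehouse/jid/dwd/pat_main1/change/data/db_name=a/pid_bucket=0/f1.parquet",
    "/warehouse/jid/dwd/pat_main1/change/data/db_name=a/pid_bucket=1/f2.parquet",
    "/warehouse/jid/dwd/pat_main1/change/data/db_name=b/pid_bucket=0/f3.parquet",
]
base_files = [
    "/warehouse/jid/dwd/pat_main1/base/data/db_name=a/pid_bucket=0/f4.parquet",
]

print(missing_partitions(change_files, base_files))
```

A non-empty result only says which partitions to look at first. A partition can be absent from base simply because it has not been optimized yet, so the listing should be taken after the optimizing run has committed.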

Affects Versions

Amoro 0.5.1

What engines are you seeing the problem on?

Spark

How to reproduce

No response

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
jefyjiang added the type:bug (Something isn't working) label Nov 6, 2023
@shidayang
Contributor

Thank you for reporting this bug. BTW, the Amoro community encourages users to communicate in English.

@shidayang
Contributor

As far as I can tell, this is a Mixed-format table?

shidayang changed the title from "[Bug]: optimizing后数据丢失" (data lost after optimizing) to "[Bug]: Data lost after optimizing in Mixed format" Nov 6, 2023
zhoujinsong mentioned this issue Dec 19, 2023
33 tasks