Thank you for reporting this bug. BTW, the Amoro community encourages users to communicate in English.
optimized-sequence
As far as I know, this is a Mixed-format table?
What happened?
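I inserted data with Spark 3.3.3 into an Iceberg 1.3.1 table managed by Amoro 0.5.1, 11710 rows in total. Right after the insert finished, the row count returned by a query on the table was correct. After one MINOR optimizing run, the same query returned only 7271 rows. I have confirmed that the data IDs are not duplicated. The relevant information follows.
Table creation statement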
CREATE TABLE jid.dwd.pat_main1 (
pid string COMMENT 'ID',
ct timestamp COMMENT '创建时间',
an string,
pn string,
ad date ,
pd date ,
db_name string,
primary key (pid)
) USING arctic
PARTITIONED BY (db_name,bucket(8,pid))
TBLPROPERTIES (
'format-version'='2',
'write.metadata.previous-versions-max' = '5',
'write.metadata.delete-after-commit.enabled'= 'true',
'write.upsert.enabled' = 'true',
'self-optimizing.enabled' = 'true',
'change.data.ttl.minutes' = '20',
'snapshot.change.keep.minutes' = '20',
'snapshot.base.keep.minutes' = '10',
'table-expire.enabled' = 'true',
'self-optimizing.max-file-count' = '1000000',
'clean-orphan-file.min-existing-time-minutes' = '15',
'self-optimizing.group' = 'amoro_e_flink',
'clean-orphan-file.enabled' = 'true'
);
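Querying the table and its corresponding change table gave the following results.
Checking the corresponding HDFS data showed that two partitions are missing from base.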
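One way to pin down which partitions lost rows is to record per-partition row counts before and after the optimizing run and diff them. The sketch below assumes those counts have already been collected (for example with a `GROUP BY db_name` query). The helper name and the partition names and per-partition counts are hypothetical; only the totals 11710 and 7271 come from this report, and the real table is additionally bucketed by `pid`.

```python
from typing import Dict, List, Tuple


def partitions_with_lost_rows(
    before: Dict[str, int], after: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (partition, lost_rows) for every partition whose row count dropped."""
    lost = []
    for partition, count_before in sorted(before.items()):
        # A partition that is absent after optimizing is treated as holding 0 rows.
        count_after = after.get(partition, 0)
        if count_after < count_before:
            lost.append((partition, count_before - count_after))
    return lost


# Hypothetical per-partition counts; only the two totals match the report.
before = {"db_a": 4000, "db_b": 3271, "db_c": 2439, "db_d": 2000}
after = {"db_a": 4000, "db_b": 3271}

for partition, lost_rows in partitions_with_lost_rows(before, after):
    print(f"{partition}: {lost_rows} rows missing after optimizing")
```

If the lost rows line up exactly with whole partitions, as in this made-up data, that would be consistent with the observation that two partitions are missing from base.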
Affects Versions
Amoro 0.5.1
What engines are you seeing the problem on?
Spark
How to reproduce
No response
Relevant log output
No response
Anything else
No response