Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model) #6104

@SHuixo

Apache Iceberg version

0.14.1

Query engine

Flink

Please describe the bug 🐞

Flink: 1.13.5
Iceberg: 0.13.2 / 0.14.1

When I use the rewrite files action API rewriteDataFiles(), the new compacted data files are written, but no corresponding manifest file (and therefore no new snapshot) is generated. I tried Iceberg 0.13.2 and 0.14.1; both versions show the same problem with both the Hive catalog and the Hadoop catalog.

The Iceberg Maven dependency, the catalog configuration, the table definition, and the Java code used to compact the files are as follows:

<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-flink-runtime-1.13</artifactId>
    <!-- <version>0.13.2</version> -->
    <version>0.14.1</version>
</dependency>

name: iceberg_hive_catalog
type: iceberg
catalog-type: hive
uri: thrift://xxxxx:9083
clients: 5
property-version: 1
warehouse: hdfs://nameservice1/user/hive/warehouse/

create table iceberg_hive_catalog.dhome_db.ods_d_base_inf_229_iceberg (
`did` string,
`name` string,
`address` string,
`did_seq`  string,
PRIMARY KEY (did_seq) NOT ENFORCED
) with (
 'format-version'='2',
 'write.upsert.enabled'='true',
 'write.metadata.delete-after-commit.enabled'='true',
 'write.metadata.previous-versions-max'='5',
 'flink.rewrite.enable' = 'true',
 'flink.rewrite.parallelism' = '5',
 'flink.rewrite.target-file-size-bytes' = '536870912',
 'flink.rewrite.max-files-count' = '5'
);

// hive_iceberg is a CatalogLoader for the Hive catalog (its creation is not shown here)
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
TableIdentifier identifier = TableIdentifier.of("dhome_db", "ods_d_base_inf_229_iceberg");
TableLoader tableLoader = TableLoader.fromCatalog(hive_iceberg, identifier);
tableLoader.open();
Table table_iceberg = tableLoader.loadTable();

// Compact small data files toward a 128 MiB target size
Actions.forTable(env, table_iceberg)
        .rewriteDataFiles()
        .maxParallelism(5)
        .targetSizeInBytes(128 * 1024 * 1024)
        .execute();

The results:

[screenshot: result]
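One way to confirm the symptom without browsing the warehouse directory is to compare the table's current snapshot ID before and after the rewrite, together with the file counts the action reports. The helper below is only a rough sketch of that comparison: RewriteOutcome and diagnose are hypothetical names, not part of Iceberg, and the class is plain Java so it can be run without a cluster.

```java
import java.util.Objects;

// Hypothetical helper (not an Iceberg class): classifies the outcome of a rewrite
// from values read before and after calling execute().
final class RewriteOutcome {

    private RewriteOutcome() {}

    /**
     * @param snapshotBefore current snapshot ID before the rewrite, or null for an empty table
     * @param snapshotAfter  current snapshot ID after refreshing the table, or null
     * @param addedFiles     number of data files the action reports as added
     * @param deletedFiles   number of data files the action reports as deleted
     */
    static String diagnose(Long snapshotBefore, Long snapshotAfter, int addedFiles, int deletedFiles) {
        // A changed snapshot ID means something was committed to the table.
        if (!Objects.equals(snapshotBefore, snapshotAfter)) {
            return "COMMITTED";
        }
        // No new snapshot and no files touched: the action found nothing to compact.
        if (addedFiles == 0 && deletedFiles == 0) {
            return "NOTHING_TO_REWRITE";
        }
        // No new snapshot although files were reported: the case described in this issue.
        return "FILES_WRITTEN_BUT_NOT_COMMITTED";
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from a real table.
        System.out.println(diagnose(100L, 100L, 2, 10));
        System.out.println(diagnose(100L, 101L, 2, 10));
        System.out.println(diagnose(100L, 100L, 0, 0));
    }
}
```

With Iceberg, the inputs would presumably come from table.currentSnapshot() read before the action (it is null on an empty table), the same call after table.refresh(), and the sizes of addedDataFiles() and deletedDataFiles() on the result returned by execute(). Note that a concurrent streaming writer can also advance the snapshot, so a changed ID alone does not prove that the rewrite itself was committed; checking that the new snapshot's operation is "replace" would narrow that down.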

If there is anything wrong with the question, please correct it, thank you.
