Closed as not planned
Apache Iceberg version
0.14.1 (latest release)
Query engine
No response
Please describe the bug 🐞
Hi,
Iceberg does not respect the Avro table properties write.avro.compression-codec and write.avro.compression-level set in TBLPROPERTIES when writing manifest and manifest list files.
This is because the table properties are not forwarded to Avro WriteBuilder:
iceberg/core/src/main/java/org/apache/iceberg/ManifestWriter.java
Lines 293 to 301 in 731e5f0
    return Avro.write(file)
        .schema(manifestSchema)
        .named("manifest_entry")
        .meta("schema", SchemaParser.toJson(spec.schema()))
        .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
        .meta("partition-spec-id", String.valueOf(spec.specId()))
        .meta("format-version", "1")
        .overwrite()
        .build();
As a result, the Context falls back to TableProperties#AVRO_COMPRESSION_DEFAULT, i.e. gzip:
iceberg/core/src/main/java/org/apache/iceberg/avro/Avro.java
Lines 207 to 211 in 731e5f0
    static Context dataContext(Map<String, String> config) {
      String codecAsString = config.getOrDefault(AVRO_COMPRESSION, AVRO_COMPRESSION_DEFAULT);
      String compressionLevel =
          config.getOrDefault(AVRO_COMPRESSION_LEVEL, AVRO_COMPRESSION_LEVEL_DEFAULT);
      CodecFactory codec = toCodec(codecAsString, compressionLevel);
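To illustrate the fallback in isolation, here is a minimal, self-contained sketch that does not use Iceberg at all. The class `CodecDefaultDemo` and the method `resolveCodec` are hypothetical names made up for this example. Only the property key and the gzip default are taken from the snippets quoted above. It mimics the `getOrDefault` lookup for two cases: a config map that never received the table properties, which is what the manifest writer path appears to pass, and one that did.

```java
import java.util.HashMap;
import java.util.Map;

public class CodecDefaultDemo {
    // Property key and default value as quoted in this issue.
    static final String AVRO_COMPRESSION = "write.avro.compression-codec";
    static final String AVRO_COMPRESSION_DEFAULT = "gzip";

    // Hypothetical helper mirroring the getOrDefault lookup in dataContext.
    static String resolveCodec(Map<String, String> config) {
        return config.getOrDefault(AVRO_COMPRESSION, AVRO_COMPRESSION_DEFAULT);
    }

    public static void main(String[] args) {
        // Table properties as set in the reproduction steps.
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put(AVRO_COMPRESSION, "zstd");

        // Manifest writer path: the builder's config is never populated
        // from the table properties, so the lookup sees an empty map.
        Map<String, String> builderConfig = new HashMap<>();
        System.out.println("without forwarding: " + resolveCodec(builderConfig));

        // If the properties were forwarded, the configured codec would win.
        builderConfig.putAll(tableProps);
        System.out.println("with forwarding:    " + resolveCodec(builderConfig));
    }
}
```

The sketch only models the lookup. How the table properties should reach the manifest writer in Iceberg itself is a separate design question that this example does not answer.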
Steps to reproduce
scala> sql(" CREATE TABLE tpcds_1_tb_iceberg.manifest_compression (a INT) USING iceberg TBLPROPERTIES ('write.avro.compression-codec'='zstd')")
res0: org.apache.spark.sql.DataFrame = []
scala> spark.range(10).toDF("a").coalesce(1).writeTo("tpcds_1_tb_iceberg.manifest_compression").append()
bash-5.1$ avro-tools getmeta iceberg_warehouse/tpcds_1_tb_iceberg/manifest_compression/metadata/snap-3374754284586474934-1-ac1d7acb-bbe0-484c-b4b2-4e4891a100a3.avro | grep -i avro.codec
22/09/29 16:59:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
avro.codec deflate
Even though the table sets the compression codec to zstd, the manifest list file is written with Avro's deflate codec, which is what the gzip default produces.