Flaky test TestFlinkIcebergSinkRangeDistributionBucketing > testBucketNumberHigherThanWriterParallelismNotDivisible() #11397

@manuzhang

Apache Iceberg version

main (development)

Query engine

Flink

Please describe the bug 🐞

https://github.com/apache/iceberg/actions/runs/11525609495/job/32088339013

java.lang.AssertionError: 
    Expecting size of:
      [GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=3/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00017.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=3}, record_count=11, file_size_in_bytes=1278, column_sizes=org.apache.iceberg.util.SerializableMap@184, value_counts=org.apache.iceberg.util.SerializableMap@1b, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@cb1481a6, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@d4f5ff3d, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=2/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00018.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=2}, record_count=4, file_size_in_bytes=1138, column_sizes=org.apache.iceberg.util.SerializableMap@fe, value_counts=org.apache.iceberg.util.SerializableMap@12, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@969add9a, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@76639efe, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=1/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00017.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=1}, record_count=7, file_size_in_bytes=1197, column_sizes=org.apache.iceberg.util.SerializableMap@137, value_counts=org.apache.iceberg.util.SerializableMap@f, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@8edc3f28, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@36d4a917, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=0/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00018.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=0}, record_count=11, file_size_in_bytes=1280, column_sizes=org.apache.iceberg.util.SerializableMap@18a, value_counts=org.apache.iceberg.util.SerializableMap@1b, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@1c065f83, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@575e24b0, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=2/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00018.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=2}, record_count=8, file_size_in_bytes=1218, column_sizes=org.apache.iceberg.util.SerializableMap@14e, value_counts=org.apache.iceberg.util.SerializableMap@1e, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@4f1d4eba, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@f741ceee, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=1/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00017.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=1}, record_count=10, file_size_in_bytes=1266, column_sizes=org.apache.iceberg.util.SerializableMap@178, value_counts=org.apache.iceberg.util.SerializableMap@1c, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@ca85bf2b, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@123a5e0e, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=1/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00019.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=1}, record_count=1, file_size_in_bytes=1084, column_sizes=org.apache.iceberg.util.SerializableMap@a1, value_counts=org.apache.iceberg.util.SerializableMap@5, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@b38647c, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@b38647c, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=1/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00019.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=1}, record_count=2, file_size_in_bytes=1099, column_sizes=org.apache.iceberg.util.SerializableMap@db, value_counts=org.apache.iceberg.util.SerializableMap@4, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@961f8456, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@b4913cb9, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7},
        GenericDataFile{content=data, file_path=file:/tmp/junit5_hadoop_catalog-14525356505085993747/fc6fcd29-95ad-4477-8066-cd11f59528ba/default/t/data/ts_hour=2024-10-25-21/uuid_bucket=3/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00019.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{ts_hour=480525, uuid_bucket=3}, record_count=1, file_size_in_bytes=1084, column_sizes=org.apache.iceberg.util.SerializableMap@a1, value_counts=org.apache.iceberg.util.SerializableMap@5, null_value_counts=org.apache.iceberg.util.SerializableMap@6, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@49f6be09, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@49f6be09, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0, data_sequence_number=7, file_sequence_number=7}]
    to be less than or equal to 7 but was 9
        at org.apache.iceberg.flink.sink.TestFlinkIcebergSinkRangeDistributionBucketing.testParallelism(TestFlinkIcebergSinkRangeDistributionBucketing.java:222)
        at org.apache.iceberg.flink.sink.TestFlinkIcebergSinkRangeDistributionBucketing.testBucketNumberHigherThanWriterParallelismNotDivisible(TestFlinkIcebergSinkRangeDistributionBucketing.java:166)
[TaskExecutorFileMergingManager shutdown hook] INFO org.apache.flink.runtime.state.TaskExecutorFileMergingManager - Shutting down TaskExecutorFileMergingManager.
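
For triage, the nine files listed in the assertion message can be tallied by bucket and by the leading field of the file name. The following is a minimal sketch in plain Java and is not part of the Iceberg test suite: `FlakyFileTally` and `countBy` are hypothetical names, the paths are the suffixes copied from the output above with the temp-directory prefix dropped, and reading the leading five-digit field as the writer subtask is an assumption that the log itself does not state.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; not part of Iceberg or its tests.
final class FlakyFileTally {
  // Path suffixes copied from the assertion message (temp-dir prefix omitted).
  static final List<String> PATHS = List.of(
      "uuid_bucket=3/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00017.parquet",
      "uuid_bucket=2/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00018.parquet",
      "uuid_bucket=1/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00017.parquet",
      "uuid_bucket=0/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00018.parquet",
      "uuid_bucket=2/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00018.parquet",
      "uuid_bucket=1/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00017.parquet",
      "uuid_bucket=1/00000-0-0861575b-99f3-410e-bf09-19007260ee22-00019.parquet",
      "uuid_bucket=1/00001-0-c258fabc-b05b-4d5a-9930-714bdb254951-00019.parquet",
      "uuid_bucket=3/00002-0-2c5978f8-a440-4c4f-be50-86c438ff63b7-00019.parquet");

  // Group 1: bucket number. Group 2: leading five-digit field of the file name
  // (assumed here to identify the writer subtask).
  private static final Pattern NAME = Pattern.compile("uuid_bucket=(\\d+)/(\\d{5})-");

  // Counts paths by the chosen regex group; TreeMap keeps the keys sorted.
  static Map<String, Integer> countBy(List<String> paths, int group) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String path : paths) {
      Matcher m = NAME.matcher(path);
      if (!m.find()) {
        throw new IllegalArgumentException("Unexpected path: " + path);
      }
      counts.merge(m.group(group), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println("total files: " + PATHS.size());
    System.out.println("files per bucket: " + countBy(PATHS, 1));
    System.out.println("files per file-name prefix: " + countBy(PATHS, 2));
  }
}
```

For this run the tally gives 9 files in total, 4 of them in uuid_bucket=1, and 3 files under each of the prefixes 00000, 00001 and 00002.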

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata

    Labels

    bug: Something isn't working
