pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values #6210

@alec-heif

Description

Apache Iceberg version

No response

Query engine

No response

Please describe the bug 🐞

The logic in https://github.com/apache/iceberg/blob/master/python/pyiceberg/avro/decoder.py#L70 appears to be incorrect.

The spec for a binary-encoded int in the manifest files is as follows:

int | Stored as 4-byte little-endian

so, an example bytestring of 0xad4a0000 should be read as the decimal 19117:

  1. lsb 0xad is 173
  2. 2nd lsb 0x4a is 74
  3. (74 * 256) + 173 == 19117
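The arithmetic above can be checked with the Python standard library. This is only a minimal sketch of the 4-byte little-endian interpretation described in the spec row; the helper name read_le_int32 is made up for illustration and is not part of pyiceberg.

```python
import struct


def read_le_int32(raw: bytes) -> int:
    """Interpret exactly 4 bytes as a signed little-endian 32-bit integer."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(raw, byteorder="little", signed=True)


raw = bytes.fromhex("ad4a0000")
value = read_le_int32(raw)

# struct's "<i" format (little-endian, 4-byte signed int) gives the same result.
assert value == struct.unpack("<i", raw)[0]
print(value)  # 19117
```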

however BinaryDecoder does not read this correctly:

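import io
from pyiceberg.avro.decoder import BinaryDecoder

def as_fo(x):
    # Wrap a hex string in a file-like object for the decoder.
    return io.BytesIO(bytes.fromhex(x))

# The raw bytes come back unchanged...
assert as_fo('ad4a0000').read(4).hex() == 'ad4a0000'
# ...but read_int returns -4759 instead of the expected 19117.
assert BinaryDecoder(as_fo('ad4a0000')).read_int() == -4759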

It is not obvious from inspecting BinaryDecoder.read_int where the bug is, but the result (-4759) clearly differs from the expected 19117.
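One possible source of the -4759 value, offered as an assumption rather than a confirmed diagnosis: the Avro binary encoding stores int values as variable-length zig-zag integers, not as fixed 4-byte little-endian. The sketch below is a standalone reimplementation of that scheme (the function name read_zigzag_varint is hypothetical and this is not copied from pyiceberg); fed the same bytes, it yields the same number the decoder returned.

```python
import io
from typing import BinaryIO


def read_zigzag_varint(stream: BinaryIO) -> int:
    """Decode one Avro-style variable-length zig-zag integer from stream."""
    n = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        b = chunk[0]
        n |= (b & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if not (b & 0x80):        # high bit clear marks the last byte
            break
    # Undo the zig-zag mapping (0, -1, 1, -2, ... stored as 0, 1, 2, 3, ...).
    return (n >> 1) ^ -(n & 1)


stream = io.BytesIO(bytes.fromhex("ad4a0000"))
print(read_zigzag_varint(stream))  # -4759
print(stream.read().hex())         # 0000 (only two bytes were consumed)
```

Only 0xad and 0x4a are consumed here: 0xad has its high bit set, so decoding continues, and 0x4a ends the value. This shows where -4759 could come from; it does not settle whether the decoder or the expected encoding is at fault.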
