[C++][Python] Duplicate CSV header when table batches start with an empty batch #36889

@0x26res

Description

Describe the bug, including details regarding any error messages, version, and platform.

pyarrow.csv.write_csv works as expected in general, but when the first record batch of a table is empty, it writes the header twice (on two separate lines) instead of once:

import io

import pyarrow as pa
import pyarrow.csv

table = pa.table({"col1": ["a", "b", "c"]})

# Writing the table directly produces a single header line, as expected.
with io.BytesIO() as fp:
    pyarrow.csv.write_csv(table, fp)
    fp.seek(0)
    assert fp.read() == b'"col1"\n"a"\n"b"\n"c"\n'

# Prepending an empty table makes the first record batch empty.
with io.BytesIO() as fp:
    pyarrow.csv.write_csv(pa.concat_tables([table.schema.empty_table(), table]), fp)
    fp.seek(0)
    # THIS IS WRONG: the header is written twice, yet this assertion passes.
    assert fp.read() == b'"col1"\n"col1"\n"a"\n"b"\n"c"\n'

Component(s)

Python

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
