[parquet] Support writing logically equivalent types to ArrowWriter #8012

@albertlockett

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In #8005 we added the capability for the ArrowWriter to accept record batches whose columns are either the native array type or a dictionary array whose values have the same Arrow DataType.

For example, if RecordBatch A contains column col of type DataType::Utf8 and RecordBatch B contains column col of type DataType::Dictionary<_, DataType::Utf8>, both can be written by the same writer.
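To make the dictionary case concrete, here is a minimal sketch of that kind of check. It does not use the arrow crate: `SketchType`, `value_type`, and `is_compatible` are hypothetical names invented for illustration, and the real logic added in #8005 may be structured differently.

```rust
// Simplified stand-in for arrow's `DataType` (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Utf8,
    Int32,
    // Key type first, then value type.
    Dictionary(Box<SketchType>, Box<SketchType>),
}

/// Returns the value type of a dictionary, or the type itself otherwise.
fn value_type(t: &SketchType) -> &SketchType {
    match t {
        SketchType::Dictionary(_, value) => &**value,
        other => other,
    }
}

/// Assumed rule: a column is accepted when its value type matches the
/// value type of the field the writer was created with.
fn is_compatible(field: &SketchType, incoming: &SketchType) -> bool {
    value_type(field) == value_type(incoming)
}
```

Under this sketch, a field declared as `Utf8` accepts a `Dictionary(Int32, Utf8)` column and the reverse, while a field declared as `Int32` rejects it.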

We can further improve the writer's ability to detect data types that are logically equivalent, for example String and LargeString, or String, LargeString, and StringView.

Describe the solution you'd like
When the ArrowColumnWriter checks whether the type of the array being written is compatible with its field, the logic should account for all types that are logically equivalent (e.g. array types that hold the same kind of value).
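One possible shape for such a check is to map each type to a logical kind and compare the kinds. The sketch below is self-contained and does not use the arrow crate: `ColType`, `LogicalKind`, `logical_kind`, and `logically_equivalent` are hypothetical names, and the variants `Utf8`, `LargeUtf8`, and `Utf8View` stand in for the arrow data types behind String, LargeString, and StringView. Which types should be grouped together is exactly the open question of this issue, so the grouping here is an assumption.

```rust
// Simplified stand-in for arrow's `DataType` (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ColType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    // Key type first, then value type.
    Dictionary(Box<ColType>, Box<ColType>),
}

// The logical kind a physical type represents.
#[derive(Debug, PartialEq)]
enum LogicalKind {
    String,
    // Types with no known equivalents only match themselves.
    Other(ColType),
}

/// Maps a type to its logical kind, looking through dictionary encoding.
fn logical_kind(t: &ColType) -> LogicalKind {
    match t {
        ColType::Dictionary(_, value) => logical_kind(value),
        ColType::Utf8 | ColType::LargeUtf8 | ColType::Utf8View => LogicalKind::String,
        other => LogicalKind::Other(other.clone()),
    }
}

/// Assumed rule: two types are compatible when their logical kinds match.
fn logically_equivalent(a: &ColType, b: &ColType) -> bool {
    logical_kind(a) == logical_kind(b)
}
```

With this rule, `Utf8`, `LargeUtf8`, `Utf8View`, and any dictionary over one of them all compare as equivalent, while `Utf8` and `Int32` do not. Adding another family of equivalent types would mean adding a `LogicalKind` variant and one match arm.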

Describe alternatives you've considered

Additional context
Related discussion: #8005 (review)

Labels: enhancement, parquet
