PyArrow Parquet column partitioning #11027

I am writing some data to Parquet format, and the dataset needs to be partitioned by a column whose values contain non-alphanumeric characters. For example, our date format is '2021/08/30'. When the data is written to the filesystem, the folder structure splits on the '/' character, so it looks something like 'date=2021'/08/30/somedata.parquet. I was wondering how to get around this folder-splitting behaviour, so that the folder would instead be something like /'date=2021/08/30'/somedata.parquet.
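One possible workaround, sketched here as an assumption rather than as confirmed PyArrow behaviour, is to make the partition value directory-safe before writing, for example by percent-encoding it so that it no longer contains a '/'. The helper names below (`encode_partition_value`, `decode_partition_value`) are hypothetical and use only the Python standard library; the encoded column would then be the one passed as a partition column, e.g. via `partition_cols` in `pyarrow.parquet.write_to_dataset`.

```python
from urllib.parse import quote, unquote


def encode_partition_value(value: str) -> str:
    """Percent-encode a partition value so it contains no path separators.

    safe="" makes quote() encode '/' as well (it is left alone by default).
    """
    return quote(value, safe="")


def decode_partition_value(encoded: str) -> str:
    """Reverse encode_partition_value() when reading the dataset back."""
    return unquote(encoded)


# Hypothetical sample values in the format described above.
dates = ["2021/08/30", "2021/08/31"]

# Encode each value, then build Hive-style directory names (column=value).
encoded_dates = [encode_partition_value(d) for d in dates]
partition_dirs = [f"date={e}" for e in encoded_dates]

for original, directory in zip(dates, partition_dirs):
    print(original, "->", directory)
```

If the exact original string does not need to be preserved in the directory name, a simpler alternative would be to rewrite the dates in a separator-free form such as '2021-08-30' before partitioning.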

This is the most similar issue report that I could find, on the Java Parquet repo:
https://github.com/apache/parquet-mr/pull/361

Thanks for all the help!
