ENH: xorbits's read_parquet compatible with pandas on pyarrow engine #770

luweizheng · 2024-05-20T14:40:51Z

Xorbits integrates the pyarrow backend. See this blog post for more info. And we also introduce use_arrow_dtype in read_parquet. If we install the pyarrow backend, Xorbits will detect it, marking use_arrow_dtype to True in the configuration and it will read parquet with arrow dtype. Some dtypes of pyarrow and pandas are different, for example, timestamp. Suppose time is a timestamp column. If time is a pandas dtype we can do it like this: df["time"].dt. But pyarrow does not have dt attribute.

If arrow is installed, xorbits use arrow and use_arrow_dtype of the configuration is set as true. So here we read data in pyarrow format: https://github.com/xorbitsai/xorbits/blob/b1f1107af931e9101b22e4f1e000add3820297b5/python/xorbits/_mars/dataframe/datasource/read_parquet.py#L181C1-L201C18

We may include this in our document or change the default behavior of the ArrowEngine when reading parquet files.

The text was updated successfully, but these errors were encountered:

XprobeBot added the enhancement New feature or request label May 20, 2024

XprobeBot added this to the v0.7.3 milestone May 20, 2024

XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: xorbits's read_parquet compatible with pandas on pyarrow engine #770

ENH: xorbits's read_parquet compatible with pandas on pyarrow engine #770

luweizheng commented May 20, 2024 •

edited

Loading

ENH: xorbits's read_parquet compatible with pandas on pyarrow engine #770

ENH: xorbits's read_parquet compatible with pandas on pyarrow engine #770

Comments

luweizheng commented May 20, 2024 • edited Loading

luweizheng commented May 20, 2024 •

edited

Loading