PERF: read_csv should check if column is already a datetime column before initiating the conversion #52546

phofl · 2023-04-08T22:20:32Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa

dr = pd.Series(pd.date_range("2019-12-31", periods=1_000_000, freq="s").astype(pd.ArrowDtype(pa.timestamp(unit="ns"))), name="a")
dr.to_csv("tmp.csv")
pd.read_csv("tmp.csv", engine="pyarrow", dtype_backend="pyarrow", parse_dates=["a"])

The read call takes 1.6 seconds. Without parse_dates it drops to 0.01 seconds, and pyarrow already enforces a timestamp dtype for the column:

            int64[pyarrow]
a    timestamp[s][pyarrow]
dtype: object

I guess this was introduced by the dtype backend, so I would like to fix it soonish.
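
The check named in the title could look roughly like the sketch below. This is not the change made in pandas itself; `is_already_datetime` and `maybe_parse_dates` are hypothetical helpers that only illustrate the idea of skipping conversion when a column already has a datetime dtype, whether numpy-backed or pyarrow-backed.

```python
import pandas as pd


def is_already_datetime(column: pd.Series) -> bool:
    """Return True if the column already holds a datetime/timestamp dtype."""
    dtype = column.dtype
    # Covers numpy-backed datetime64 and tz-aware datetime dtypes.
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return True
    # Covers pyarrow-backed columns such as timestamp[s][pyarrow].
    if isinstance(dtype, pd.ArrowDtype):
        # An ArrowDtype can only exist when pyarrow is installed,
        # so the import is safe at this point.
        import pyarrow as pa

        return pa.types.is_timestamp(dtype.pyarrow_dtype)
    return False


def maybe_parse_dates(column: pd.Series) -> pd.Series:
    """Convert to datetime only when the column is not one already."""
    if is_already_datetime(column):
        return column  # nothing to do, skip the expensive conversion
    return pd.to_datetime(column)
```

With a guard like this, a column that the pyarrow engine has already inferred as a timestamp is returned unchanged, while string columns still go through `pd.to_datetime`.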

Installed Versions

main

Prior Performance

No response


phofl added the labels Performance (Memory or execution speed performance), IO CSV (read_csv, to_csv), and Arrow (pyarrow functionality), and removed the label Needs Triage (Issue that has not been reviewed by a pandas team member) Apr 8, 2023

phofl added this to the 2.0.1 milestone Apr 8, 2023

phofl mentioned this issue Apr 8, 2023

PERF: Improve performance for arrow engine and dtype_backend=pyarrow for datetime conversion #52548

Merged

5 tasks

phofl closed this as completed in #52548 Apr 11, 2023
