BUG: Indexing with a date string and a pyarrow timestamp index raises #53154

Open
2 of 3 tasks
matteosantama opened this issue May 9, 2023 · 4 comments
Labels
Arrow (pyarrow functionality), Bug, Datetime (Datetime data dtype), Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@matteosantama (Contributor)

matteosantama commented May 9, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime as dt

import pandas as pd

df = pd.DataFrame(range(31), index=pd.date_range("2022-01-01", "2022-01-31"))
df.to_parquet("data.parquet")

# this works: the index is read back as a DatetimeIndex
pd.read_parquet("data.parquet", dtype_backend="numpy_nullable").loc["2022-01-15":]

# this raises: the index is read back as a plain Index with a pyarrow timestamp dtype
pd.read_parquet("data.parquet", dtype_backend="pyarrow").loc["2022-01-15":]

# but slicing with a datetime object works
pd.read_parquet("data.parquet", dtype_backend="pyarrow").loc[dt.datetime(2022, 1, 15):]

Issue Description

With dtype_backend="pyarrow", a date string cannot be used to slice the timestamp index with .loc, even though a datetime object can.

Expected Behavior

Consistent behavior between the two backends. Beyond consistency, string-based indexing is much more ergonomic.

Installed Versions

INSTALLED VERSIONS

commit : 37ea63d
python : 3.11.2.final.0
python-bits : 64
OS : Darwin
OS-release : 22.2.0
Version : Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.1
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@matteosantama added the Bug and Needs Triage labels May 9, 2023
@mroeschke added the Datetime and Indexing labels and removed the Needs Triage label May 9, 2023
@mroeschke changed the title from "BUG: pyarrow vs. numpy datetime indexing" to "BUG: Indexing with a date string and a pyarrow timestamp index raises" May 9, 2023
@mroeschke added the Arrow label May 9, 2023
@parthi-siva (Contributor)

take

@parthi-siva (Contributor)

When using dtype_backend="numpy_nullable", the index is a DatetimeIndex. But when using dtype_backend="pyarrow", the index is a plain Index.

If the index is a DatetimeIndex, the _maybe_cast_slice_bound method in pandas/core/indexes/datetimelike.py correctly parses the string passed to loc into a datetime. That's why this works:
pd.read_parquet("data.parquet", dtype_backend="numpy_nullable").loc["2022-01-15":]

But if the index is just an Index, the _maybe_cast_slice_bound method in pandas/core/indexes/base.py is used instead. It doesn't parse the string, so an exception is raised.

@mroeschke ^^

@Julian048 (Contributor)

@parthi-siva are you still working on this issue?

@parthi-siva (Contributor)

Hi @Julian048, yeah, I'm still working on it.

Projects: None yet
Development: No branches or pull requests
4 participants