0.24.0rc1: infer_dtype(DatetimeIndex) returns "datetime64" not "datetime64[ns]" #24739

Closed
kou opened this issue Jan 12, 2019 · 5 comments · Fixed by #24806
Labels: Dtype Conversions (Unexpected or buggy dtype conversions)

@kou

kou commented Jan 12, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])
print(datetime_index)
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)
print(datetime_index.dtype)
# datetime64[ns]
infered_dtype = pd.api.types.infer_dtype(datetime_index, skipna=True)
print(infered_dtype)
# datetime64
print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
# Traceback (most recent call last):
#   File "/tmp/a.py", line 8, in <module>
#     print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
#   File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 308, in __new__
#     dtype=dtype, **kwargs)
#   File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/datetimes.py", line 303, in __new__
#     int_as_wall_time=True)
#   File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 368, in _from_sequence
#     ambiguous=ambiguous, int_as_wall_time=int_as_wall_time)
#   File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 1706, in sequence_to_dt64ns
#     dtype = _validate_dt64_dtype(dtype)
#   File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 1993, in _validate_dt64_dtype
#     .format(dtype=dtype))
# ValueError: Unexpected value for 'dtype': 'datetime64'. Must be 'datetime64[ns]' or DatetimeTZDtype'.

Problem description

With pandas 0.24.0rc1, the value that infer_dtype returns for a DatetimeIndex can no longer be passed as the dtype argument of pd.Index to build a DatetimeIndex.

pyarrow uses this logic to convert Arrow objects to pandas objects.

FYI: Here is the related code:

FYI: Here is the resulting failure in a pyarrow test: https://travis-ci.org/kszucs/crossbow/builds/478558634#L2724-L2788

#24478 introduced validation that rejects the unit-less 'datetime64' as a dtype, but pd.api.types.infer_dtype still returns 'datetime64' for a DatetimeIndex.

The following change fixes the problem, but I'm not sure whether this is a regression in pandas or whether pyarrow's usage is wrong.

diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx
index 85eb6c342..6271d1204 100644
--- a/pandas/_libs/lib.pyx
+++ b/pandas/_libs/lib.pyx
@@ -928,7 +928,7 @@ _TYPE_MAP = {
     'U': 'unicode' if PY2 else 'string',
     'bool': 'boolean',
     'b': 'boolean',
-    'datetime64[ns]': 'datetime64',
+    'datetime64[ns]': 'datetime64[ns]',
     'M': 'datetime64',
     'timedelta64[ns]': 'timedelta64',
     'm': 'timedelta64',
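
Separately from the patch above, a caller could probably sidestep the problem by reusing the index's actual dtype instead of the label returned by infer_dtype. This is a minimal sketch that assumes only the public pandas API already used in the code sample; the variable name `rebuilt` is illustrative.

```python
import pandas as pd

datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])

# Pass the concrete dtype object carried by the index rather than
# the descriptive label that infer_dtype returns.
rebuilt = pd.Index(['2017-08-01', '2017-08-02'], dtype=datetime_index.dtype)

print(type(rebuilt).__name__)
print(rebuilt.equals(datetime_index))
```

Because the dtype is taken from the existing index, the sketch does not depend on how the unit-less string 'datetime64' is interpreted.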

Expected Output

import pandas as pd

datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])
print(datetime_index)
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)
print(datetime_index.dtype)
# datetime64[ns]
infered_dtype = pd.api.types.infer_dtype(datetime_index, skipna=True)
print(infered_dtype)
# datetime64[ns]
print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: ja_JP.UTF-8
LOCALE: ja_JP.UTF-8

pandas: 0.24.0rc1
pytest: 3.10.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.2
numpy: 1.16.0rc2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@jreback
Contributor

jreback commented Jan 12, 2019

These strings are not designed to be dtypes; rather, they indicate the type of the values actually held in a non-strongly-typed (object) array. So this is correct.
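
To illustrate the point, here is a small sketch of the kind of labels infer_dtype returns for a few inputs. It assumes only the public `pandas.api.types.infer_dtype` function; the dictionary keys are illustrative names, and several of the returned labels (such as the ones for strings or floats) are not valid dtype strings at all.

```python
import datetime as dt

import pandas as pd
from pandas.api.types import infer_dtype

# Each label describes the values found, not a dtype to construct with.
labels = {
    "strings": infer_dtype(['a', 'b'], skipna=True),
    "integers": infer_dtype([1, 2], skipna=True),
    "floats": infer_dtype([1.0, 2.5], skipna=True),
    "datetime objects": infer_dtype([dt.datetime(2017, 8, 1)], skipna=True),
    "DatetimeIndex": infer_dtype(pd.DatetimeIndex(['2017-08-01']), skipna=True),
}

for name, label in labels.items():
    print(name, '->', label)
```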

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Jan 12, 2019
@jorisvandenbossche
Member

The behaviour of infer_dtype might be correct, but what did change between 0.23.4 and 0.24.0rc1 is that we no longer accept 'datetime64' as shorthand for 'datetime64[ns]' when it is passed as a dtype argument.

We should maybe deprecate that first instead of directly raising an error.
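
As a rough illustration of the "deprecate first" idea, the sketch below maps a unit-less dtype string to its nanosecond form and emits a FutureWarning instead of raising. The helper `normalize_unitless_dtype` and the mapping `_UNITLESS_TO_NS` are hypothetical names invented for this example; they are not pandas internals, and the warning text is only modelled on the existing Series message.

```python
import warnings

# Hypothetical mapping for illustration; not part of pandas.
_UNITLESS_TO_NS = {
    'datetime64': 'datetime64[ns]',
    'timedelta64': 'timedelta64[ns]',
}


def normalize_unitless_dtype(dtype):
    """Return the [ns] form of a unit-less dtype string, warning instead of raising."""
    if not isinstance(dtype, str):
        return dtype
    replacement = _UNITLESS_TO_NS.get(dtype)
    if replacement is None:
        # Already explicit (or unrelated); pass through unchanged.
        return dtype
    warnings.warn(
        f"Passing {dtype!r} with no precision is deprecated; "
        f"use {replacement!r} instead.",
        FutureWarning,
        stacklevel=2,
    )
    return replacement


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_unitless_dtype('datetime64')

print(result)
print([w.category.__name__ for w in caught])
```

A shim like this keeps existing callers working for one release while signalling that the explicit form is expected in the future.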

@jorisvandenbossche
Member

So to clarify, we actually deprecated that before for Series, but apparently we did not do that for Index:

On 0.23.4:

In [17]: pd.Series([1, 2, 3], dtype='datetime64') 
/home/joris/miniconda3/bin/ipython:1: FutureWarning: Passing in 'datetime64' dtype with no frequency is deprecated and will raise in a future version. Please pass in 'datetime64[ns]' instead.
  #!/home/joris/miniconda3/bin/python
/home/joris/miniconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2862: FutureWarning: Passing in 'datetime64' dtype with no frequency is deprecated and will raise in a future version. Please pass in 'datetime64[ns]' instead.
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[17]: 
0   1970-01-01 00:00:00.000000001
1   1970-01-01 00:00:00.000000002
2   1970-01-01 00:00:00.000000003
dtype: datetime64[ns]

In [18]: pd.Index([1, 2, 3], dtype='datetime64') 
Out[18]: 
DatetimeIndex(['1970-01-01 00:00:00.000000001',
               '1970-01-01 00:00:00.000000002',
               '1970-01-01 00:00:00.000000003'],
              dtype='datetime64[ns]', freq=None)

@mroeschke mroeschke added the Dtype Conversions Unexpected or buggy dtype conversions label Jan 13, 2019
@kou
Author

kou commented Jan 14, 2019

> this is correct

OK. pyarrow will stop using infer_dtype for datetime data.

@TomAugspurger
Contributor

@jorisvandenbossche based on your post in #24739 (comment), this seems to be the same issue as #24753, right? I'll close one or the other.

IMO, we should ensure that Index(..., dtype='datetime64') continues to work (with a warning) for a release.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 16, 2019
This deprecates passing dtypes without a precision to DatetimeIndex
and TimedeltaIndex

```python
In [2]: pd.DatetimeIndex(['2000'], dtype='datetime64')
/Users/taugspurger/.virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: Passing in 'datetime64' dtype with no precision is deprecated
and will raise in a future version. Please pass in
'datetime64[ns]' instead.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3
Out[2]: DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', freq=None)
```

Previously, we ignored the precision, so that things like

```
In [3]: pd.DatetimeIndex(['2000'], dtype='datetime64[us]')
Out[3]: DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', freq=None)
```

worked. That is deprecated as well.

Closes pandas-dev#24739
Closes pandas-dev#24753
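
For callers who want to avoid the deprecation entirely, a minimal sketch of the explicit form follows. It assumes only the public DatetimeIndex and TimedeltaIndex constructors; the variable names are illustrative.

```python
import pandas as pd

# Spell out the precision so no deprecated shorthand is involved.
idx = pd.DatetimeIndex(['2000'], dtype='datetime64[ns]')
tdi = pd.TimedeltaIndex(['1 days'], dtype='timedelta64[ns]')

print(idx.dtype, tdi.dtype)
```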