BUG: concat_{categorical, compat} leads to erroneous result on non-ns datetime-EA #33331

Open
Tracked by #1 ...
xhochy opened this issue Apr 6, 2020 · 5 comments
Labels
Bug; ExtensionArray (Extending pandas with custom dtypes or arrays); Needs Tests (Unit test(s) needed to prevent regressions); Non-Nano (datetime64/timedelta64 with non-nanosecond resolution); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@xhochy
Contributor

xhochy commented Apr 6, 2020

Code to reproduce:

import datetime

import numpy as np
import pandas as pd

np_datetimes = np.array([datetime.date(2010, 1, 1)], dtype="datetime64[D]")
other = pd.array(["a", "b"], dtype="category")
pd.core.dtypes.concat.concat_categorical([np_datetimes, other])
# outputs:
#   array([Timestamp('1970-01-01 00:00:00.000014610'), 'a', 'b'], dtype=object)
# expected either one of
#   a) array([Timestamp('2010-01-01 00:00:00'), 'a', 'b'], dtype=object)
#   b) array([datetime.date(2010, 1, 1), 'a', 'b'], dtype=object)

This happens because concat_datetime / _convert_datetimelike_to_object assume that datetimes always have nanosecond resolution.
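To illustrate where the bogus timestamp in the reproduction above comes from, here is a minimal NumPy-only sketch. It does not call the pandas internals named above; it only assumes that the day-resolution integers end up being read as nanoseconds somewhere along the way, which is what the output suggests.

```python
import numpy as np

# One date stored with day resolution: the backing int64 counts days since 1970-01-01.
days = np.array(["2010-01-01"], dtype="datetime64[D]")
raw = days.view("i8")

# Reinterpreting the same integers as nanoseconds reproduces the wrong value.
misread = raw.view("datetime64[ns]")

# A unit-aware cast rescales the integers instead of reinterpreting them.
rescaled = days.astype("datetime64[ns]")

print(raw[0])       # 14610
print(misread[0])   # 1970-01-01T00:00:00.000014610
print(rescaled[0])  # 2010-01-01T00:00:00.000000000
```

The 14610 in the reported output is the day count for 2010-01-01, interpreted as 14610 nanoseconds after the epoch.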

xhochy added the Bug and Needs Triage labels on Apr 6, 2020
@xhochy
Contributor Author

xhochy commented Apr 6, 2020

Possible options: add support for non-nanosecond timestamps to _convert_datetimelike_to_object, or convert directly to np.array(…, dtype=object) in concat_categorical.
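A rough sketch of the second option, using plain NumPy rather than the actual concat_categorical code path: casting a day-resolution array to object yields datetime.date values, so nothing is reinterpreted as nanoseconds. The `categories` array is a hypothetical stand-in for the values of the categorical.

```python
import datetime

import numpy as np

np_datetimes = np.array(["2010-01-01"], dtype="datetime64[D]")
# Hypothetical stand-in for the categorical's values as an object array.
categories = np.array(["a", "b"], dtype=object)

# For datetime64[D], astype(object) produces datetime.date objects.
as_objects = np_datetimes.astype(object)
result = np.concatenate([as_objects, categories])

print(result)
```

This corresponds to expected outcome b) from the report. Note that the object type NumPy produces depends on the unit (for example, datetime64[ns] casts to plain integers), so a real fix would still need to handle each resolution deliberately.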

@jreback
Contributor

jreback commented Apr 6, 2020

FYI @xhochy happy to have these issues, but keep in mind that the i8 backing of anything datetime-related is quite baked in; we will likely need an extended period and a fair amount of effort to generalize this.

I would make a master issue that references these other issues (with check boxes).

@xhochy
Contributor Author

xhochy commented Apr 6, 2020

I can make a master issue once I come across more of these.

Also I'm aware that this is non-trivial and I expect nobody but me to implement anything here.

jbrockmendel added the ExtensionArray and Reshaping labels and removed the Needs Triage label on Jun 7, 2020
jbrockmendel added the Non-Nano label on Jan 17, 2022
@jbrockmendel
Member

works on main. could use a test

jbrockmendel added the Needs Tests label on Dec 16, 2022
@urmikakasi

working on the test


4 participants