dtype for tz-naive DatetimeArray and TimedeltaArray #24662
Comments
@jorisvandenbossche would you expect the following to be placed in an ExtensionBlock (backed by a TimedeltaArray), or a TimedeltaBlock (backed by an ndarray)? pd.Series(pd.array(['1H', '2H'], dtype=pd.core.dtypes.dtypes.TimedeltaDtype())) For now, I think this should continue to be a TimedeltaBlock. |
Yes, that is what I expect as well (so that for those arrays a special case is made to construct a different kind of block), because we should internally have only one way to store timedelta data. |
I abandoned the PR #24674. There are a few issues around |
leaning towards "don't bother" |
On further consideration, I think this might be useful for code de-duplication. In particular a bunch of methods that are currently implemented on both Timestamp and DatetimeArray could instead be implemented on |
What methods are you thinking about? |
The clearest example is |
We're speaking about a dtype, right? |
The method on the dtype would take |
But if it is just a shared implementation that both Timestamp and DatetimeArray call, this shared implementation can also just be a function that takes To be clear: if there are more algos that can be shared between both, that sounds good, I am just trying to understand what the dtype (this issue) has to do with it. |
Both options work. ATM I'm leaning towards a class-based (i.e. dtype-based) approach being cleaner. |
A slight argument in favor of a class-based approach: pyarrow |
Right now, `DatetimeArray.dtype` can be either `np.dtype('M8[ns]')` or a `DatetimeTZDtype`, depending on whether the values are tz-naive or tz-aware. This means that while `DatetimeArray[tz-naive]` is an instance of `ExtensionArray`, it doesn't satisfy the minimum ExtensionArray API, which requires that `array.dtype` be an `ExtensionDtype`.

This causes some type-unsoundness for places that are supposed to return an ExtensionArray. The two most prominent are `pd.array` and `Series.array`. As an example, code that assumes `Series.array.dtype` is an `ExtensionDtype` isn't necessarily safe: it will fail for tz-naive datetime data, because its `.dtype` is a NumPy dtype.

Proposal:

1. Add a `DatetimeDtype` (or allow `DatetimeTZDtype` to have `tz=None`), and make an equivalent dtype for `TimedeltaArray.dtype`.
2. `DatetimeArray.dtype` and `TimedeltaArray.dtype` are always an ExtensionDtype.
3. `Series.dtype` and `DatetimeIndex.dtype` continue returning the NumPy dtype.

The last step is to avoid breaking code relying on `Series[tz-naive].dtype` being a NumPy dtype.
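The mismatch described in the issue can be checked directly. The following is a minimal sketch, not code from the issue: the helper name `has_extension_dtype` and the sample timestamps are made up for illustration, and the behaviour shown assumes a pandas version in which tz-naive `DatetimeArray.dtype` is still a NumPy dtype (i.e. the proposal above has not been adopted).

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype

# Sample data (illustrative only).
stamps = pd.to_datetime(["2019-01-01", "2019-01-02"])
naive = pd.Series(stamps).array                     # tz-naive datetime array
aware = pd.Series(stamps.tz_localize("UTC")).array  # tz-aware datetime array


def has_extension_dtype(arr):
    """Hypothetical helper: does arr.dtype meet the ExtensionDtype requirement?"""
    return isinstance(arr.dtype, ExtensionDtype)


# Both arrays are ExtensionArray instances ...
both_are_extension_arrays = (
    isinstance(naive, ExtensionArray) and isinstance(aware, ExtensionArray)
)

# ... but only the tz-aware one is expected to carry an ExtensionDtype.
naive_ok = has_extension_dtype(naive)
aware_ok = has_extension_dtype(aware)

print("both ExtensionArray:", both_are_extension_arrays)
print("tz-naive has ExtensionDtype:", naive_ok)
print("tz-aware has ExtensionDtype:", aware_ok)
```

Code that receives `Series.array` and touches ExtensionDtype-only attributes on its `.dtype` would need a guard like `has_extension_dtype` until the array dtype is always an ExtensionDtype.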