Datetimelike Array Refactor #23185
Comments
cc @jreback @jorisvandenbossche @jbrockmendel OK I wanted to open this issue because I've had trouble keeping up with all the If necessary, we can move to a google doc or something to collaboratively edit I'll try to write up my current thoughts later today. |
You're right (and Joris has expressed this elsewhere) that this conversation has splintered across a lot of places and centralization will help. I'm a big fan of breaking up big problems into smaller more manageable problems (as evidenced by #23159, small-step tslibs refactor, doomed improvement efforts in statsmodels, ...). Are there any logically independent parts of the problem that can be split off? FWIW: |
I mentioned it on one of the PRs, but is it OK for you guys to agree on not merging any of the open PRs before we have some agreement on the way forward? @jbrockmendel I understand you want to keep working on those PRs you opened (and it's also great that you do so much for pandas!), but let's maybe take a short pause on opening new PRs until we agree on how we want to get to the finish line here (in that sense: can you answer my mail regarding Thursday?) |
Sounds good. |
Composition vs. inheritance. The case for composition:
The case against:
Since I'm pro-composition, I'll point out that I think we're required to have |
If there is a nice solution to the immutability/caching issue, then I can get on board with just-composition. Until then, I think both is the way to go for now. i.e. PeriodIndex subclasses PeriodArray and PeriodIndex.values returns a PeriodArray. Your point 1) is the one I find most compelling. It would be really nice if |
I'm not sure how doing both would work in practice. I don't have a good sense for what complications that's likely to create. But, I think that won't be necessary. I think we can manage to cache attributes like
Agreed. At this point, I think that's our only hope of having a consistent definition for what |
On the caching point, this seems to work:
diff --git a/pandas/core/arrays/period.py b/pandas/core/arrays/period.py
index 24d4b6e55..23a5845f0 100644
--- a/pandas/core/arrays/period.py
+++ b/pandas/core/arrays/period.py
@@ -465,6 +465,14 @@ class PeriodArray(dtl.DatetimeLikeArrayMixin, ExtensionArray):
                        "Got '{}' instead.".format(type(value).__name__))
             raise TypeError(msg)
         self._data[key] = value
+        self._invalidate_cache()
+
+    def _invalidate_cache(self):
+        self._cache = {}
+
+    @cache_readonly
+    def hasnans(self):
+        return self.isna().any()

     def take(self, indices, allow_fill=False, fill_value=None):
         from pandas.core.algorithms import take
There may be edge cases, or cases where we could skip invalidating the cache, but it's at least feasible. It didn't come across in the diff, but the call to
"tests": without the invalidation
In [1]: import pandas as pd
In [2]: arr = pd.core.arrays.period_array(['2000', None], 'D')
In [3]: arr.hasnans
Out[3]: True
In [4]: arr[1] = pd.Period(2000, 'D')
In [5]: arr.hasnans
Out[5]: True
with invalidation
In [1]: import pandas as pd
In [2]: arr = pd.core.arrays.period_array(['2000', None], 'D')
In [3]: arr.hasnans
Out[3]: True
In [4]: arr[1] = pd.Period(2000, 'D')
In [5]: arr.hasnans
Out[5]: False |
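To make the invalidate-on-write idea concrete without depending on pandas internals, here is a minimal standalone sketch. `CachedArray` is a hypothetical toy class, `None` stands in for a missing value, and a plain dict replaces whatever `cache_readonly` uses internally; it only illustrates the pattern from the diff, not the actual PeriodArray implementation.

```python
class CachedArray:
    """Toy stand-in for an array with a cached ``hasnans`` (None marks a missing value)."""

    def __init__(self, values):
        self._data = list(values)
        self._cache = {}

    def __setitem__(self, key, value):
        self._data[key] = value
        # Any write may change a derived property, so drop everything cached.
        self._invalidate_cache()

    def _invalidate_cache(self):
        self._cache = {}

    @property
    def hasnans(self):
        # Compute once, then serve from the cache until the next write.
        if "hasnans" not in self._cache:
            self._cache["hasnans"] = any(v is None for v in self._data)
        return self._cache["hasnans"]


arr = CachedArray(["2000-01-01", None])
first = arr.hasnans        # computed and cached
arr[1] = "2000-01-02"      # the write clears the cache
second = arr.hasnans       # recomputed from the updated data
```

Clearing the whole cache on every write is the bluntest option; skipping invalidation for writes that cannot change a given property would be an optimisation on top of this.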
It overlaps with the first point of Tom, but an additional case for composition / disadvantage of inheritance:
I personally also don't see any compelling case for inheritance (the dispatching is not a reason either IMO, as we need to do it anyway for Series). So personally, unless someone now actually makes an extensive and detailed argument for inheritance, I would propose to leave this discussion behind us and focus on how to solve possible remaining issues with the composition structure (the constructors, the caching). |
For the inheritance vs composition, we actually have examples to look at in practice: interval and categorical already do composition, while the datetimelikes are currently a kind of inheritance. I think the current datetimelike implementation shows the complexity of this. |
Related to the caching issue, I think there are several options:
So I think there is certainly a solution possible, and my first point about "is this important" is then more to know if this is essential to already have in an initial "big split" PR like the PeriodArray PR, or if this can be left for a follow-up PR. |
Warning, long post coming. My proposal for a possible way forward, let's call it the "minimally big" PR with follow-up PRs proposal:
My reasoning to go for the above way forward compared to a "first smaller clean-up PRs, then split", are the following:
This proposal would mean that master will be in a temporary "messy" state (but still green of course). If we find this a problem, we can always first merge those PRs to a refactor branch, and only after some of the follow-up PRs have been done to that branch as well, merge it into master. Doing the above, would in practice mean: first focus on some of the design discussions (eg the design of the constructors, and other issues mentioned in the top post), focus on doing and reviewing the actual splits, and for now wait with the other smaller PRs. So please give your thoughts about this proposal, feedback, alternative proposals, ... |
If patching
If we were to go whole-hog on inheritance, the data model would be "An Index is an Array with some additional fancy indexing/join/set methods (and a Block is an Array with
That's an interesting idea. I'd like to give it some thought before forming an opinion.
Are they though? My main objection to the Big PR is that it precludes working in parallel. If there was something about the implementation that required it be done All At Once that would be another matter, but as it is there is a lot of non-difficult stuff we can get out of the way before making final decisions about caching and constructors. i.e. the "Minimal Big Split" can be made more minimal. Suppose hypothetically that two things both turn out to be more difficult than expected: the The status quo is that non-#22862 PRs are on hold. While not my first choice, I'd rather see that move forward than go in circles here. Let's see what jreback has to say and reconnoiter. |
Some general comments / points.
|
I'm not sure, but I share this concern. I expect that we'll have a mixin or base class for DatetimeLikeArray with these common ops, and a base class for DatetimelikeIndex that just does the dispatching. I'm hopeful we won't need a mixin for DatetimelikeIndex.
These two are slightly in tension. AFAIK, right now the only way to update an ExtensionArray inplace is with edit: inplace ops is another, though nans usually (always for inplace?) propagate. |
I suspect this is because it's not on ndarrays, and we didn't have an intermediate array-like that could track these. ndarrays can be manipulated inplace in so many ways that a cache sounds infeasible. |
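A small sketch of why a cache on top of a freely mutable buffer is fragile. Here a plain Python list stands in for the ndarray and `CachedView` is a hypothetical wrapper; the point is only that a write made through another reference to the same buffer never reaches the wrapper's invalidation hook.

```python
class CachedView:
    """Toy wrapper that caches a reduction over a buffer it does not own."""

    def __init__(self, buffer):
        self._data = buffer   # shared, not copied
        self._cache = {}

    def __setitem__(self, key, value):
        self._data[key] = value
        self._cache = {}      # invalidation only happens on this path

    @property
    def hasnans(self):
        if "hasnans" not in self._cache:
            self._cache["hasnans"] = any(v is None for v in self._data)
        return self._cache["hasnans"]


buffer = [1.0, None]
wrapped = CachedView(buffer)
before = wrapped.hasnans                  # computed and cached
buffer[1] = 2.0                           # write through the alias; wrapper is not notified
stale = wrapped.hasnans                   # still the cached value
fresh = any(v is None for v in buffer)    # what a recomputation would give
```

An intermediate array type that owns its data and funnels all writes through `__setitem__` avoids this, which is presumably why caching becomes feasible once such a type exists.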
I feel like I still lack the information to make a judgement call on how to proceed here. So, how about I spend a chunk of time getting #22862 in to a reasonable state. I'll try to make it as minimal as possible while still passing, and in the process I'll identify pieces that can be reasonably split off. I think the biggest outstanding discussion / PR is around constructors and whether #23140 should go first. I'll try to form some thoughts around that quickly. |
Thanks for the answers! (and sorry again for my long answers :))
Because this is an important point, I had a paragraph above trying to explain why I think this does not need to be the case, as I can also argue for the opposite:
What do you mean exactly here? I think @TomAugspurger already figured this out in his PeriodArray PR (at least a minimal working solution that gets the job done). Tom, correct me if I am wrong.
Fully agree here. Apart from having an ExtensionBlock instead of the custom ones, there should not be many changes related to blocks.
I think we will typically end up with a base class / mixin for the Arrays to share functionality there, and a mixin / base class for the datetimelike indexes to share things. I think those two mixins can be completely separate (the current Datetimelike Mixin is indeed shared between Arrays and Index, but that is just the temporary confusing state where Index/Array is not yet split properly).
We are stuck with it for Index, but IMO that should not mean we should follow the exact same pattern for the Arrays (in all the different PRs related to this we have several times tried to discuss / understand what the different
We can certainly discuss, but in light of keeping the initial big-split PRs as minimal as possible / not have more discussion than needed on it, I would personally keep this discussion for a follow-up (of course, as long as the method is not essential to have the EA interface working). |
+1
If we focus first on the PeriodArray PR, I don't think that PR should be merged. But, it is the discussion that we had on one of the review comments (#23140 (comment)) about the constructors that ideally indeed should be resolved. |
Ops work on the PeriodArray PR by dispatch. Though maybe @jbrockmendel meant ops with caching.
# PeriodIndex.__add__
def __add__(self, other):
    # dispatch to ExtensionArray implementation
    result = self._data.__add__(other)
    return wrap_arithmetic_op(self, other, result) |
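For readers who want to run the dispatch pattern in isolation, here is a self-contained sketch. `ToyArray` and `ToyIndex` are hypothetical stand-ins for PeriodArray and PeriodIndex, and the re-wrapping step is a simplified assumption about what `wrap_arithmetic_op` does, not its actual behavior.

```python
class ToyArray:
    """Hypothetical array: owns the data and the arithmetic."""

    def __init__(self, values):
        self._values = list(values)

    def __add__(self, other):
        return ToyArray(v + other for v in self._values)


class ToyIndex:
    """Hypothetical index: holds an array in ``_data`` and dispatches to it."""

    def __init__(self, data, name=None):
        self._data = data
        self.name = name

    def __add__(self, other):
        # Dispatch to the array implementation, then re-wrap as an index.
        result = self._data.__add__(other)
        return ToyIndex(result, name=self.name)


idx = ToyIndex(ToyArray([1, 2, 3]), name="periods")
shifted = idx + 10
```

Under composition the index is not an instance of the array class, so the arithmetic lives in exactly one place and the index only adds its own metadata (here, `name`).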
small FYI: I've updated the original post with a list of PeriodArray blockers that aren't touching any of the datetimelike files (e.g. #23155). Those are bugs in master that hopefully have a clear fix which won't lead away from our end goal. |
Since he mentioned both in the same sentence as two different things, I assumed this was not the case. So hence the question for clarification :-) |
The plan as I understand it is that DatetimeLikeArrayMixin will be mixed into the EA subclasses but will cease to be mixed in to DatetimeLikeIndexMixin.
I meant extending the tests in tests/arithmetic to include the EA subclasses. Since this was just a hypothetical example of "two things going wrong at the same time", let's not spend too much time on it.
Sounds good. In the interim, I'd like to get exceptions to the datetimelike PR moratorium for the following, which I think should have minimal overlap:
I'll hold off pending an explicit OK. |
I pushed an update to https://github.com/pandas-dev/pandas/pull/22862/files that reduced the scope somewhat. Outside of indexes/period.py and arrays/period.py, there shouldn't be any extraneous changes. I can work to reduce the changes to indexes/period.py and arrays/period.py a bit, but I'd like to get the constructors nailed down first. |
Do you have a plan / WIP for this? My WIP PeriodArray PR doesn't do too much there I think. |
I do have a branch about ready for the arithmetic fixes, will open a PR later today. |
Ah yes, sorry, was confusing them. What is the main difference between both branches? #23415 is not yet doing the inheritance/composition switch? (but the plan is to do it in that PR at some point?)
Yes, I know :-) But we still want those EA base tests once they are actual EAs. Maybe this can be done separately in advance, but not sure (can maybe take a look at that). In any case, IMO we should not do it after making them actual EAs.
Are there currently open issues with reported segfaults? In case so, yes that sounds good. |
@jbrockmendel any open PRs from https://github.com/pandas-dev/pandas/pulls/jbrockmendel that could use review? (#23415 isn't quite ready, right?) Any relevant pieces or issues that can be offloaded? I'm going to mess around with DatetimeBlock shortly, to see what pieces can be simplified there. |
Not at the moment. Two coming up in the next hour or so: one extending timedelta64 arithmetic tests to TimedeltaArray, another implementing most of the rest of the EA interface for DTA/TDA. The latter doesn't yet have EA tests in place, which can absolutely be offloaded. The other one coming up a little later is implementing DatetimeArray._from_sequence (or more specifically, the datetime64 analogue of #23539) |
For those curious about what step 7 (inheritance to composition) might look like, see master...TomAugspurger:disown-datetimelike. Right now, `import pandas` works, but most everything beyond that is broken.
e16dd61 is +50 net LOC, and has everything for getting "import pandas" working, but doesn't change the actual data for the various Index classes. 3af6109 starts on that, but isn't working yet.
@jbrockmendel thoughts on how we should proceed for the next few steps? I'd like to get us over to composition sooner rather than later. I think that'll help me better judge your step 5 (extra EA methods that need to be implemented).
There are already merge conflicts on that branch since this morning, so I'll not plan to work on it further unless we have consensus that it's time to make the switch.
|
Am I right in thinking that after #23643 the remaining pieces of the EA interface are just For the transition to composition, if someone wants to work on it before the EA interface is complete, the approach you're taking in the disown branch looks reasonable. I'm also seeing a lot of things in that branch that could be implemented immediately before the Arrays are disowned. If somehow we find ourselves with excess labor to throw at related tasks, some orthogonal-ish topics:
I'm not sure off the top of my head whether this bug affects the composition switchover, just that I've seen it a bunch recently. @TomAugspurger is this helpful or too much of a grab-bag? |
My main concern there is that until we inherit from ExtensionArray, those will be untested (or we'll have to duplicate tests). Unless... I suppose we could start inheriting the base tests, without actually inheriting from them yet? That may be worth exploring. Some things like I suspect the transpose bug will be fixed by inheriting from ExtensionArray. |
@TomAugspurger If you currently have the time to further work on that branch to try to switch to composition, I would say: let's push for it now and first (and thus temporarily hold off on other changes). I already said that before, but the sooner we can actually switch to composition, the clearer the follow-up PRs will be (eg now the Also all the datetime-arithmetic related issues, they can in principle be done after the split. As long as arithmetic works for Index/Series (for which it is already tested), it's fine (of course, before releasing we should also fix + test all arithmetic on the arrays as well, but just to say it is not necessary to do that first) |
@jorisvandenbossche I would really appreciate it if you didn't advocate shutting down all progress on things that I'm putting a lot of time and effort into. |
That's not how I read Joris' comment. I read "temporary hold" as... just that. A pause, not a shutting down or throwing away. I think all the effort in open PRs (and possibly some unpushed work you've done) is still vital. There are many paths from master (plus the open PRs) to DatetimeArray. To me, a path that frontloads the switch to composition makes sense, but it's hard to say ahead of time. I've had trouble thinking through all the ramifications of a diff, partly because the current class hierarchy "feels weird" to me, and partly because I'm not familiar with this section of the code base. Anyway, I think that #23675 On my own availability: I'm going to be ramping up on a largish project for dask in the next few days / weeks. I'll still have time for pandas, but not as much over the next month or so. So I have a slight window to dump all my time into pandas, that I'd like to take advantage of if possible. |
@jbrockmendel yes, sorry if that is the way it came across. I certainly think we can do parallel work, and not all the items you listed in #23185 (comment) would go in such a split PR; it was just about what to merge first, as it will probably be easier to rebase the smaller PRs than the other way around. And also, if we do that push, there will also be a lot of reviewing work :) |
And also, it will depend on the PR of course. If it is something that doesn't overlap a lot (like certain test changes), for sure it doesn't need to wait with being merged. |
Have we had a design discussion on Rename to
class DatetimeDtype(ExtensionDtype):
    def __new__(self, unit='ns', tz: Optional[Union[str, tzinfo]]=None):
        ...
We remove the "magic" creation from string.
In [3]: pd.core.dtypes.dtypes.DatetimeTZDtype('datetime64[ns, utc]')
Out[3]: datetime64[ns, utc]
that would throw an error, since One question though... I'm worried about changing the
In [3]: pd.Series(pd.date_range('2000', periods=4)).dtype
Out[3]: dtype('<M8[ns]')
I'm not sure what the ramifications of changing that to always be |
Yes. What I thought before about this is that we would need to have an if/else logic there, to still return the numpy dtype if there is no tz. I would put this logic on the Series/Index, and have the array always return the extension dtype. However, there might be places where we check the dtype of values which can be coming from Series/Index to be an ExtensionDtype? (eg to take another code path for extension arrays) |
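The if/else placement described above can be sketched as follows. All three classes are hypothetical toys, and the string `"<M8[ns]"` merely stands in for the numpy dtype that Series/Index return today; this is an illustration of where the branch would live, not the pandas implementation.

```python
class ToyDatetimeDtype:
    """Hypothetical extension dtype carrying a unit and an optional timezone."""

    def __init__(self, unit="ns", tz=None):
        self.unit = unit
        self.tz = tz

    def __repr__(self):
        if self.tz is None:
            return "datetime64[{}]".format(self.unit)
        return "datetime64[{}, {}]".format(self.unit, self.tz)


class ToyDatetimeArray:
    """The array always reports the extension dtype."""

    def __init__(self, values, tz=None):
        self._values = list(values)
        self._dtype = ToyDatetimeDtype("ns", tz)

    @property
    def dtype(self):
        return self._dtype


class ToySeries:
    """The container keeps the existing behaviour for tz-naive data."""

    def __init__(self, array):
        self._values = array

    @property
    def dtype(self):
        array_dtype = self._values.dtype
        if array_dtype.tz is None:
            # Stand-in for returning the numpy dtype, as Series does today.
            return "<M8[{}]".format(array_dtype.unit)
        return array_dtype


naive = ToySeries(ToyDatetimeArray([0, 1]))
aware = ToySeries(ToyDatetimeArray([0, 1], tz="UTC"))
```

The concern raised in the comment shows up directly here: code that inspects `naive.dtype` sees the numpy-style value, while code that inspects `naive._values.dtype` sees the extension dtype, so checks that branch on "is this an extension dtype" can disagree depending on which object they are handed.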
Agreed that's the right place to do it. I'll see if any of your concerns come up (I suspect something will). |
I would expect many of the places where we use |
Yeah. We could change those to
`is_extension_array_dtype(self._values.dtype)` where necessary.
|
Small status update here: I played with moving to composition a bit last week. The basic idea was
* DatetimeDtype(unit, tz) : an extension dtype for representing datetime data with an optional timezone
* DatetimeArray : basically the same as DatetimelikeArrayMixin. DatetimeArray.dtype is a DatetimeDtype
* DatetimeIndex : Index where _data is a DatetimeArray.
The biggest challenge was our `is_datetime*` functions. They were breaking in a lot of places and in strange ways when passed a DatetimeDtype rather than an np.dtype.
Today, I've experimented with a new branch that changes the data model of DatetimeArray slightly. `DatetimeArray.dtype` is now a Union of np.dtype or DatetimeDtype. We'll use DatetimeDtype (or keep the name as DatetimeTZDtype) when there's a timezone, and we'll use np.dtype('M8[ns]') otherwise. This should result in a much smaller diff. I suspect that we can later clean up the dtypes so that DatetimeArray.dtype is always a DatetimeDtype, but I think that need not block the release.
I'll push something up by the end of the day.
|
@TomAugspurger thanks for the update, and for handling the tricky dtype stuff. Aside from review, is there anything the rest of us can do to be helpful? On my end, I'm preparing to push a branch that fixes the last of the arithmetic tests (mainly with DateOffset) for DTA/TDA (without this, these arithmetic ops would fail on the Index classes after the switch to composition). #23675 needs some edits+rebase, is otherwise close to the finish line, will put DatetimeArray._from_sequence within reach. Added a bunch of Issues to the "DatetimeArray Refactor" Project, most of them non-blockers, e.g. reduction methods we can get around to eventually. |
Still just grinding away at the inheritance -> composition move. Mostly just moving around methods / adding wrappers in small places.
I haven't really touched internals yet. I'm not sure when the best time to do that would be. For DatetimeArray, we can actually push that discussion off till after we switch things, since we already have two blocks. I'll post again when I have a better-formed opinion here.
|
I've isolated (one of?) the segfaults to DatetimeArray.__new__ calling
conversion.ensure_datetime64ns.
This segfaults on my branch
```diff
diff --git a/pandas/tests/groupby/test_apply.py b/pandas/tests/groupby/test_apply.py
index 3bc5e51ca..e64bdc9ea 100644
--- a/pandas/tests/groupby/test_apply.py
+++ b/pandas/tests/groupby/test_apply.py
@@ -6,6 +6,13 @@ from pandas.util import testing as tm
 from pandas import DataFrame, MultiIndex, compat, Series, bdate_range, Index
+def test_apply_tz():
+    df = pd.DataFrame({'a': [1, 3, 3, 4]},
+                      index=pd.DatetimeIndex(['2000', '2000', '2001', '2001']))
+    gr = df.groupby(df.index.date)
+    gr.apply(lambda x: x.idxmax())
+
+
```
But passes when we don't call ensure_datetime64ns
```diff
diff --git a/pandas/core/arrays/datetimes.py b/pandas/core/arrays/datetimes.py
index 65f6d6859..612e48792 100644
--- a/pandas/core/arrays/datetimes.py
+++ b/pandas/core/arrays/datetimes.py
@@ -258,7 +258,7 @@ class DatetimeArrayMixin(dtl.DatetimeLikeArrayMixin):
         assert isinstance(values, np.ndarray), type(values)
         assert is_datetime64_dtype(values)  # not yet assured nanosecond
-        values = conversion.ensure_datetime64ns(values, copy=False)
+        # values = conversion.ensure_datetime64ns(values, copy=False)
         result = cls._simple_new(values, freq=freq, tz=tz)
         if freq_infer:
```
I haven't figured out the actual cause yet.
|
Alrighty, we're close now
https://github.com/TomAugspurger/pandas/tree/disown-tz-only
Right now this diff is at
```
89 files changed, 1859 insertions(+), 906 deletions(-)
```
and I have 100 xfails / skips. I'm going to spend the rest of today splitting off independent pieces, cleaning things up, and organizing the history a bit, before making a PR tonight or tomorrow.
|
Excellent, looking forward to taking a look. Are the skips segfault-free? With a little luck many of the xfails will be fixed by implementing the remaining methods on DTA/TDA, most of which are (hopefully) near merging. |
They are indeed segfault free. There's still a subtle failure involving a groupby resample coming from us doing bad stuff in Cython.
We somehow manage to create a DatetimeIndex where DatetimeIndex._values is an ndarray, rather than a DatetimeArray. This causes an exception, but not a segfault.
|
A thought on a way forward, seeing as how Tom has earned some down-time. With
Then do the entire inheritance/composition switchover, but dispatching to This limits the diff to the index classes without changing their outward-facing behavior, making for a much more manageable scope. Thoughts? |
My vote is for getting #24024 in sooner rather than later, but I'm the most familiar with the diff so it's easier for me to go through the entire thing at once. It's blocking several changes I'd like to wrap up, and my time for pandas is limited. |
AFAICT the sticking points are:
I have no strong opinion on which approach to take for the |
A master issue, to help keep track of things.
High-level, I think we have two things to sort out
1. Design
We have a few things to sort out
a. Composition vs. inheritance of Index / Series and array classes
b. ...
2. Implementation Plan
A few options
a. Big PR implementing one or more arrays, followed by smaller cleanup PRs
b. Incremental PRs (implementing most of the interface on the *ArrayMixin classes), followed by a final PR making the switch
c. ...
Project board with the relevant discussions / PRs: https://github.com/pandas-dev/pandas/projects/4