Skip to content

POC: _eadata #24394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 27, 2018
Merged

POC: _eadata #24394

merged 7 commits into from
Dec 27, 2018

Conversation

jbrockmendel
Copy link
Member

With _index_data figured out, it is now feasible to implement the DTA/TDA inheritance->composition changeover independently of all of the rest of #24024. This PR is a proof of concept for what that would look like. The steps would be roughly:

  1. Instead of dispatching to self._data, we temporarily implement _eadata and dispatch to that
  2. Stop mixing DatetimeLikeArrayMixin into DatetimeIndexOpsMixin
  3. Stop mixing TimedeltaArrayMixin into TimedeltaIndex
  4. Stop mixing DatetimeArrayMixin into DatetimeIndex

For demonstration purposes this PR implements step 1. I have local (test-passing) branches for each of the other steps.

@pep8speaks
Copy link

pep8speaks commented Dec 22, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 27, 2018 at 22:48 Hours UTC

@codecov
Copy link

codecov bot commented Dec 22, 2018

Codecov Report

Merging #24394 into master will decrease coverage by <.01%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24394      +/-   ##
==========================================
- Coverage    92.3%   92.29%   -0.01%     
==========================================
  Files         162      162              
  Lines       51875    51856      -19     
==========================================
- Hits        47883    47861      -22     
- Misses       3992     3995       +3
Flag Coverage Δ
#multiple 90.7% <94.73%> (-0.01%) ⬇️
#single 43% <60.52%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/period.py 92.79% <100%> (-0.33%) ⬇️
pandas/core/indexes/datetimes.py 96.32% <100%> (-0.02%) ⬇️
pandas/core/indexes/datetimelike.py 97.4% <100%> (+0.11%) ⬆️
pandas/core/indexes/timedeltas.py 90.14% <75%> (-0.31%) ⬇️
pandas/tseries/offsets.py 96.69% <0%> (-0.26%) ⬇️
pandas/core/arrays/datetimelike.py 96.29% <0%> (-0.22%) ⬇️
pandas/core/arrays/timedeltas.py 87.38% <0%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e0358d...de5430d. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 22, 2018

Codecov Report

Merging #24394 into master will decrease coverage by 0.01%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24394      +/-   ##
==========================================
- Coverage   92.31%    92.3%   -0.02%     
==========================================
  Files         163      163              
  Lines       51987    51969      -18     
==========================================
- Hits        47990    47968      -22     
- Misses       3997     4001       +4
Flag Coverage Δ
#multiple 90.7% <94.73%> (-0.02%) ⬇️
#single 43.01% <60.52%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/period.py 92.75% <100%> (-0.33%) ⬇️
pandas/core/indexes/datetimes.py 96.31% <100%> (-0.02%) ⬇️
pandas/core/indexes/datetimelike.py 97.7% <100%> (+0.11%) ⬆️
pandas/core/indexes/timedeltas.py 90.11% <75%> (-0.31%) ⬇️
pandas/tseries/offsets.py 96.69% <0%> (-0.26%) ⬇️
pandas/core/arrays/datetimelike.py 95.93% <0%> (-0.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3086e0a...00149f8. Read the comment docs.

@gfyoung gfyoung added Refactor Internal refactoring of code Internals Related to non-user accessible pandas implementation labels Dec 23, 2018
@@ -1089,6 +1080,11 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
# --------------------------------------------------------------------
# Wrapping DatetimeArray

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this always be cached? maybe call this _impl instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but for the proof of concept I didn't want to futz with it. If this is a direction we want to move forward with I'll do this.

@jbrockmendel
Copy link
Member Author

So just naively changing the two properties to cache_readonly doesn't break any tests locally, but I'm still not comfortable with it, would rather hold off

@@ -629,6 +651,28 @@ def f(self):
return property(f)


def maybe_unwrap_index(obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a general function no? if you simply check for a __eadata attribute. let's put this elsewhere, maybe pandas.core.base

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for _eadata to only exist for a few days until we're ready to complete the switchover. Regardless, maybe_unwrap_index should only be relevant for DTI/TDI/PI.

@jbrockmendel
Copy link
Member Author

If this and/or #24405 is a route we want to take, I can get a start on implementing another handful of methods from 24024

@jreback
Copy link
Contributor

jreback commented Dec 25, 2018

if this is temporary, then ok, though why can't you simply use _data?

@jbrockmendel
Copy link
Member Author

if this is temporary, then ok, though why can't you simply use _data?

Because _data is still an ndarray like in the status quo. The point of using _eadata is to let us separate the (DTA/TDA/DTI/TDI)-internal changes of #24024 from the external changes.

@jbrockmendel
Copy link
Member Author

I think we should merge this, then I can put together a PR breaking repeat, searchsorted, fillna, setitem off from #24024 while the astype issues get resolved in #24405.

@jreback jreback added this to the 0.24.0 milestone Dec 27, 2018
@jreback
Copy link
Contributor

jreback commented Dec 27, 2018

looks good. rebase just to be sure.

@jreback jreback merged commit 24a50fc into pandas-dev:master Dec 27, 2018
@jreback
Copy link
Contributor

jreback commented Dec 27, 2018

thanks!

@jbrockmendel jbrockmendel deleted the eadata2 branch December 28, 2018 00:11
thoo added a commit to thoo/pandas that referenced this pull request Dec 28, 2018
* upstream/master: (26 commits)
  DOC: Fixing doc upload (no such remote origin) (pandas-dev#24459)
  BLD: for C extension builds on mac, target macOS 10.9 where possible (pandas-dev#24274)
  POC: _eadata (pandas-dev#24394)
  DOC: Correct location (pandas-dev#24456)
  CI: Moving CircleCI build to Travis (pandas-dev#24449)
  BUG: Infer compression by default in read_fwf() (pandas-dev#22200)
  DOC: Fix minor typo in whatsnew (pandas-dev#24453)
  DOC: Add dateutil to intersphinx pandas-dev#24437 (pandas-dev#24443)
  DOC: Adding links to offset classes in timeseries.rst (pandas-dev#24448)
  DOC: Adding offsets to API ref (pandas-dev#24446)
  DOC: fix flake8 issue in groupby.rst (pandas-dev#24363)
  DOC: Fixing more doc warnings (pandas-dev#24438)
  API: Simplify repeat signature (pandas-dev#24447)
  BUG: to_datetime(Timestamp, utc=True) localizes to UTC (pandas-dev#24441)
  CLN: Cython Py2/3 Compatible Imports (pandas-dev#23940)
  DOC: Fixing more doc warnings (pandas-dev#24431)
  DOC: Removing old release.rst (pandas-dev#24427)
  BUG-24408 Series.dt does not maintain own copy of index (pandas-dev#24426)
  DOC: Fixing several doc warnings (pandas-dev#24430)
  ENH: fill_value argument for shift pandas-dev#15486 (pandas-dev#24128)
  ...
@@ -245,6 +245,11 @@ def _format_native_types(self, na_rep=u'NaT', date_format=None, **kwargs):
__rmod__ = _make_wrapped_arith_op("__rmod__")
__divmod__ = _make_wrapped_arith_op("__divmod__")
__rdivmod__ = _make_wrapped_arith_op("__rdivmod__")
__truediv__ = _make_wrapped_arith_op("__truediv__")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_make_wrapped_arith_op is going away entirely. Why are we making these changes?

@@ -1973,15 +1974,15 @@ def test_dti_sub_tdi(self, tz_naive_fixture):
result = dti - tdi
tm.assert_index_equal(result, expected)

msg = 'cannot subtract .*TimedeltaIndex'
msg = 'cannot subtract .*TimedeltaArrayMixin'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixin is being removed from the class name. These error message will need to be updated again. I'd prefer making the match TimedeltaArray(Mixin)? so that we don't need to do that all at once.l

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants