BUG: Possible Regression in DatetimeIndex Slicing #35509

Closed
2 of 3 tasks
stefmolin opened this issue Aug 1, 2020 · 6 comments · Fixed by #37023
Labels
Bug Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Comments

@stefmolin
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]}, 
...     index=pd.to_datetime(['2020-08-01', '2020-07-02', '2020-08-05'])
... )
>>> df['2020-08']
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-82-c8d0440364af> in <module>
----> 1 df['2020-08']

~/book_env/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2880             # either we have a slice or we have a string that can be converted
   2881             #  to a slice for partial-string date indexing
-> 2882             return self._slice(indexer, axis=0)
   2883 
   2884         # Do we have a (boolean) DataFrame?

~/book_env/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
   3546         Slicing with this method is *always* positional.
   3547         """
-> 3548         assert isinstance(slobj, slice), type(slobj)
   3549         axis = self._get_block_manager_axis(axis)
   3550         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>

You now have to run sort_index() before slicing:

>>> df.sort_index()['2020-08']
           col1  col2
2020-08-01    a     1
2020-08-05    c     3
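
A minimal sketch of two ways to select the August rows without relying on `df['2020-08']`. The boolean mask is not something proposed in this thread; it is shown only as an illustration that avoids the string-to-slice code path entirely. The second form is the `sort_index()` workaround above, written with `.loc`.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["a", "b", "c"], "col2": [1, 2, 3]},
    index=pd.to_datetime(["2020-08-01", "2020-07-02", "2020-08-05"]),
)

# Boolean mask built from the DatetimeIndex components; the index
# does not need to be sorted for this to work.
mask = (df.index.year == 2020) & (df.index.month == 8)
by_mask = df[mask]

# The workaround from the report: sort first, then select by partial string.
by_sort = df.sort_index().loc["2020-08"]
```

Both selections contain the same two rows here because the matching rows already appear in chronological order in the original frame; with a differently shuffled index the mask would keep the original row order while `sort_index()` would not.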

Problem description

Before 1.1.0, you didn't have to sort the index before slicing with a partial date string. Now you have to run sort_index() first, and the AssertionError gives no hint that an unsorted index is the cause.

Expected Output

>>> df['2020-08']
           col1  col2
2020-08-01    a     1
2020-08-05    c     3

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.40-04224-g891a6cce2d44
Version : #1 SMP PREEMPT Tue Jun 23 20:21:29 PDT 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 18.1
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: 0.9.0
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.18
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@stefmolin stefmolin added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 1, 2020
@arw2019
Member

arw2019 commented Aug 1, 2020

xref #26206

Also - confirming this is still an issue on master.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9cbc3e8
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+17.g9cbc3e811
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

@simonjayhawkins
Member

The assertion was added in #31938 cc @jbrockmendel

I can confirm the expected output prior to that change:

>>> pd.__version__
'1.1.0.dev0+460.g56cc7f4f3'
>>> df = pd.DataFrame(
...     {"col1": ["a", "b", "c"], "col2": [1, 2, 3]},
...     index=pd.to_datetime(["2020-08-01", "2020-07-02", "2020-08-05"]),
... )
>>> df["2020-08"]
           col1  col2
2020-08-01    a     1
2020-08-05    c     3
>>>

@simonjayhawkins simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 2, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Aug 2, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.1, 1.1.2 Aug 20, 2020
@dsaxton
Member

dsaxton commented Aug 28, 2020

As far as I can tell, this assertion can be removed (the method seems to work fine with lists and numpy arrays). In that case, are the annotations for this and related arguments incorrect?
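
For comparison, positional selection with an integer array is already available through the public `.iloc` accessor. The sketch below only illustrates that public behavior; it does not call the private `_slice` method discussed here, and the positions `[0, 2]` are an assumption about which rows match `'2020-08'` in the example frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": ["a", "b", "c"], "col2": [1, 2, 3]},
    index=pd.to_datetime(["2020-08-01", "2020-07-02", "2020-08-05"]),
)

# Positions of the two August rows in the unsorted frame (assumed for illustration).
positions = np.array([0, 2])

# .iloc accepts an integer array and returns the rows at those positions.
selected = df.iloc[positions]
```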

@jbrockmendel
Member

Another fix is to add the following check in DataFrame.__getitem__:

        indexer = convert_to_index_sliceable(self, key)
        if indexer is not None:
            # either we have a slice or we have a string that can be converted
            #  to a slice for partial-string date indexing
+            if isinstance(indexer, np.ndarray):
+                indexer = lib.maybe_indices_to_slice(indexer, len(self))
            return self._slice(indexer, axis=0)
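
The general idea behind converting an array of positions to a slice can be sketched in plain NumPy. The helper below, `indices_to_slice_or_array`, is a hypothetical name and a simplified illustration, not pandas' implementation of `lib.maybe_indices_to_slice`: it only recognizes ascending unit-step runs and otherwise hands the array back unchanged.

```python
import numpy as np


def indices_to_slice_or_array(indices, length):
    """Hypothetical helper: return a slice for a contiguous ascending run.

    Any other input is returned unchanged as an integer array.
    """
    indices = np.asarray(indices, dtype=np.intp)
    if indices.size == 0:
        return indices
    start, stop = int(indices[0]), int(indices[-1]) + 1
    contiguous = (
        0 <= start
        and stop <= length
        and np.array_equal(indices, np.arange(start, stop))
    )
    return slice(start, stop) if contiguous else indices


# Positions 1 and 2 are adjacent, so they collapse to a slice.
sorted_hit = indices_to_slice_or_array([1, 2], 3)

# Positions 0 and 2 are not adjacent, so this sketch keeps the array.
unsorted_hit = indices_to_slice_or_array([0, 2], 3)
```

In this simplified version, whatever remains an array would still need a code path that accepts arrays rather than only `slice` objects.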

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Member

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

smithara added a commit to Swarm-DISC/Swarm_notebooks that referenced this issue Sep 15, 2020
smithara added a commit to Swarm-DISC/Swarm_notebooks that referenced this issue Sep 18, 2020
* Use pandas<1.1 due to pandas bug

pandas-dev/pandas#35509

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020
@simonjayhawkins
Member

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline
