Sparse Dataframe with multiindex error when slicing #21231

Closed
Xparx opened this issue May 28, 2018 · 1 comment · Fixed by #28425
Labels
Sparse Sparse Data Type

Comments

@Xparx
Xparx commented May 28, 2018

Code Sample

import pandas as pd
import numpy as np
spdf = pd.DataFrame(np.random.rand(5, 5) > 0.7).astype(float).to_sparse(fill_value=0)

spdf.columns = pd.MultiIndex.from_tuples((("A", 1), ("A", 1), ("B", 1), ("B", 2), ("C", 2)))

spdf["A"] # Throws error
spdf.to_dense()["A"] # Works

Problem description

I could not find this specific issue among the existing sparse issues here.
It seems that a sparse DataFrame cannot handle MultiIndex slicing the way a dense (regular) DataFrame can.

The error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-bf1d8d920880> in <module>()
----> 1 spdf["A"]

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/sparse/frame.py in __getitem__(self, key)
    439             return self._getitem_array(key)
    440         else:
--> 441             return self._get_item_cache(key)
    442 
    443     def get_value(self, index, col, takeable=False):

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4130                 raise TypeError("cannot label index with a null key")
   4131 
-> 4132             indexer = self.items.get_indexer_for([item])
   4133             return self.reindex_indexer(new_axis=self.items[indexer],
   4134                                         indexer=indexer, axis=0,

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_indexer_for(self, target, **kwargs)
   3367         if self.is_unique:
   3368             return self.get_indexer(target, **kwargs)
-> 3369         indexer, _ = self.get_indexer_non_unique(target, **kwargs)
   3370         return indexer
   3371 

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/multi.py in get_indexer_non_unique(self, target)
   2046     @Appender(_index_shared_docs['get_indexer_non_unique'] % _index_doc_kwargs)
   2047     def get_indexer_non_unique(self, target):
-> 2048         return super(MultiIndex, self).get_indexer_non_unique(target)
   2049 
   2050     def reindex(self, target, method=None, level=None, limit=None,

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_indexer_non_unique(self, target)
   3357             tgt_values = target._ndarray_values
   3358 
-> 3359         indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
   3360         return _ensure_platform_int(indexer), missing
   3361 

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer_non_unique()

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes()

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/multi.py in _codes_to_ints(self, codes)
     72         # Shift the representation of each level by the pre-calculated number
     73         # of bits:
---> 74         codes <<= self.offsets
     75 
     76         # Now sum and OR are in fact interchangeable. This is a simple

ValueError: non-broadcastable output operand with shape (1,1) doesn't match the broadcast shape (1,2)
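
The final frame of the traceback fails on an in-place shift, `codes <<= self.offsets`. The following is a minimal NumPy-only sketch of how that kind of statement can raise this message. The array shapes and values are assumptions chosen to match the shapes in the error text, not values taken from pandas internals, and the sketch does not explain why pandas ended up with mismatched shapes in the first place.

```python
import numpy as np

# Assumed shapes: a (1, 1) array of codes and two per-level offsets.
# These are illustrative only and are not read from pandas.
codes = np.zeros((1, 1), dtype=np.uint64)
offsets = np.array([1, 0], dtype=np.uint64)

# Out of place, NumPy broadcasts the operands and allocates a new (1, 2) result.
shifted = codes << offsets

# In place, the result must fit back into `codes`, which only has shape (1, 1),
# so NumPy refuses instead of silently resizing the output operand.
try:
    codes <<= offsets
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(shifted.shape)
print(error_message)
```

The contrast between the two operations is the point: broadcasting may enlarge a freshly allocated result, but never the left-hand operand of an in-place operator.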

Expected Output

The sparse dataframe should have the same capabilities as the dense one.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.6.0-040600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Xparx changed the title from "Sparse Dataframe with multiindex fails when slicing" to "Sparse Dataframe with multiindex error when slicing" on May 28, 2018
jbrockmendel added the Sparse (Sparse Data Type) label on Aug 1, 2018
@TomAugspurger
Contributor

I can't reproduce this anymore on master.
