"TypeError: unorderable types" in Python3 when index values are dict keys of tuples or tuples with non-homogeneous dtypes #22077

Closed · Tracked by #7
toobaz opened this issue Jul 27, 2018 · 6 comments · Fixed by #52758
Labels: good first issue, Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@toobaz
Member

toobaz commented Jul 27, 2018

Code Sample, a copy-pastable example if possible

In [2]: from collections import OrderedDict
In [3]: param_index = OrderedDict([((('a', 'b'), ('c', 'd')), 1),
   ...:                            ((('a', None), ('c', 'd')), 2),
   ...:                           ])
   ...: 

In [4]: pd.Series([1, 2], index=param_index.keys())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/nobackup/repo/pandas/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    634         try:
--> 635             order = uniques.argsort()
    636             order2 = order.argsort()

TypeError: unorderable types: NoneType() < str()

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/nobackup/repo/pandas/pandas/core/sorting.py in safe_sort(values, labels, na_sentinel, assume_unique)
    450         try:
--> 451             sorter = values.argsort()
    452             ordered = values.take(sorter)

TypeError: unorderable types: NoneType() < str()

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/nobackup/repo/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    397             try:
--> 398                 codes, categories = factorize(values, sort=True)
    399             except TypeError:

~/nobackup/repo/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    177                     kwargs[new_arg_name] = new_arg_value
--> 178             return func(*args, **kwargs)
    179         return wrapper

~/nobackup/repo/pandas/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    642                                         na_sentinel=na_sentinel,
--> 643                                         assume_unique=True)
    644 

~/nobackup/repo/pandas/pandas/core/sorting.py in safe_sort(values, labels, na_sentinel, assume_unique)
    454             # try this anyway
--> 455             ordered = sort_mixed(values)
    456 

~/nobackup/repo/pandas/pandas/core/sorting.py in sort_mixed(values)
    440                            dtype=bool)
--> 441         nums = np.sort(values[~str_pos])
    442         strs = np.sort(values[str_pos])

~/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py in sort(a, axis, kind, order)
    846         a = asanyarray(a).copy(order="K")
--> 847     a.sort(axis=axis, kind=kind, order=order)
    848     return a

TypeError: unorderable types: NoneType() < str()

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-2fff0a9c0f74> in <module>()
----> 1 pd.Series([1, 2], index=param_index.keys())

~/nobackup/repo/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    191 
    192             if index is not None:
--> 193                 index = ensure_index(index)
    194 
    195             if data is None:

~/nobackup/repo/pandas/pandas/core/indexes/base.py in ensure_index(index_like, copy)
   5006             index_like = copy(index_like)
   5007 
-> 5008     return Index(index_like)
   5009 
   5010 

~/nobackup/repo/pandas/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
    448                     from .multi import MultiIndex
    449                     return MultiIndex.from_tuples(
--> 450                         data, names=name or kwargs.get('names'))
    451             # other iterable of some kind
    452             subarr = com.asarray_tuplesafe(data, dtype=object)

~/nobackup/repo/pandas/pandas/core/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
   1333             arrays = lzip(*tuples)
   1334 
-> 1335         return MultiIndex.from_arrays(arrays, sortorder=sortorder, names=names)
   1336 
   1337     @classmethod

~/nobackup/repo/pandas/pandas/core/indexes/multi.py in from_arrays(cls, arrays, sortorder, names)
   1277         from pandas.core.arrays.categorical import _factorize_from_iterables
   1278 
-> 1279         labels, levels = _factorize_from_iterables(arrays)
   1280         if names is None:
   1281             names = [getattr(arr, "name", None) for arr in arrays]

~/nobackup/repo/pandas/pandas/core/arrays/categorical.py in _factorize_from_iterables(iterables)
   2549         # For consistency, it should return a list of 2 lists.
   2550         return [[], []]
-> 2551     return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))

~/nobackup/repo/pandas/pandas/core/arrays/categorical.py in <listcomp>(.0)
   2549         # For consistency, it should return a list of 2 lists.
   2550         return [[], []]
-> 2551     return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))

~/nobackup/repo/pandas/pandas/core/arrays/categorical.py in _factorize_from_iterable(values)
   2521         codes = values.codes
   2522     else:
-> 2523         cat = Categorical(values, ordered=True)
   2524         categories = cat.categories
   2525         codes = cat.codes

~/nobackup/repo/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    402                     # raise, as we don't have a sortable data structure and so
    403                     # the user should give us one by specifying categories
--> 404                     raise TypeError("'values' is not ordered, please "
    405                                     "explicitly specify the categories order "
    406                                     "by passing in a categories argument.")

TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.

Problem description

The above is a simplified version of the example in this comment - and both of them used to work (I tested in 0.19.0+git14-ga40e185, @jolespin tested in 0.22). Creating this separate issue because #15457 itself is not a regression.

Notice the error changes if you replace param_index.keys() with list(param_index.keys()) (but stays the same if you just replace the OrderedDict with an ordinary dict).
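To see why the constructor fails, note that `MultiIndex.from_tuples` first splits the keys into one sequence per level (`arrays = lzip(*tuples)` in the traceback), and each level is then factorized with `sort=True`. Level 0 holds `('a', 'b')` and `('a', None)`. Python 3 compares tuples element by element, so the comparison falls through to `None` vs `'b'` and raises `TypeError`. In the traceback, `_factorize_from_iterable` builds `Categorical(values, ordered=True)`, so the failed sort is turned into the final "'values' is not ordered" error instead of a fallback. The plain-Python sketch below reproduces the failing sort and shows a fallback to first-appearance order. Its `factorize` helper is hypothetical and greatly simplified; it is not pandas' implementation.

```python
def factorize(values, sort=True):
    """Hypothetical, simplified stand-in for pandas.factorize.

    Returns (codes, uniques). Tries to sort the uniques and, if they cannot be
    ordered, keeps them in first-appearance order instead of raising.
    """
    uniques = list(dict.fromkeys(values))  # dedupe while keeping first-appearance order
    if sort:
        try:
            uniques = sorted(uniques)
        except TypeError:
            pass  # unorderable mix such as None and str: keep appearance order
    position = {value: i for i, value in enumerate(uniques)}
    codes = [position[value] for value in values]
    return codes, uniques


# The two keys from the report: tuples whose elements are themselves tuples.
keys = [(("a", "b"), ("c", "d")), (("a", None), ("c", "d"))]
# One sequence per level, mirroring lzip(*tuples) in MultiIndex.from_tuples.
levels = list(zip(*keys))

try:
    sorted(levels[0])  # compares ('a', None) with ('a', 'b'), i.e. None vs 'b'
except TypeError as exc:
    print("sorting level 0 failed:", exc)

for level in levels:
    print(factorize(level))
```

With the fallback, level 0 keeps `('a', 'b')` and `('a', None)` as two distinct values with codes `[0, 1]`, while level 1 collapses to the single value `('c', 'd')`.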

Expected Output

In 0.19.0+git14-ga40e185:

In [4]: pd.Series([1, 2], index=param_index.keys())
Out[4]: 
((a, b), (c, d))       1
((a, None), (c, d))    2
dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.24.0.dev0+360.g24fd90f66
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1634.dev0+ge8120cf6d
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
gcsfs: None

@evfro

evfro commented Aug 7, 2018

having the same problem

@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@mroeschke mroeschke added Bug and removed 2/3 Compat labels Apr 4, 2020
@jbrockmendel jbrockmendel added the Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). label Sep 22, 2020
@mroeschke
Member

The latest result on master looks okay to me (coercing to a MultiIndex instead of having a flat Index). Could use a test.

In [13]: from collections import OrderedDict
    ...: param_index = OrderedDict([((('a', 'b'), ('c', 'd')), 1),
    ...:                            ((('a', None), ('c', 'd')), 2),
    ...:                           ])
    ...: pd.Series([1, 2], index=param_index.keys())
Out[13]:
(a, b)     (c, d)    1
(a, None)  (c, d)    2
dtype: int64
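A regression test in the spirit of this comment might look like the following pytest-style sketch built from the OP's example. The test name is hypothetical, and the assertions deliberately check only what the output above shows (a two-level `MultiIndex` and the original values), not the exact ordering of the level values, which is an implementation detail that could differ between pandas versions.

```python
from collections import OrderedDict

import pandas as pd


def test_series_index_from_dict_keys_of_nested_tuples():
    # Hypothetical regression test for GH 22077, using the OP's example.
    # Each key is a tuple of tuples; the first inner tuples mix str and None,
    # so that level cannot be sorted with plain Python 3 comparisons.
    param_index = OrderedDict(
        [
            ((("a", "b"), ("c", "d")), 1),
            ((("a", None), ("c", "d")), 2),
        ]
    )

    # This used to raise "TypeError: 'values' is not ordered ...".
    s = pd.Series([1, 2], index=param_index.keys())

    # The tuple keys are coerced to a two-level MultiIndex.
    assert isinstance(s.index, pd.MultiIndex)
    assert s.index.nlevels == 2
    assert len(s.index) == 2
    assert s.tolist() == [1, 2]
```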

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Constructors Series/DataFrame/Index/pd.array Constructors MultiIndex Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). Regression Functionality that used to work in a prior pandas version labels Jun 21, 2021
@devdattakhoche

devdattakhoche commented Jan 7, 2022

@mroeschke, @toobaz can we close this issue? I think it's fixed, and an informative error message is shown when we pass a multi-dimensional index to Series:

ValueError: Index data must be 1-dimensional

#15457

@jreback
Contributor

jreback commented Jan 7, 2022

would take a PR with a test like the OP

@devdattakhoche

devdattakhoche commented Jan 7, 2022

would take a PR with a test like the OP

@jreback Can you elaborate? I didn't get you. What do we need a test for here? I didn't understand what 'OP' means here.

@devdattakhoche

I am willing to contribute here. Can I know what is required?
