ValueError in Categorical Constructor with empty data and boolean categories #22702
Comments
I haven't bisected on this to check exactly when it first started manifesting, but this issue is not present in version |
Turns out the first bad commit is 7818486859d1aba53ce359b93cfc772e688958e5.
Ok, so the problem is that in this function (https://github.com/pandas-dev/pandas/blob/master/pandas/core/arrays/categorical.py#L2444), you have:

    if not is_dtype_equal(values.dtype, categories.dtype):
        values = ensure_object(values)
        categories = ensure_object(categories)

    (hash_klass, vec_klass), vals = _get_data_algo(values, _hashtables)
    (_, _), cats = _get_data_algo(categories, _hashtables)

For boolean categories, the category's dtype is already object I guess, so values.dtype and categories.dtype compare equal, the if not is_dtype_equal(values.dtype, categories.dtype) branch is skipped, and no coercion to object takes place. But inside _get_data_algo there is a call to _ensure_data (https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L48), which detects that categories is all booleans and converts it to uint64, which invalidates the assumption that values.dtype and categories.dtype are the same.

You would think that this would be a problem for pd.CategoricalIndex([], categories=[1, 2]) as well, since integers should also be coerced, but evidently _ensure_data treats Index and ndarray differently. PR incoming.

Sorry about that :/ Thanks for debugging.

(In reply to Paul Ganssle's comment above, sent Fri, Sep 14, 2018 at 8:48 AM.)
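The dtype mismatch described in the debugging comment can be sketched with plain NumPy. This is only an illustration, not pandas' actual code: ensure_data_sketch is a hypothetical stand-in for the boolean handling that the comment attributes to _ensure_data, and the two arrays are assumed inputs matching the empty-data, boolean-categories case.

```python
import numpy as np


def ensure_data_sketch(arr):
    """Hypothetical stand-in for the bool handling attributed to _ensure_data.

    A non-empty object array holding only booleans is converted to uint64;
    anything else (including an empty object array) is returned unchanged.
    """
    is_all_bool = len(arr) > 0 and all(
        isinstance(x, (bool, np.bool_)) for x in arr
    )
    if arr.dtype == object and is_all_bool:
        return arr.astype("uint64")
    return arr


# Assumed inputs: empty data and boolean categories, both with object dtype.
values = np.array([], dtype=object)
categories = np.array([True, False], dtype=object)

# The dtypes match up front, so the "coerce both to object" branch is skipped.
dtypes_equal_before = values.dtype == categories.dtype

# Each array is then converted independently.
vals = ensure_data_sketch(values)
cats = ensure_data_sketch(categories)

# After the per-array conversion the dtypes no longer agree.
dtypes_equal_after = vals.dtype == cats.dtype

print(dtypes_equal_before, vals.dtype, cats.dtype, dtypes_equal_after)
```

The point of the sketch is that the equality check runs before the per-array conversion, so anything downstream that picks a hashtable type from one array and applies it to the other can be handed mismatched buffers.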
* TST: Add failing test for empty bool Categoricals
* BUG: Failure in empty boolean CategoricalIndex
Fixes GH #22702.

…#22710)
* TST: Add failing test for empty bool Categoricals
* BUG: Failure in empty boolean CategoricalIndex
Fixes GH pandas-dev#22702.
This works,
This doesn't
The values there is array([], dtype=object). It should be int dtype by this point.
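A reproduction along the lines of this report might look like the sketch below. The exact calls are assumptions: the integer case is the pd.CategoricalIndex([], categories=[1, 2]) call mentioned in the comments, the boolean case is inferred from the issue title and the fix's commit message, and try_construct is a hypothetical helper. On releases that include the fix, both calls are expected to succeed.

```python
import pandas as pd


def try_construct(categories):
    """Build an empty CategoricalIndex with the given categories.

    Returns "ok" on success, or the ValueError message on failure.
    try_construct is a hypothetical helper for this sketch.
    """
    try:
        pd.CategoricalIndex([], categories=categories)
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "ok"


# Integer categories: the case the comments describe as unaffected.
int_result = try_construct([1, 2])

# Boolean categories: the case reported as failing on affected versions.
bool_result = try_construct([True, False])

print("int categories: ", int_result)
print("bool categories:", bool_result)
```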