Regression in hash_pandas_object with hash_key=None and object dtype #30887

Closed
TomAugspurger opened this issue Jan 10, 2020 · 0 comments · Fixed by #30900
Labels
hashing hash_pandas_object Regression Functionality that used to work in a prior pandas version

@TomAugspurger
Contributor

On 1.0.0rc0, passing hash_key=None for an object-dtype Series raises an AttributeError:

In [7]: pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-00d1f153287e> in <module>
----> 1 pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)

~/sandbox/pandas/pandas/core/util/hashing.py in hash_pandas_object(obj, index, encoding, hash_key, categorize)
     93
     94     elif isinstance(obj, ABCSeries):
---> 95         h = hash_array(obj.values, encoding, hash_key, categorize).astype(
     96             "uint64", copy=False
     97         )

~/sandbox/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    302             codes, categories = factorize(vals, sort=False)
    303             cat = Categorical(codes, Index(categories), ordered=False, fastpath=True)
--> 304             return _hash_categorical(cat, encoding, hash_key)
    305
    306         try:

~/sandbox/pandas/pandas/core/util/hashing.py in _hash_categorical(c, encoding, hash_key)
    221     # Convert ExtensionArrays to ndarrays
    222     values = np.asarray(c.categories.values)
--> 223     hashed = hash_array(values, encoding, hash_key, categorize=False)
    224
    225     # we have uint64, as we don't directly support missing values

~/sandbox/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    305
    306         try:
--> 307             vals = hashing.hash_object_array(vals, hash_key, encoding)
    308         except TypeError:
    309             # we have mixed types

~/sandbox/pandas/pandas/_libs/hashing.pyx in pandas._libs.hashing.hash_object_array()

AttributeError: 'NoneType' object has no attribute 'encode'

On 0.25.3, the same call succeeds:

In [7]: pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)
Out[7]:
0     4578374827886788867
1    17338122309987883691
dtype: uint64

The failure only occurs for object dtype.
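
A minimal sketch of a check for this regression. It assumes the intended behavior is that hash_key=None falls back to the default hash key, so both calls should produce the same hashes. The helper name none_key_matches_default is hypothetical, and dtype=object is passed explicitly so the object-dtype code path is exercised regardless of how strings are inferred.

```python
import pandas as pd


def none_key_matches_default(values):
    """Return True if hash_key=None hashes the same as the default key.

    Assumption: None is meant to fall back to the default hash key.
    On an affected version (1.0.0rc0) this raises AttributeError instead.
    """
    # Force object dtype, since the report says only object dtype is affected.
    s = pd.Series(values, dtype=object)
    with_none = pd.util.hash_pandas_object(s, hash_key=None)
    with_default = pd.util.hash_pandas_object(s)
    return with_none.equals(with_default)


print(none_key_matches_default(["a", "b"]))
```

On a version without the regression this should print True; on 1.0.0rc0 the first hash_pandas_object call is expected to fail with the traceback shown above.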

@TomAugspurger TomAugspurger added this to the 1.0.0 milestone Jan 10, 2020
@TomAugspurger TomAugspurger added hashing hash_pandas_object Regression Functionality that used to work in a prior pandas version labels Jan 10, 2020
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 10, 2020
TomAugspurger added a commit that referenced this issue Jan 13, 2020
* REGR: Fixed hash_key=None for object values

Closes #30887