
BUG: Pandas 1.0.3 → 1.1.1 behavior change in DataFrame.apply() with raw option and func returning string #35940


Closed
2 of 3 tasks
m-hunsicker opened this issue Aug 28, 2020 · 3 comments · Fixed by #36610
Labels: Apply (Apply, Aggregate, Transform, Map) · Bug · Regression (Functionality that used to work in a prior pandas version)
Milestone: 1.1.3

Comments

@m-hunsicker

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


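import pandas as pd

df_1 = pd.DataFrame({'A': ["aa", "bbb"]})  # shortest string first
df_2 = pd.DataFrame({'A': ["bbb", "aa"]})  # longest string first

def get_value(array):
    # with raw=True each row is passed as a NumPy ndarray
    return array[0]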

r_1 = df_1.apply(get_value, axis=1, raw=True)
r_2 = df_2.apply(get_value, axis=1, raw=True)

print(r_1)
print(r_2)

Output

0 aa
1 bb
dtype: object
0 bbb
1 aa
dtype: object

Problem description

The results are truncated when the shortest string comes first. However, when the result (e.g. array[0]) is printed inside the function before it is returned, it displays the correct value.
(This issue occurred when using apply with the raw option for a function using several columns.)

Expected Output

0 aa
1 bbb
dtype: object
0 bbb
1 aa
dtype: object
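A minimal sketch of a regression check for the expected output above, assuming a pandas release that includes the fix referenced by this issue (#36610); on 1.1.1 the first assertion fails because r_1 comes back as ['aa', 'bb']. It compares values only, so it does not depend on the exact result dtype.

```python
import pandas as pd


def get_value(array):
    # with raw=True each row arrives as a NumPy ndarray
    return array[0]


df_1 = pd.DataFrame({"A": ["aa", "bbb"]})  # shortest string first
df_2 = pd.DataFrame({"A": ["bbb", "aa"]})  # longest string first

r_1 = df_1.apply(get_value, axis=1, raw=True)
r_2 = df_2.apply(get_value, axis=1, raw=True)

# expected output from the report, checked value by value
assert r_1.tolist() == ["aa", "bbb"], r_1.tolist()
assert r_2.tolist() == ["bbb", "aa"], r_2.tolist()
```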

Output of pd.show_versions()

Pandas 1.1.1
Numpy 1.19.1

m-hunsicker added the Bug and Needs Triage labels on Aug 28, 2020
m-hunsicker changed the title on Aug 28, 2020 ("func returns string" → "func returning string")
@asishm (Contributor) commented Aug 28, 2020

#34913

cc @jbrockmendel

7d0ee96 is the first bad commit
commit 7d0ee96
Author: jbrockmendel jbrockmendel@gmail.com
Date: Sat Jun 20 16:16:12 2020 -0700

REF: dont use compute_reduction (#34913)

jbrockmendel added the Apply label and removed the Needs Triage label on Sep 2, 2020
@incredibleray

Fascinating, I can reproduce this on master. (I ran python setup.py install on the latest master; not sure if this is the right way.)

As @m-hunsicker said, making the first string longer or setting raw=False fixes it.

@m-hunsicker If you are willing, could you share what you were trying to do when you discovered this bug? Thanks.
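A minimal sketch of the raw=False workaround mentioned above. With raw=False each row is passed as a Series rather than an ndarray, so this adapted version reads the value with .iloc[0] instead of array[0] (an adaptation, not code from the thread). The per-row results are then collected by pandas itself rather than through np.apply_along_axis, so no string is truncated.

```python
import pandas as pd


def get_value(row):
    # raw=False passes each row as a Series; use positional access
    return row.iloc[0]


df_1 = pd.DataFrame({"A": ["aa", "bbb"]})  # shortest string first

# raw=False is the default; spelled out here for contrast with raw=True
result = df_1.apply(get_value, axis=1, raw=False)
print(result.tolist())  # ['aa', 'bbb']
```

The trade-off is speed: building a Series per row is slower than passing raw ndarrays, which is usually why raw=True is chosen in the first place.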

simonjayhawkins added the Regression label on Sep 9, 2020
simonjayhawkins added this to the 1.1.3 milestone on Sep 9, 2020
@simonjayhawkins (Member)

It appears that the underlying issue is numpy/numpy#8352.

> c:\users\simon\pandas\pandas\core\apply.py(220)apply_raw()
-> result = np.apply_along_axis(self.f, self.axis, self.values)
(Pdb) self.f
<function get_value at 0x000001F31A9E7280>
(Pdb) self.axis
1
(Pdb) self.values
array([['aa'],
       ['bbb']], dtype=object)
(Pdb) np.apply_along_axis(self.f, self.axis, self.values)
array(['aa', 'bb'], dtype='<U2')

self.values is indeed passed to np.apply_along_axis as an object-dtype array.
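To see the NumPy side in isolation, here is a minimal sketch without pandas (the names are illustrative). As described in numpy/numpy#8352, np.apply_along_axis calls the function on the first slice, allocates its output buffer with the dtype of that first result, and writes the remaining results into it. A plain Python str "aa" becomes a '<U2' array, so "bbb" is silently cut to "bb", exactly as in the pdb session above.

```python
import numpy as np

# 2-D object array shaped like df_1.values for a single string column
values = np.array([["aa"], ["bbb"]], dtype=object)


def first_element(row):
    # returns a plain Python str, like get_value in the report
    return row[0]


# the output buffer takes its dtype from the FIRST result: '<U2'
result = np.apply_along_axis(first_element, 1, values)
print(result)        # ['aa' 'bb']
print(result.dtype)  # <U2

# longest string first: the buffer is '<U3' and nothing is truncated
result_rev = np.apply_along_axis(first_element, 1, values[::-1])
print(result_rev)    # ['bbb' 'aa']
```

This is why the bug depends on row order: only the first result decides how wide the string buffer is.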

As a workaround, the function can force an object-dtype result by returning a 0-d object array:

>>> pd.__version__
'1.2.0.dev0+487.g27aae225e8'
>>>
>>> def get_value(array):
...     return np.array(array[0], dtype=object)
...
>>>
>>> df_1 = pd.DataFrame({"A": ["aa", "bbb"]})
>>> df_1.apply(get_value, axis=1, raw=True)
0     aa
1    bbb
dtype: object
>>>
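The workaround above can be packaged as a decorator so existing functions need not be edited. A minimal sketch; as_object_result is a hypothetical helper name, not a pandas API. It wraps only str results in a 0-d object array, as in the workaround above, and passes every other result through unchanged.

```python
import functools

import numpy as np
import pandas as pd


def as_object_result(func):
    """Hypothetical decorator: wrap str results in a 0-d object array.

    np.apply_along_axis fixes the output dtype from the first result, so an
    object-dtype first result keeps every later string at full length.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, str):
            return np.array(result, dtype=object)
        return result
    return wrapper


@as_object_result
def get_value(array):
    return array[0]


df_1 = pd.DataFrame({"A": ["aa", "bbb"]})  # shortest string first
out = df_1.apply(get_value, axis=1, raw=True)
print(out.tolist())  # ['aa', 'bbb']
```

Because only str results are wrapped, numeric results keep whatever dtype NumPy would infer for them.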
