DataFrame.isin() does not work after upgrading to pandas 0.20.3 #17794

Closed
krishnan2107 opened this issue Oct 5, 2017 · 6 comments

@krishnan2107

Code Sample, a copy-pastable example if possible

 v = veh_frame.Accident_Index[veh_frame.Accident_Index.isin(v2.tolist()) & (veh_frame.Vehicle_Type == 11)]

Problem description

v2 is a Series whose values are passed, as a list, to isin(). I recently upgraded from pandas 0.19.1 to the latest stable release, 0.20.3.

In the previous version the Series was passed as a list and pandas processed it without complaint, but the same call now raises an error with the traceback below.

Output

Traceback (most recent call last):

File "", line 1, in
v=veh_frame.Accident_Index[(veh_frame.Accident_Index.isin(v2.tolist()))&(veh_frame.Vehicle_Type==11)]

File "C:\Users\TRLuser\Anaconda3\lib\site-packages\pandas\core\series.py", line 2555, in isin
result = algorithms.isin(_values_from_object(self), values)

File "C:\Users\TRLuser\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 426, in isin
return f(comps, values)

File "C:\Users\TRLuser\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 406, in
f = lambda x, y: np.in1d(x, y)

File "C:\Users\TRLuser\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 401, in in1d
ar2 = np.unique(ar2)

File "C:\Users\TRLuser\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 214, in unique
ar.sort()

TypeError: '>' not supported between instances of 'int' and 'str'
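
The last frame shows `np.unique` calling `ar.sort()`, and the message points at an ordering comparison between an `int` and a `str`. As a minimal sketch in plain Python (not the pandas or NumPy code path itself; `mixed_values` is a made-up list), Python 3 refuses to order those two types:

```python
# Hypothetical values: mostly strings, with one stray integer.
mixed_values = ["201501BS70001", 201501, "201501BS70002"]

try:
    sorted(mixed_values)  # sorting has to compare elements of different types
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print(f"TypeError: {exc}")

# Once every element is a string, sorting works.
string_values = sorted(str(value) for value in mixed_values)
```

This only illustrates the kind of comparison that fails; it does not show which element of the real data triggers it.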

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Could you edit your issue to include a reproducible example?

@krishnan2107
Author

Unfortunately not; it's part of a larger database sample.
The odd thing is that the same Accident_Index lookup works fine in other frames (shown below); it fails only in this frame (veh_frame).
```
v1 = acc_frame.Accident_Index[(acc_frame.Latitude >= 51.2537) & (acc_frame.Latitude <= 51.7181)]

v2 = acc_frame.Accident_Index[acc_frame.Accident_Index.isin(v1.tolist()) & (acc_frame.Longitude >= -0.5501) & (acc_frame.Longitude <= 0.2644)]
```
Also, when I trimmed the list to about 10-20 items (so I could paste it here), it worked. When I run it with a list of about 10000 strings, it fails. This was not a problem in the previous version.
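
One way to check whether list length is really the cause, rather than something about the values themselves, is to count the Python types in the list passed to `isin()`. A rough sketch, with `values` standing in for `v2.tolist()` (the sample data is invented):

```python
from collections import Counter

# Stand-in for v2.tolist(); the real list is not shown in this thread.
values = ["2015A001", "2015A002", 2015003, "2015A004"]

# Tally how many elements of each Python type the list contains.
type_counts = Counter(type(value).__name__ for value in values)
print(type_counts)
```

If more than one type shows up in the tally, the list is mixed, and a short slice may simply have missed the odd values.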

@jreback
Contributor

jreback commented Oct 5, 2017

Your issue is likely #16012, which is fixed for 0.21.0 (coming soon), but without a reproducible example it is impossible to tell.

@krishnan2107
Author

Some more information:
I tested how long the list can be before an error is thrown, and the limit seems to be 26626 items. Could this be a datatype issue with the value that stores the Series length?
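
A length threshold like this could also appear if the first non-string value simply sits at that position in the list, so that shorter slices never include it. That is an assumption, not something confirmed at this point in the thread. A small helper (the name `first_non_string` is hypothetical) can locate such a value:

```python
from typing import Any, Optional, Sequence, Tuple


def first_non_string(values: Sequence[Any]) -> Optional[Tuple[int, Any]]:
    """Return (position, value) of the first element that is not a str, or None."""
    for position, value in enumerate(values):
        if not isinstance(value, str):
            return position, value
    return None


# Invented data: the stray integer sits at position 3.
sample = ["A1", "A2", "A3", 4, "A5"]
print(first_non_string(sample))
```

Calling it on `v2.tolist()` would show whether the position it reports lines up with the length at which the error starts.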

@TomAugspurger
Contributor

TomAugspurger commented Oct 5, 2017 via email

@krishnan2107
Author

Cracked it. The list passed to isin() contained both string and int values. This worked fine before, but as described in #16012, mixed types now cause the failure.
I converted the ints in v2 to strings and it then worked fine.

Sorry for the trouble, and thanks to all for responding so quickly.
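
The fix described above, casting the values to strings before calling `isin()`, might look roughly like this. The frame and values here are invented stand-ins for `veh_frame` and `v2`, and whether the cast is appropriate depends on how the index values are stored in the real data:

```python
import pandas as pd

# Invented stand-ins for the data in this issue.
veh_frame = pd.DataFrame({
    "Accident_Index": ["2015001", "2015002", "2015003"],
    "Vehicle_Type": [11, 9, 11],
})
v2 = pd.Series(["2015001", 2015003])  # mixed str and int, as in the report

# Cast every lookup value to str so the list holds a single type.
lookup = v2.astype(str).tolist()

mask = veh_frame.Accident_Index.isin(lookup) & (veh_frame.Vehicle_Type == 11)
v = veh_frame.Accident_Index[mask]
print(v.tolist())
```

Casting on the lookup side keeps the column itself untouched; if the column also held mixed types, it would need the same treatment.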

@jorisvandenbossche added this to the No action milestone on Oct 6, 2017