BUG: ValueError: Cannot convert non-finite values (NA or inf) to integer only when DF exceed certain size #35227
Comments
@ben-arnao could you provide a reproducible example? (see e.g. https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) |
Please see below
|
I think I've been able to narrow it down to the sum() function failing with larger DataFrames.
In the
The execution enters the block of code where
If we do a few print statements after this bit of code
We can see here that we correctly mark cells as True or False, for null/not null.
So the frame that
However, we can see that when we do |
The issue is this
Obviously this is a NumPy issue, as there is some sort of overflow going on here. But we can pretty easily deal with the problem for now in pandas by just checking that the mask size is positive, since the mask size should never be negative.
In line 1313 of nanops
We see that this value is incorrectly set to a negative number, which causes the next line
to evaluate to true and then set our result to NaN.
If we just add a condition to the above line that requires the mask size to also be positive, this should resolve the issue. |
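A minimal sketch of the guard suggested above, not the actual nanops code. The function `should_null_out` and its `guard` flag are hypothetical names made up for illustration, and the wrapped count is hard-coded (it is what 50_000 * 50_000 becomes in a 32-bit integer) rather than taken from the report:

```python
import numpy as np


def should_null_out(valid_count, min_count=1, guard=False):
    """Hypothetical stand-in for a 'too few valid values' check."""
    too_few = valid_count < min_count
    if guard:
        # A real count can never be negative, so treat a negative value
        # as an overflow artifact instead of as "too few valid values".
        too_few = too_few and valid_count >= 0
    return bool(too_few)


# Assumed example value: 2_500_000_000 wrapped around in int32.
wrapped_count = np.int32(-1794967296)

print(should_null_out(wrapped_count))              # True: result would become NaN
print(should_null_out(wrapped_count, guard=True))  # False: result is kept
```

The guard only hides the symptom; the count itself is still wrong, so computing it in a wider integer type is the more thorough fix.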
or specifying dtype would give the correct answer
see also #34827 |
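To illustrate the dtype point above, here is a small sketch assuming the product is accumulated in a 32-bit integer (the default integer on Windows with NumPy before 2.0). The shape is an arbitrary example chosen to exceed the int32 range, not the one from the report:

```python
import numpy as np

# Hypothetical shape: 2.5e9 elements, more than int32 can hold (about 2.147e9).
shape = (50_000, 50_000)

# Forcing a 32-bit accumulator makes the product wrap around silently.
wrapped = np.prod(shape, dtype=np.int32)

# Asking for a 64-bit accumulator gives the true element count.
correct = np.prod(shape, dtype=np.int64)

print(wrapped)  # -1794967296
print(correct)  # 2500000000
```

No large array is allocated here; only the shape tuple is multiplied, so this runs instantly on any machine.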
Is there a plan to fix this, or a workaround in the meantime? |
Community PRs are always welcome, @fonnesbeck. pandas is all-volunteer, so direct issue prioritization is not possible. |
Actually, this may not be an issue with newer versions of NumPy. |
take |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Here is the code in question:
The dropna line is what throws the error.
So when I run my code adding 500 features/columns, the output is the following
However, when I run the exact same code for 1000 features
I get an error
The version I am running is 1.0.5