ENH: Changing Series and DataFrame repr for NaN values #15375
Comments
I would be in favor of that (NA instead of NaN).
It's somewhat out of scope for this issue, but I've been thinking about the NaN problem in pandas 2.0, and I think for backwards compatibility reasons we're going to get forced to make […]. I am pretty confident that using bitmaps everywhere will make our code much simpler and faster (i.e. we can use SIMD operations on the bitmaps to deal with null analytics and propagation).
As one example of why things will be faster, we can use bitmaps to eliminate branching in aggregations: `sum_x += values[i] * GetBit(bitmap, i);` compared with `if (values[i] == values[i]) sum_x += values[i];`
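A rough sketch of the same idea in NumPy terms (not pandas internals; the array names and the bit-unpacking step are illustrative assumptions): the per-element branch is replaced by a multiply against a 0/1 validity mask.

```python
import numpy as np

# Illustrative data: one "missing" slot, plus an Arrow-style validity bitmap
# (one bit per element; a set bit means the value is valid).
values = np.array([1.0, 2.0, np.nan, 4.0])
bitmap = np.packbits([1, 1, 0, 1], bitorder="little")

def sum_branchy(values):
    """Branching version: test each element before accumulating."""
    total = 0.0
    for v in values:
        if v == v:  # NaN != NaN, so this skips missing values
            total += v
    return total

def sum_bitmap(values, bitmap):
    """Branch-free version: unpack the bitmap to a 0/1 mask and multiply.

    NaN placeholders are zeroed first, since NaN * 0 would still be NaN.
    """
    mask = np.unpackbits(bitmap, count=len(values), bitorder="little")
    return float(np.sum(np.nan_to_num(values) * mask))

print(sum_branchy(values))         # 7.0
print(sum_bitmap(values, bitmap))  # 7.0
```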
Yes, we could have a 'nan_as_missing' option that is first True and could possibly later change (similar to how we now have […]).
The issue to discuss this is probably: wesm/pandas2#46
FWIW, I think this will be effectively closed by #29964. That changes IntegerArray to use `pd.NA`:

```python
In [1]: import pandas as pd

In [2]: pd.Series(pd.array([1, 2, None]))
Out[2]:
0     1
1     2
2    NA
dtype: Int64
```

I don't think we'll want to move away from […].
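A minimal illustration of the contrast this creates, assuming a pandas version with nullable dtypes (1.0 or later); depending on the version, the masked missing value prints as NA or <NA>:

```python
import pandas as pd

# Nullable integer dtype: the missing value is pd.NA and displays as NA/<NA>.
print(pd.Series([1, 2, None], dtype="Int64"))

# Plain float dtype: the same input becomes np.nan and displays as NaN.
print(pd.Series([1, 2, None], dtype="float64"))
```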
With future pandas internal improvements in contemplation, I have often wondered if it would be worth changing the `NaN` outputs to be `NA` or `NULL` instead, to reflect the actual semantics of the data. This could be something that's configurable in `pandas.options` (i.e. showing the semantic value or the physical value).