ENH: Changing Series and DataFrame `repr` for NaN values #15375

wesm · 2017-02-12T02:14:00Z

With future pandas internal improvements in contemplation, I have often wondered if it would be worth changing the NaN outputs to be NA or NULL instead to reflect the actual semantics of the data. This could be something that's configurable in pandas.options (i.e. showing the semantic value or the physical value).


jorisvandenbossche · 2017-02-12T21:19:48Z

I would be in favor of that (NA instead of NaN).
The only thing I am wondering is whether we already know if we want to distinguish between NA and NaN for float data in pandas 2.0, where this would theoretically be possible when using bitmasks for NA. If we do want that distinction, it might make sense to hold off on changing the repr until 2.0 instead of 1.0.

wesm · 2017-02-12T21:32:08Z

It's somewhat out of scope for this issue, but I've been thinking about the NaN problem in pandas 2.0, and I think for backwards compatibility reasons we're going to be forced to make NaN and NA / NULL equivalent during the transition period. We could later add warnings when NaN is being treated as NA in operations like s[...] = np.nan (which I can attest litters people's pandas code), then add an option (where the default is that NaN and NA are different), and finally remove the option.
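For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of how assigning np.nan marks values as missing in a float Series today. The Series contents and the threshold are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only; any float Series behaves the same way.
s = pd.Series([1.0, 2.0, 3.0])

# The common idiom: assign np.nan to "blank out" selected values.
s[s > 1.5] = np.nan

# pandas treats the assigned NaN as a missing value ...
missing = s.isna().tolist()

# ... so reductions skip it by default (skipna=True).
total = s.sum()

print(missing)
print(total)
```

Any future distinction between NaN and NA would have to decide what this idiom means, which is why a transition period with warnings is suggested.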

I am pretty confident that using bitmaps everywhere will make our code much simpler and faster (i.e. can use SIMD operations on the bitmaps to deal with null analytics and propagation).

wesm · 2017-02-12T21:37:28Z

As one example of why things will be faster, we can use bitmaps to eliminate branching in aggregations:

sum_x += values[i] * GetBit(bitmap, i);

compared with

if (values[i] == values[i]) sum_x += values[i];
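A rough Python sketch of the same comparison, for illustration only. The helper names (get_bit, masked_sum, nan_checked_sum) and the bit layout (least-significant bit first, 1 = valid) are assumptions, not an existing pandas API. Pure Python will not show the speed difference; the point is only that the masked loop has no data-dependent branch.

```python
def get_bit(bitmap: bytes, i: int) -> int:
    """Return validity bit i (1 = valid, 0 = null), least-significant bit first."""
    return (bitmap[i >> 3] >> (i & 7)) & 1


def masked_sum(values, bitmap):
    """Branch-free sum: null slots are multiplied by 0 and contribute nothing."""
    total = 0.0
    for i, v in enumerate(values):
        total += v * get_bit(bitmap, i)
    return total


def nan_checked_sum(values):
    """Sentinel-based sum: a branch per element skips NaN (NaN != NaN)."""
    total = 0.0
    for v in values:
        if v == v:
            total += v
    return total


# Slot 1 is null. With a bitmap, the slot holds a finite placeholder (0.0 here).
values = [1.5, 0.0, 2.5, 4.0]
bitmap = bytes([0b00001101])  # bits 0, 2 and 3 set

# The NaN-sentinel representation of the same data.
nan_values = [1.5, float("nan"), 2.5, 4.0]

print(masked_sum(values, bitmap))
print(nan_checked_sum(nan_values))
```

One caveat with the multiply trick: the null slots must hold a finite value, because NaN * 0 is still NaN.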

jorisvandenbossche · 2017-02-12T21:37:46Z

Yes, we could have a 'nan_as_missing' option that is initially True and could possibly change later (similar to how we now have the pd.options.mode.use_inf_as_null option, which is False by default).

jorisvandenbossche · 2017-02-12T21:38:03Z

The issue to discuss this is probably: wesm/pandas2#46

TomAugspurger · 2019-12-30T15:11:37Z

FWIW, I think this will be effectively closed by #29964. That changes IntegerArray to use pd.NA, which has NA as its repr.

In [1]: import pandas as pd

In [2]: pd.Series(pd.array([1, 2, None]))
Out[2]:
0     1
1     2
2    NA
dtype: Int64

I don't think we'll want to move away from np.nan for ndarrays, and I don't think we want to display nan as NA, since they have different behaviors.
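A small sketch of some of those behavioral differences, assuming pandas 1.0 or later (where pd.NA exists). The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# Comparisons: NaN compares as False, NA propagates.
nan_eq = np.nan == 1      # plain Python False
na_eq = pd.NA == 1        # pd.NA

# Logical operations on NA follow three-valued (Kleene) logic.
na_or_true = pd.NA | True  # True, regardless of the unknown operand

# The same difference shows up element-wise in Series comparisons.
float_cmp = pd.Series([1.0, np.nan]) == 1            # dtype: bool
masked_cmp = pd.Series([1, None], dtype="Int64") == 1  # dtype: boolean

print(float_cmp.tolist())
print(masked_cmp.isna().tolist())
```

In the float case the missing entry silently becomes False, while the Int64 case keeps it as missing in the result.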

wesm added the Enhancement label Feb 12, 2017

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Output-Formatting (__repr__ of pandas objects, to_string) labels Feb 12, 2017

jreback added this to the 1.0 milestone Feb 12, 2017

TomAugspurger mentioned this issue Dec 30, 2019

API: Uses pd.NA in IntegerArray #29964

Merged

jreback closed this as completed in #29964 Dec 30, 2019
