Problem description
```python
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, np.nan, 5])
>>> s.value_counts()
5.0    1
3.0    1
2.0    1
1.0    1
dtype: int64
>>> s.value_counts(dropna=False)
5.0    1
3.0    1
2.0    1
1.0    1
NaN    1
dtype: int64
```
For a pandas beginner, it can be puzzling and misleading not to see NaN values when trying to understand a DataFrame, a Series, etc., especially when `value_counts` is used to check that previous operations were done correctly (e.g. join / merge-type operations).
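To illustrate the merge-checking scenario above, here is a minimal sketch (the DataFrames and column names are hypothetical, invented for this example): a left merge silently introduces a NaN, and the default `dropna=True` hides it from `value_counts`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "c" has no match on the right side,
# so the left merge produces a NaN in the "extra" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b"], "extra": [10, 20]})
merged = left.merge(right, on="key", how="left")

# Default behavior: the NaN produced for key "c" is invisible.
default_counts = merged["extra"].value_counts()
print(default_counts)        # two entries, no hint of a missing row

# With dropna=False, the unmatched row shows up explicitly.
full_counts = merged["extra"].value_counts(dropna=False)
print(full_counts)           # three entries, including NaN
```

A reader eyeballing only the default output could wrongly conclude the merge matched every row.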
I can understand that it may seem natural to drop NaNs for various operations in pandas (means, etc.), and that as a consequence the general default value for `dropna` arguments is `True` (is that really the reason?).
I feel uncomfortable with the `value_counts` default behavior; it has caused (and still causes) me trouble.
The second aphorism of the Zen of Python states:

> Explicit is better than implicit

I feel that dropping NaN values here is done implicitly, and that this implicit behavior is harmful.
I find no drawback to having `False` as the default value, except for having a NaN in the resulting `Series` index.
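For completeness, a small sketch of that one drawback: with `dropna=False` the NaN count is keyed by a NaN index label, which makes plain label lookup awkward (the workaround shown below via a boolean mask is just one possible approach).

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])
counts = s.value_counts(dropna=False)

# The count of missing values lives under a NaN index label.
print(counts.index.hasnans)              # True

# Plain label access is awkward; a boolean mask on the index works:
nan_count = counts[counts.index.isna()].iloc[0]
print(nan_count)                         # 1
```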
The question:

Why is the default value of the `dropna` argument of `value_counts()` set to `True`?
PS: I've searched the issues with the filter `is:issue value_counts dropna`; apart from #5569 I didn't find much information.