Why is the default value of dropna True in value_counts() methods/functions? #21890

Open
@adrienpacifico


Problem description

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, np.nan, 5])
>>> s.value_counts()
5    1
3    1
2    1
1    1

>>> s.value_counts(dropna=False)
 5     1
 3     1
 2     1
 1     1
NaN    1

For beginners in pandas, it can be puzzling and misleading not to see NaN values when trying to understand a DataFrame, a Series, etc., especially if value_counts is used to check that previous operations were performed correctly (e.g. join / merge-type operations).
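To make the merge scenario concrete, here is a minimal sketch. The frames `left` and `right` and the column names `key` and `group` are made up for illustration and do not come from any real dataset. After a left join, the unmatched rows carry NaN in `group`, and the default `value_counts()` silently leaves them out of the tally.

```python
import pandas as pd

# Hypothetical frames: only keys "a" and "b" have a match on the right side.
left = pd.DataFrame({"key": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": ["a", "b"], "group": ["x", "y"]})

# A left join keeps all four rows; "c" and "d" get NaN in "group".
merged = left.merge(right, on="key", how="left")

# Default behaviour: NaN rows are not counted.
default_counts = merged["group"].value_counts()

# Explicitly keeping NaN makes the unmatched rows visible.
full_counts = merged["group"].value_counts(dropna=False)

print(len(merged))           # 4 rows in the merged frame
print(default_counts.sum())  # 2, the unmatched rows are missing from the tally
print(full_counts.sum())     # 4, matches the number of rows
```

Comparing the sum of the counts with `len(merged)` is one quick way to notice that rows were dropped from the tally.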

I can understand that it may seem natural to drop NaNs for various operations in pandas (means, etc.), and that, as a consequence, the general default value for dropna arguments is True (is that really the reason?).
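For comparison, here is a small sketch of how reductions such as `mean` treat NaN by default, using the same Series as in the problem description. It only illustrates the existing `skipna` behaviour; it is not a statement about why the `dropna` default was chosen.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])

# mean() skips NaN by default (skipna=True): (1 + 2 + 3 + 5) / 4
mean_default = s.mean()

# With skipna=False the NaN propagates into the result.
mean_strict = s.mean(skipna=False)

# count() ignores NaN, size does not.
non_null = s.count()
total = s.size

print(mean_default)     # 2.75
print(mean_strict)      # nan
print(non_null, total)  # 4 5
```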

I feel uncomfortable with the value_counts default behavior; it has caused me trouble in the past and still does.

The second aphorism of the Zen of Python states:

Explicit is better than implicit

I do feel that dropping NaN values is done implicitly, and that this implicit behavior is harmful.
I find no drawbacks to having False as the default value, with the exception of having an NA in the Series index.
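The drawback mentioned above, a NaN label in the resulting index, can be sketched as follows. This is only an illustration of what the index looks like and of one way to read the NaN count back out; the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])
counts = s.value_counts(dropna=False)

# The index of the result now contains a NaN label.
has_nan_label = counts.index.hasnans

# A boolean mask on the index selects the NaN row without looking it up by label.
nan_count = counts[counts.index.isna()].sum()

# The same number can be obtained without value_counts at all.
nan_count_direct = s.isna().sum()

print(has_nan_label)     # True
print(nan_count)         # 1
print(nan_count_direct)  # 1
```

Selecting with `counts.index.isna()` sidesteps any question of how a NaN label behaves in label-based lookups.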

The question:

So why is the default value of the dropna argument of value_counts() True?

PS: I searched the issues with the filter "is:issue value_counts dropna"; with the exception of #5569, I didn't find much information.


Labels

    API - Consistency: Internal Consistency of API/Behavior
    Algos: Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
    Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
    Needs Discussion: Requires discussion from core team before further action
