Open
Labels
- Docs
- Dtype Conversions: Unexpected or buggy dtype conversions
- Needs Discussion: Requires discussion from core team before further action
- Transformations: e.g. cumsum, diff, rank
Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html#pandas.Series.diff
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html#pandas.DataFrame.diff
Documentation problem
The documentation for pandas.Series.diff and pandas.DataFrame.diff states that, regardless of the dtype of the original series/column, the output will be of dtype float64. This is not true for series/columns of dtype bool: there the output is of dtype object.
For example:
import pandas as pd
# pd.__version__ == '2.2.0'
s = pd.Series([True, True, False, False, True])
d = s.diff()
# d.dtype is now 'object'
Indeed, the underlying function algorithms.diff explicitly distinguishes between boolean and integer dtypes.
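To make the contrast concrete, here is a minimal sketch comparing a boolean series with an integer one. The variable names are only illustrative, and the dtype noted for the boolean case is the one observed in this report on pandas 2.2.0; other versions may behave differently.

```python
import pandas as pd

# Boolean input: the case this issue is about.
bool_diff = pd.Series([True, True, False, False, True]).diff()

# Integer input: the case the current Notes section describes.
int_diff = pd.Series([1, 2, 4, 7]).diff()

print(bool_diff.dtype)     # reported as object on pandas 2.2.0
print(int_diff.dtype)      # float64
print(bool_diff.tolist())  # first element is NaN, the rest are booleans
```

The leading NaN is presumably why the boolean result cannot stay bool: the array has to hold both a missing value and booleans.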
Suggested fix for documentation
The Notes section should read something like this:
Notes
-----
For boolean dtypes, this uses :meth:`operator.xor` rather than
:meth:`operator.sub` and the result's dtype is ``object``.
Otherwise, the result is calculated according to the current dtype in {klass},
however the dtype of the result is always float64.
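As a rough illustration of why the boolean case uses xor, the sketch below applies both operators directly to a NumPy boolean array. It is not the pandas implementation, only an approximation of the element-wise step; the names current, previous, xor_result and sub_error are made up for this example.

```python
import operator

import numpy as np

arr = np.array([True, True, False, False, True])

# Pair each element with its predecessor, as diff(periods=1) would.
current, previous = arr[1:], arr[:-1]

# xor marks the positions where the value changed.
xor_result = operator.xor(current, previous)
print(xor_result)

# NumPy does not support subtraction between boolean arrays.
try:
    operator.sub(current, previous)
    sub_error = None
except TypeError as exc:
    sub_error = exc
print(type(sub_error).__name__)
```

Since xor on booleans yields booleans, the result has no natural float64 representation, which fits the object dtype reported above.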