API/BUG: to_datetime(..., box=True) should always return an Index #21864

Closed
@mroeschke

Description

The docstring currently says that box=True returns a DatetimeIndex. When the input cannot be parsed as dates this is not possible, and a different Index subclass is returned instead. However, this fallback is not applied consistently; in the second example below, box=True returns a plain ndarray:

In [8]: pd.__version__
Out[8]: '0.24.0.dev0+312.gd48f34141.dirty'

In [9]: idx = pd.Index([15e9], name='name')

In [10]: pd.to_datetime(idx, errors='ignore', box=True, unit='s')  # Index returned
Out[10]: Float64Index([15000000000.0], dtype='float64')

In [11]: pd.to_datetime(idx, errors='ignore', box=False, unit='s')
Out[11]: array([15000000000.0], dtype=object)

In [12]: malformed = np.array(['1/100/2000', np.nan], dtype=object)

In [13]: pd.to_datetime(malformed, errors='ignore', box=True)  # Index not returned
Out[13]: array(['1/100/2000', nan], dtype=object)

In [14]: pd.to_datetime(malformed, errors='ignore', box=False)
Out[14]: array(['1/100/2000', nan], dtype=object)

For consistency, I think box=True should always attempt to wrap the result in an Index, even when the input could not be parsed as dates.
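A minimal sketch of what that wrapping could look like, not the actual change in pandas: `box_result` is a hypothetical helper name, and the `box` keyword itself is deliberately not used here because it may not be available in the pandas version you have installed.

```python
import numpy as np
import pandas as pd


def box_result(result, name=None):
    """Hypothetical helper: always hand back an Index, whatever was parsed."""
    if isinstance(result, pd.Index):
        # Already boxed; only attach the name if one was given.
        return result.rename(name) if name is not None else result
    # Unparsed input falls back to a generic Index; pandas chooses the
    # concrete subclass and dtype from the data.
    return pd.Index(result, name=name)


# The malformed input from the session above, returned untouched by errors='ignore'.
malformed = np.array(['1/100/2000', np.nan], dtype=object)
boxed = box_result(malformed, name='name')
print(type(boxed).__name__)
```

With a helper like this, both examples above would return an Index subclass under box=True, and only box=False would return an ndarray.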

Addressing this in #21822

Labels: Datetime