ENH: Relative timedelta isn't plotted like datetime #60079

Open
3 tasks done
maoding opened this issue Oct 21, 2024 · 5 comments
Labels
Enhancement Needs Discussion Requires discussion from core team before further action Timedelta Timedelta data type Visualization plotting

Comments

maoding commented Oct 21, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

num_rows = 10
start = pd.Timestamp.now()  # evaluate once so every sample shares the same origin
data = {
    # offsets of i**2 hours, so the samples are deliberately not equispaced
    "datetime": [start + pd.Timedelta(hours=i) * i for i in range(num_rows)],
    "values": list(range(num_rows)),
}
df = pd.DataFrame(data)
# elapsed time since the first sample
df["time"] = df.datetime.diff().fillna(pd.Timedelta(0)).cumsum()

# plot with datetime index
df.set_index("datetime")["values"].plot(style=".-")

# plot with timedelta index
df.set_index("time")["values"].plot(style=".-")

Issue Description

Plotting the same data with a datetime index or a timedelta index yields different plots. In the first case, equispaced x-ticks are created, whereas in the latter the time relation between the points isn't considered.

image

image

Expected Behavior

I expected the same plot simply with different x-axis labels.
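
As a possible workaround, the timedelta values can be converted to plain numbers before plotting, so that the spacing on the x-axis follows the elapsed time. The following is only a minimal sketch that calls matplotlib directly; the names `elapsed` and `hours`, the choice of hours as the unit, and the `Agg` backend are illustrative assumptions and not part of the report.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

num_rows = 10
# elapsed time of i**2 hours, mirroring the spacing of the example above
elapsed = pd.to_timedelta([i * i for i in range(num_rows)], unit="h")
values = pd.Series(range(num_rows), index=elapsed)

# convert the timedelta index to plain floats (hours) before plotting
hours = (values.index.total_seconds() / 3600).to_numpy()

fig, ax = plt.subplots()
ax.plot(hours, values.to_numpy(), ".-")
ax.set_xlabel("elapsed time [h]")

# the x positions that matplotlib received
x_plotted = ax.lines[0].get_xdata()
```

The x-axis then shows float hours rather than timedelta labels, which is the trade-off of this approach.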

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.5.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 75.1.0
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.27.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@maoding maoding added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 21, 2024
@rhshadrach
Member

Thanks for the report. This is Matplotlib adjusting the x-axis for datetimes. I believe it only supports numeric and datetimes but not timedelta. I think this is matplotlib/matplotlib#8869.

That said, pandas could support timedeltas similar to datetime by converting back and forth under the hood. E.g. for this example, adding the code after

https://github.com/pandas-dev/pandas/blob/d11ed2f1193c7a45446a702170b8ca0368bc07d3/pandas/plotting/

elif isinstance(index, ABCTimedeltaIndex):
    import datetime
    x = (index + datetime.datetime(1970, 1, 1))._mpl_repr()
    self._need_to_set_index = True

and replacing get_label here:

def get_label(i):
    import datetime
    return pprint_thing(datetime.timedelta(days=i))

gives

image

If we were to make this change, I'd guess we'd also need to adjust the y-axis, secondary_y, and the interaction with other settings and backends. While the hack above is straightforward, making this into a full-fledged PR seems to me to take a lot of effort.
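
The same idea can be tried outside of pandas internals. The sketch below is not the patch described above; it is a standalone approximation that shifts the timedeltas onto an epoch, plots them as datetimes with matplotlib, and converts the tick values back to timedeltas for the labels. The names `EPOCH`, `ORIGIN`, and `format_as_timedelta` are hypothetical, and the `Agg` backend is an assumption made so the code runs headless.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

EPOCH = pd.Timestamp("1970-01-01")
# matplotlib date number (in days) that corresponds to EPOCH
ORIGIN = mdates.date2num(EPOCH.to_pydatetime())

elapsed = pd.to_timedelta([i * i for i in range(10)], unit="h")
values = list(range(10))

# shift the timedeltas onto the epoch so that matplotlib sees datetimes
as_datetime = (EPOCH + elapsed).to_numpy()


def format_as_timedelta(x, pos=None):
    """Turn a matplotlib date number (days) back into a timedelta label."""
    return str(pd.Timedelta(days=x - ORIGIN).round("s"))


fig, ax = plt.subplots()
ax.plot(as_datetime, values, ".-")
ax.xaxis.set_major_formatter(FuncFormatter(format_as_timedelta))
```

Tick locations are still chosen by matplotlib's date locator here, so they fall on calendar boundaries of the shifted axis rather than on round timedelta values.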

Marking as Needs Discussion for now.

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action Enhancement Visualization plotting Timedelta Timedelta data type and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 2, 2024
@rhshadrach rhshadrach changed the title BUG: Relative timedelta isn't plotted like datetime ENH: Relative timedelta isn't plotted like datetime Nov 2, 2024
@gitsdfawfevi

Hello, for me this is a blatant bug. In my case, I evaluate simulations and write the values into a DataFrame with a timedelta index. When I plot the data, however, the plot shows an incorrect progression, and this is not immediately apparent.

Using datetime is the wrong choice in this case, as there is no calendar day on which the simulation starts (it starts at a relative time of 0 seconds). Nevertheless, I would like the plot to show a timedelta progression in years, days, etc. (and not float values in seconds). You can certainly build this yourself, but then a warning that matplotlib does not understand timedelta would be essential.

However, this brings me to the question of whether it makes sense to solve this in pandas at all, or whether it should rather be fixed directly in matplotlib.
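
Until such a warning exists, a check along the lines requested above could be done in user code. This is only a sketch: `has_uneven_timedelta_index` is a hypothetical helper, not a pandas or matplotlib function, and the warning text is an assumption based on the behavior described in this issue.

```python
import warnings

import pandas as pd


def has_uneven_timedelta_index(obj):
    """Return True (and warn) if obj has an unevenly spaced TimedeltaIndex.

    Hypothetical helper, intended to be called before obj.plot().
    """
    index = obj.index
    if not isinstance(index, pd.TimedeltaIndex):
        return False
    # differences between consecutive index entries
    steps = index.to_series().diff().dropna()
    uneven = bool(steps.nunique() > 1)
    if uneven:
        warnings.warn(
            "TimedeltaIndex is unevenly spaced; the plot may not reflect "
            "the actual elapsed time between points.",
            UserWarning,
            stacklevel=2,
        )
    return uneven


even = pd.Series(range(4), index=pd.to_timedelta([0, 1, 2, 3], unit="h"))
uneven = pd.Series(range(4), index=pd.to_timedelta([0, 1, 4, 9], unit="h"))
```

Calling `has_uneven_timedelta_index(uneven)` emits the warning, while `has_uneven_timedelta_index(even)` stays silent, since an evenly spaced index is drawn the same way either way.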

@rhshadrach
Member

I think pandas should strive to have as lightweight a layer for interoperability with matplotlib as possible. In addition, based on the comments in the above link, this has been a pain point for other users of matplotlib. As such, it seems preferable to solve this there.

@maoding
Author

maoding commented Nov 12, 2024

Thanks @rhshadrach for the suggestion and pointing out that the origin lies within matplotlib.

The issue matplotlib/matplotlib#8869 is from 2017 and was abandoned for a long time, until 2023, when a comment was made that matplotlib is more or less fine the way it is.
Of course, it would be lovely to have the problem fixed in the root library, but it seems we cannot expect any changes in that direction soon.

I am certainly not the person to decide where this should be fixed.

Maybe I can add a use case where this feature would be helpful. Imagine the data example from above, but with a second measurement that happened 20 hours later and had the same length. When plotting with a datetime index, the two measurements are separated along the x-axis, whereas with a timedelta index they overlap nicely, which makes the differences easy to see.

Two measurements 20 hours apart with datetime index:
image

The same measurements with timedelta index:
image
(Of course, the fix from @rhshadrach for the x-axis is missing here.)
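
The data preparation for such an overlay can be sketched as follows. The timestamps, values, and the `run` column are made up for illustration and are not the data behind the images above.

```python
import pandas as pd

# two hypothetical measurements of equal length, 20 hours apart
start_a = pd.Timestamp("2024-11-12 08:00")
start_b = start_a + pd.Timedelta(hours=20)
offsets = pd.to_timedelta([0, 1, 4, 9], unit="h")

df = pd.DataFrame(
    {
        "run": ["a"] * 4 + ["b"] * 4,
        "datetime": list(start_a + offsets) + list(start_b + offsets),
        "values": [0, 1, 2, 3, 0, 2, 3, 5],
    }
)

# elapsed time relative to the first sample of each run
df["time"] = df["datetime"] - df.groupby("run")["datetime"].transform("min")

# one column per run, indexed by elapsed time, so both runs share an x-axis
overlay = df.pivot(index="time", columns="run", values="values")
```

`overlay` has a timedelta index and one column per run, so `overlay.plot(style=".-")` draws both measurements over the same elapsed-time axis, subject to the tick spacing problem discussed in this issue.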

@rhshadrach
Member

rhshadrach commented Nov 12, 2024

@maoding

Until 2023, when a comment was made that matplotlib is more or less fine the way it is.
Of course, it would be lovely to have the problem fixed in the root library, but it seems we cannot expect any changes in that direction soon.

My read of the linked issue is that matplotlib is open to solving this there, it just needs a contributor.
