Pandas .plot() on regularly spaced timeseries results in slow plotting/interaction #31074

Closed
@rhkarls

Code Sample, a copy-pastable example if possible

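import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100000
y = np.linspace(0, 10, N)

# Regularly spaced timestamps, one per minute
base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(minutes=x) for x in range(N)]

# Plot 1: pyplot with datetime x values (smooth)
plt.figure('pyplot timeseries')
plt.plot(dt_range, y)

# Plot 2: pandas with a regularly spaced datetime index (lagging)
s = pd.Series(index=dt_range, data=y)
plt.figure('Pandas plot timeseries')
s.plot()

# Plot 3: pandas with the default integer index (smooth)
s2 = pd.Series(data=y)
plt.figure('Pandas plot series')
s2.plot()

plt.show()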

Problem description

I noticed that plotting time series with .plot() sometimes produces very slow, unresponsive figures that are difficult to interact with (e.g. to pan). I think this happens with regularly spaced time series, where either a frequency is defined or pandas is able to infer one. Perhaps it has something to do with the tick labels on the x (time) axis? It does not happen with irregular time series, where pandas does not apply its own styling to the ticks and tick labels.
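To check whether a given index falls into the "regularly spaced" case, one option is to look at the frequency pandas reports for it. The following is only a sketch: the names `regular` and `irregular` are made up for illustration, and it assumes that the frequency inferred here is the same information the plotting code relies on, which I have not verified.

```python
import datetime as dt

import pandas as pd

N = 1000
base_dt = dt.datetime(2000, 1, 1)

# Regularly spaced index: one timestamp per minute, as in the example above.
regular = pd.DatetimeIndex(
    [base_dt + dt.timedelta(minutes=x) for x in range(N)]
)

# Irregular index: the same timestamps with a single one removed.
irregular = regular.delete(10)

# No frequency was set explicitly when building the index from a list ...
print(regular.freq)

# ... but pandas can still infer one from the regular spacing.
print(regular.inferred_freq)

# With one timestamp missing, no frequency is inferred.
print(irregular.inferred_freq)
```

If `inferred_freq` is not None for the index being plotted, the series is probably in the same situation as the second plot in the example.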

In the example code, the first and third plots are smooth to interact with, while the second plot lags terribly. See also the screenshot of the second plot with the pandas-styled tick labels:
[image: second plot with pandas-styled tick labels]

If dt_range is changed from minute frequency to hour frequency (replace minutes with hours), the pandas .plot() figure becomes much smoother to interact with despite the series having the same size, I think because it has far fewer ticks and labels:
[image: pandas plot of the hourly series]

So it might be a combination of the size of the plotted series and how the ticks are drawn/updated?
Is there a way to disable the pandas-styled ticks? I also notice that pyplot.plot() and pandas .plot() convert timestamps to numeric x values very differently. Is there a way to disable that behavior in pandas as well, so that it is compatible with pyplot?
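One thing that may be worth trying here is the `x_compat` keyword of `.plot()`, which the pandas visualization docs describe as suppressing the tick resolution adjustment for regular-frequency time series. The sketch below plots the same series with and without it and compares the resulting x axes. Whether this actually removes the lag has not been measured here, so treat it as an experiment rather than a confirmed fix. The names `ax_default` and `ax_compat` are illustrative only.

```python
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100000
y = np.linspace(0, 10, N)
base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(minutes=x) for x in range(N)]
s = pd.Series(index=dt_range, data=y)

# Default behaviour: pandas applies its own time series tick handling.
fig_default, ax_default = plt.subplots(num='Pandas plot timeseries')
s.plot(ax=ax_default)

# x_compat=True asks pandas to skip that handling for this plot.
fig_compat, ax_compat = plt.subplots(num='Pandas plot timeseries, x_compat')
s.plot(ax=ax_compat, x_compat=True)

# Compare the tick locators and the numeric x ranges of the two axes.
print(type(ax_default.xaxis.get_major_locator()).__name__)
print(type(ax_compat.xaxis.get_major_locator()).__name__)
print(ax_default.get_xlim())
print(ax_compat.get_xlim())

# Call plt.show() afterwards to compare how the two figures respond to panning.
```

The docs also mention `pd.plotting.plot_params.use('x_compat', True)` as a context manager for applying the same setting to several plots at once.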

Expected Output

Plotting that is as responsive as with pyplot.plot()

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7

Edit: This performance issue might be related to #15071
