Description
Code Sample, a copy-pastable example if possible
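import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
import pandas as pd
# 100,000 values on a regular one-minute grid
N = 100000
y = np.linspace(0, 10, N)
base_dt = dt.datetime(2000,1,1)
dt_range = [base_dt + dt.timedelta(minutes=x) for x in range(N)]
# 1) plain matplotlib with datetimes on the x axis: smooth to interact with
plt.figure('pyplot timeseries')
plt.plot(dt_range,y)
# 2) pandas with a regularly spaced DatetimeIndex: lags badly
s = pd.Series(index=dt_range, data=y)
plt.figure('Pandas plot timeseries')
s.plot()
# 3) pandas with the default integer index: smooth to interact with
s2 = pd.Series(data=y)
plt.figure('Pandas plot series')
s2.plot()
plt.show()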
Problem description
I noticed that plotting time series with .plot() sometimes produces very slow, unresponsive plots that are difficult to interact with (e.g. to pan). I think this happens with regularly spaced time series, where either a frequency is defined or pandas can infer one. Perhaps it has something to do with the tick labels on the x (time) axis? It does not happen when plotting irregular time series, for which pandas does not apply its own tick and tick-label formatting.
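To illustrate the regular vs. irregular distinction, here is a small sketch using pd.infer_freq on an index built the same way as dt_range (shortened to 100 timestamps). Removing a single timestamp is an arbitrary way, chosen for this example, to make the spacing irregular.

```python
import datetime as dt

import pandas as pd

# Same construction as dt_range in the report, shortened to 100 timestamps.
base_dt = dt.datetime(2000, 1, 1)
regular = pd.DatetimeIndex([base_dt + dt.timedelta(minutes=x) for x in range(100)])

# Dropping one timestamp leaves a single two-minute gap, so the spacing is irregular.
irregular = regular.delete(50)

print(regular.freq)              # None: no frequency is stored on the index itself
print(pd.infer_freq(regular))    # a minute alias ("T" in older pandas, "min" in newer)
print(pd.infer_freq(irregular))  # None: no single frequency fits the index
```

So even though dt_range carries no explicit frequency, pandas can still infer one from the regular spacing.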
In the example code, the first and third plots are smooth to interact with, while the second plot lags badly. See also the screenshot of the second plot, with its pandas-styled tick labels.
If I change dt_range from minute frequency to hourly frequency (replace minutes with hours in the dt.timedelta call), Series.plot() becomes much smoother to interact with, even though the series has the same size. I think this is because the plot has far fewer ticks and labels.
So could it be a combination of the size of the plotted series and how the ticks are drawn and updated?
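One way to test this hypothesis without relying on how the GUI feels is to time full canvas redraws, since panning triggers repeated redraws. The sketch below assumes the non-interactive Agg backend and treats average fig.canvas.draw() time as a rough proxy for interactive lag; the helper names make_series, time_redraws and compare are hypothetical and chosen for this example. It produces no measured results here.

```python
import datetime as dt
import time

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_series(n, unit):
    """Build the report's series with spacing given by unit ("minutes" or "hours")."""
    base_dt = dt.datetime(2000, 1, 1)
    index = [base_dt + dt.timedelta(**{unit: x}) for x in range(n)]
    return pd.Series(index=index, data=np.linspace(0, 10, n))


def time_redraws(fig, repeats=5):
    """Average wall-clock seconds per full redraw (ticks and labels are recomputed)."""
    fig.canvas.draw()  # warm-up draw, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fig.canvas.draw()
    return (time.perf_counter() - start) / repeats


def compare(n=100000):
    """Time pandas vs. plain matplotlib plots for minute and hour spacing."""
    results = {}
    for unit in ("minutes", "hours"):
        s = make_series(n, unit)

        fig, ax = plt.subplots()
        s.plot(ax=ax)  # pandas time series plot
        results["pandas_" + unit] = time_redraws(fig)
        plt.close(fig)

        fig, ax = plt.subplots()
        ax.plot(s.index.to_pydatetime(), s.values)  # plain matplotlib, as in figure 1
        results["pyplot_" + unit] = time_redraws(fig)
        plt.close(fig)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1000:.1f} ms per redraw")
```

If the hypothesis holds, pandas_minutes should stand out from the other three timings; a smaller n can be passed to compare() for a quicker run.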
Is there a way to disable the pandas tick styling? I also noticed that pyplot.plot() and Series.plot() convert timestamps to very different numeric x values. Is there also a way to disable that conversion in pandas, so that its x values are compatible with pyplot?
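A possible starting point is the x_compat keyword of Series.plot(), which the pandas visualization guide describes as suppressing the automatic tick resolution adjustment for regular-frequency time series. The sketch below plots the report's series with and without it and prints the x limits, so the difference in numeric x units is visible. Whether x_compat=True also removes the lag is an assumption to be checked, for example with the redraw timing approach above.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100000
base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(minutes=x) for x in range(N)]
s = pd.Series(index=dt_range, data=np.linspace(0, 10, N))

fig, (ax_default, ax_compat) = plt.subplots(2, 1)
s.plot(ax=ax_default)                # default: pandas time series axis and tick formatting
s.plot(ax=ax_compat, x_compat=True)  # skip the pandas tick resolution adjustment

# The default axis uses pandas' own numeric x units; the x_compat axis
# should use matplotlib date numbers, like pyplot.plot() with datetimes.
print("default xlim: ", ax_default.get_xlim())
print("x_compat xlim:", ax_compat.get_xlim())
print("date2num(base_dt):", mdates.date2num(base_dt))
```

The same guide also shows a context-manager form, pd.plotting.plot_params.use("x_compat", True), for applying the setting to several plots at once.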
Expected Output
Plotting with Series.plot() should be as responsive as with pyplot.plot().
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
Edit: This performance issue might be related to #15071