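Caching (memoizing) the date_parser function in read_csv might be an easy performance improvement. As far as I can tell it is not cached, unless I am missing something? In the timings below, reading 1,000,000 identical timestamps with parse_dates takes about twice as long as reading them as plain strings (1.58 s vs 774 ms), while mapping the already-loaded string column through a one-entry dict takes only about 99 ms.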
In [55]: df = pd.DataFrame([datetime.datetime.today()] * 1000000)
In [56]: df.to_csv('j', index=False)
In [57]: !gzip j
In [58]: %time df = pd.read_csv('j.gz')
CPU times: user 703 ms, sys: 68.7 ms, total: 772 ms
Wall time: 774 ms
In [59]: d = {df['0'][0]: datetime.datetime.today()}
In [60]: %time s = df['0'].map(d)
CPU times: user 84.8 ms, sys: 14.8 ms, total: 99.6 ms
Wall time: 99.2 ms
In [61]: %time df = pd.read_csv('j.gz', parse_dates=['0'])
CPU times: user 1.49 s, sys: 88.7 ms, total: 1.58 s
Wall time: 1.58 s
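To make the idea concrete, here is a minimal sketch of what memoizing a parser could look like. It uses only the standard library and is not how read_csv is implemented; `parse_cached` is a hypothetical helper name, and it assumes the column holds ISO-style timestamp strings like the ones to_csv wrote above.

```python
import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_cached(text: str) -> datetime.datetime:
    # Hypothetical helper: each distinct string is parsed once,
    # and repeats are answered from the cache.
    # Assumes ISO-style input such as "2015-09-01 12:30:45.123456".
    return datetime.datetime.fromisoformat(text)


# A column of identical timestamp strings, similar in spirit to the CSV above.
stamp = str(datetime.datetime(2015, 9, 1, 12, 30, 45, 123456))
column = [stamp] * 100_000

parsed = [parse_cached(s) for s in column]

# cache_info() reports how many calls were served from the cache.
info = parse_cached.cache_info()
print(info.hits, info.misses)
```

With heavily repeated values almost every call is a cache hit, which is the same effect the dict-based `map` in In [60] gets. For columns where nearly every value is unique, the cache would add overhead and memory without saving any parsing work, so it would probably need to be bounded or optional.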