
ENH: Implemented lazy iteration #20796


Merged
merged 12 commits into from
Dec 25, 2018

mitar (Contributor) commented Apr 23, 2018

Fixes GH20783.

TomAugspurger (Contributor) commented:

Looks like the 2.7 failures are relevant.

This would need a release note.

How's the performance when actually iterating? e.g.

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": np.arange(100000)})
list(iter(df.itertuples()))

TomAugspurger added the Performance label (Memory or execution speed performance) Apr 23, 2018
mitar (Contributor, Author) commented Apr 23, 2018

> Looks like the 2.7 failures are relevant.

Yeah, I think I just managed to fix those: I had to import map from pandas.compat. I am testing it locally and will push once the tests finish.
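For context on why `map` had to come from a compat module: on Python 2 the builtin `map` builds the whole list eagerly, while on Python 3 it returns a lazy iterator. A minimal sketch of such a shim, assuming nothing about how `pandas.compat` actually spells it (the name `lazy_map` is hypothetical):

```python
import itertools

# Hypothetical shim: Python 2 exposes a lazy map as itertools.imap; on
# Python 3 that name does not exist and the builtin map is already lazy.
lazy_map = getattr(itertools, "imap", map)


def demo():
    calls = []

    def record(x):
        calls.append(x)  # track how many elements were actually converted
        return x * 2

    it = lazy_map(record, range(5))
    first = next(it)  # only the first element has been converted so far
    return first, list(calls)


if __name__ == "__main__":
    print(demo())  # prints (0, [0])
```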

> This would need a release note.

Where does this go? Any instructions anywhere?

> How's the performance when actually iterating? e.g.

Memory-wise: great. It only has to keep one row in memory at a time. CPU-wise: it looks like it is around 2-3x slower than tolist(), i.e. list(series) vs. series.tolist(). I think this is fine: the semantics of list(series) and series.tolist() are preserved, but if you want an optimized version you should call tolist(), which constructs the list in C rather than in Python. Not all uses of iteration are for constructing a list.

Oh, and of course: this version starts returning values immediately, whereas before you had to wait for everything to finish. So latency is much lower, while overall time is around 2-3x longer.
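A rough sketch of the latency difference described above, in plain Python rather than pandas internals: the "eager" helper converts every value before yielding any (roughly what building a full list first amounts to), while the "lazy" helper converts one value per `next()`. Both helper names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def eager_iter(values: Iterable[T], convert: Callable[[T], U]) -> Iterator[U]:
    # Convert everything up front, then iterate over the finished list.
    return iter([convert(v) for v in values])


def lazy_iter(values: Iterable[T], convert: Callable[[T], U]) -> Iterator[U]:
    # Convert on demand: each next() converts exactly one element.
    return map(convert, values)


def conversions_before_first(make_iter, n: int = 100_000) -> int:
    # Count how many conversions happen before the first value is returned.
    converted: List[int] = []

    def convert(v: int) -> int:
        converted.append(v)
        return v

    next(make_iter(range(n), convert))
    return len(converted)


if __name__ == "__main__":
    print(conversions_before_first(eager_iter))  # 100000
    print(conversions_before_first(lazy_iter))   # 1
```

Fully consuming either iterator yields the same values; the lazy version only spreads the conversion cost across the `next()` calls instead of paying it all before the first one.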

codecov bot commented Apr 23, 2018

Codecov Report

Merging #20796 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #20796      +/-   ##
==========================================
+ Coverage    92.3%    92.3%   +<.01%     
==========================================
  Files         163      163              
  Lines       51943    51947       +4     
==========================================
+ Hits        47946    47951       +5     
+ Misses       3997     3996       -1
Flag                     Coverage Δ
#multiple                90.71% <100%> (ø) ⬆️
#single                  42.99% <55.55%> (-0.01%) ⬇️

Impacted Files           Coverage Δ
pandas/core/frame.py     96.91% <100%> (ø) ⬆️
pandas/core/base.py      97.68% <100%> (+0.02%) ⬆️
pandas/util/testing.py   87.84% <0%> (+0.09%) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc7bc3f...766ba8f.

mitar (Contributor, Author) commented Apr 23, 2018

Correction: my earlier test iterated directly over a Series. Building tuples has some overhead, so the underlying changes do not show through as much. I have now done a more thorough performance evaluation, averaging over 10 runs: there is not much impact on overall time for regular itertuples, but getting the first element out is much faster.

I used the following script to measure how long it takes to create a list of all tuples, how long it takes to get only the first tuple, and how long it takes to create a list of plain tuples (index=False, name=None).

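import time

import numpy as np
import pandas as pd

print(pd.__path__)  # confirm which pandas build is being benchmarked


def perf(f):
    # Wall-clock duration of a single call to f, in seconds.
    start = time.perf_counter()
    f()
    end = time.perf_counter()
    return end - start


def a():
    # (a) Build a list of all rows as namedtuples (the default).
    list(iter(df.itertuples()))


def b():
    # (b) Fetch only the first row.
    next(iter(df.itertuples()))


def c():
    # (c) Build a list of all rows as plain tuples, without the index.
    list(iter(df.itertuples(index=False, name=None)))


df = pd.DataFrame({"A": np.arange(10000000)})
# Average each measurement over 10 runs.
print(sum(perf(a) for _ in range(10)) / 10)
print(sum(perf(b) for _ in range(10)) / 10)
print(sum(perf(c) for _ in range(10)) / 10)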

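Old version (mean seconds over 10 runs):

a (list of all tuples): 11.947458256299797
b (first tuple only):   0.7665189374000875
c (plain tuples):       1.5786312711999018

New version (mean seconds over 10 runs):

a (list of all tuples): 11.371349476900013
b (first tuple only):   0.0006540167998537072
c (plain tuples):       1.8471969083000659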
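Putting the reported averages side by side as old/new ratios (values copied from the two runs above; a ratio above 1 means the new version is faster):

```python
# Mean seconds over 10 runs, copied from the "Old version" / "New version" results.
old = {"a": 11.947458256299797, "b": 0.7665189374000875, "c": 1.5786312711999018}
new = {"a": 11.371349476900013, "b": 0.0006540167998537072, "c": 1.8471969083000659}

# old / new: above 1 means the new version is faster, below 1 means slower.
speedup = {name: old[name] / new[name] for name in old}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

This works out to roughly 1.05x for (a), about 1170x for the first tuple (b), and about 0.85x for plain tuples (c), i.e. (c) takes roughly 17% longer.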

mitar (Contributor, Author) commented Apr 23, 2018

Interestingly, this even seems to make the regular (Series) version slightly faster, while the simple-tuple version is a bit slower. The results held up when I repeated the runs a few more times.
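To illustrate why the plain-tuple path (c) and the namedtuple path (a) involve different per-row work, here is a stdlib-only sketch that builds rows lazily from column iterators with `zip`, optionally wrapping each row in a `namedtuple`. The helper `iter_rows` and its parameters only mirror `itertuples(index=..., name=...)` for illustration; this is an assumption-laden sketch, not pandas' implementation.

```python
from collections import namedtuple
from typing import Any, Dict, Iterator, Optional, Sequence


def iter_rows(
    index: Sequence[Any],
    columns: Dict[str, Sequence[Any]],
    include_index: bool = True,
    name: Optional[str] = "Row",
) -> Iterator[tuple]:
    # Hypothetical helper: one iterator per column, plus the index if requested.
    fields = list(columns)
    iterators = [iter(col) for col in columns.values()]
    if include_index:
        fields.insert(0, "Index")
        iterators.insert(0, iter(index))

    # zip is lazy: each next() pulls one value from every column.
    rows = zip(*iterators)
    if name is None:
        return rows  # plain tuples, no extra per-row work
    # rename=True replaces invalid or duplicate field names with _0, _1, ...
    Row = namedtuple(name, fields, rename=True)
    return map(Row._make, rows)  # one extra constructor call per row


if __name__ == "__main__":
    rows = iter_rows([0, 1], {"A": [10, 20], "B": ["x", "y"]})
    print(next(rows))  # Row(Index=0, A=10, B='x')
```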

mroeschke (Member) commented:

If you want to measure performance, we also have an asv performance benchmark for this method here:

def time_itertuples(self):
    for row in self.df2.itertuples():
        pass

Here's a guide on how to run the asv benchmark to evaluate performance changes.

fields.append("Index")

# use integer indexing because of possible duplicate column names
- arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
+ iterators.extend(self.iloc[:, k] for k in range(len(self.columns)))
Member


I believe iterators can also be evaluated lazily by using itertools.chain. Something along the lines of:

iterators = itertools.chain([self.index], (self.iloc[:, k] for k in range(len(self.columns))))

mitar (Contributor, Author), Apr 23, 2018:

Yeah, but I do not think this is necessary, because those are per-column and have to be created immediately afterwards anyway, even for the first row. So you would only move the forced evaluation from one line of the function to another line of the same function. Later on we do zip(*iterators), which forces iterators to be fully materialized. If we zipped things manually we could avoid that, but passing *iterators forces it. It is also better that any exception raised while evaluating the columns is thrown here and not later inside that other try/except.

So I do not think this is necessary.
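The point that `zip(*iterators)` forces the iterable anyway can be checked in plain Python: argument unpacking with `*` consumes the whole iterable (here an `itertools.chain`) before `zip` is even called, so every per-column object is created up front, while the rows themselves are still produced lazily. The `make_column` helper below is a hypothetical stand-in for `self.iloc[:, k]`.

```python
import itertools
from typing import List, Tuple


def demo(n_columns: int = 3, n_rows: int = 4) -> Tuple[List[int], List[int], tuple]:
    created: List[int] = []

    def make_column(k: int) -> List[int]:
        # Hypothetical stand-in for self.iloc[:, k]; records when it runs.
        created.append(k)
        return [k * 10 + i for i in range(n_rows)]

    # Lazy chain of "index" plus per-column objects, as suggested in review.
    iterators = itertools.chain(
        [list(range(n_rows))],  # stands in for self.index
        (make_column(k) for k in range(n_columns)),
    )
    built_before_zip = list(created)  # still empty: the chain is lazy

    # Unpacking with * exhausts the chain here, before zip even runs.
    rows = zip(*iterators)
    built_before_first_row = list(created)

    first_row = next(rows)  # rows themselves are still produced lazily
    return built_before_zip, built_before_first_row, first_row


if __name__ == "__main__":
    print(demo())  # ([], [0, 1, 2], (0, 0, 10, 20))
```

This also shows why an exception from building a column would surface at the `zip(*...)` line rather than later during iteration.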

mitar (Contributor, Author) commented Apr 23, 2018

Tests are failing for some unrelated reason.

mitar (Contributor, Author) commented Apr 24, 2018

I ran the benchmarks, but the changes are all over the place, both positive and negative. For the same kind of method (e.g. rolling), some go up and some go down. I am not sure how stable these benchmarks are: they take a few hours to run, and I could not guarantee that the computer was completely idle. Also, the absolute times for many of them are just a few ms. I think this is hard to measure well.

     [0ae7e909]       [f2fbb39a]
+        79.6±1ms         214±10ms     2.69  binary_ops.Ops.time_frame_comparison(False, 'default')
+     1.11±0.02ms       2.97±0.7ms     2.68  inference.NumericInferOps.time_divide(<class 'numpy.uint32'>)
+         169±6μs         440±50μs     2.60  inference.NumericInferOps.time_subtract(<class 'numpy.int8'>)
+        83.6±4ms        203±0.7ms     2.43  binary_ops.Ops.time_frame_comparison(False, 1)
+      5.62±0.1ms       13.6±0.1ms     2.43  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'min')
+        190±30μs        459±100μs     2.41  inference.NumericInferOps.time_subtract(<class 'numpy.uint8'>)
+     4.18±0.05ms       9.87±0.2ms     2.36  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'count')
+      5.60±0.1ms       13.1±0.3ms     2.35  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'max')
+         231±6μs         542±60μs     2.34  inference.NumericInferOps.time_add(<class 'numpy.int16'>)
+     5.15±0.09ms       12.0±0.4ms     2.33  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'skew')
+     5.35±0.08ms       12.4±0.2ms     2.31  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'kurt')
+         498±9μs       1.13±0.1ms     2.26  groupby.GroupByMethods.time_dtype_as_field('float', 'sem', 'direct')
+        679±10μs       1.48±0.2ms     2.19  groupby.GroupByMethods.time_dtype_as_group('float', 'pct_change', 'transformation')
+     4.31±0.06ms       9.10±0.1ms     2.11  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'mean')
+        592±20μs      1.22±0.05ms     2.06  groupby.GroupByMethods.time_dtype_as_group('float', 'sem', 'direct')
+      4.08±0.3ms      8.31±0.02ms     2.04  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'sum')
+     5.07±0.04ms       10.1±0.2ms     1.99  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'skew')
+     5.44±0.04ms      10.8±0.09ms     1.98  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'std')
+      3.89±0.4ms       7.69±0.3ms     1.98  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'sum')
+        715±10μs       1.38±0.1ms     1.93  groupby.GroupByMethods.time_dtype_as_group('int', 'sem', 'direct')
+      5.63±0.1ms       10.8±0.2ms     1.91  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'min')
+      4.21±0.1ms       8.04±0.2ms     1.91  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'count')
+      5.56±0.1ms       10.4±0.2ms     1.86  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'max')
+      5.22±0.1ms       9.72±0.4ms     1.86  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'kurt')
+         124±1ms          229±9ms     1.85  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'median')
+      5.52±0.1ms       9.97±0.3ms     1.81  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'std')
+     4.42±0.09ms       7.86±0.2ms     1.78  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'mean')
+         106±2ms          182±3ms     1.72  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'median')
+      40.5±0.5ms       66.4±0.9ms     1.64  frame_methods.Iteration.time_itertuples
+         323±3ms         515±20ms     1.60  sparse.SparseDataFrameConstructor.time_from_scipy
+     1.49±0.03μs       2.35±0.1μs     1.58  timestamp.TimestampConstruction.time_parse_iso8601_no_tz
+         666±5μs      1.01±0.06ms     1.52  groupby.GroupByMethods.time_dtype_as_field('int', 'pct_change', 'direct')
+         378±8μs         569±40μs     1.51  groupby.GroupByMethods.time_dtype_as_group('float', 'mean', 'transformation')
+      26.7±0.3ms       39.5±0.6ms     1.48  sparse.ToCoo.time_sparse_series_to_coo
+     3.82±0.08ms         5.65±1ms     1.48  inference.NumericInferOps.time_modulo(<class 'numpy.int32'>)
+      2.82±0.3ms       4.15±0.4ms     1.47  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+           2.75s            4.04s     1.47  sparse.SparseDataFrameConstructor.time_constructor
+        388±10μs         566±30μs     1.46  groupby.GroupByMethods.time_dtype_as_group('float', 'mean', 'direct')
+      16.0±0.3μs         23.3±3μs     1.45  timestamp.TimestampProperties.time_weekday_name(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
+      5.98±0.4ms       8.65±0.2ms     1.45  groupby.Categories.time_groupby_extra_cat_nosort
+      5.43±0.2ms       7.82±0.2ms     1.44  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'kurt')
+      2.81±0.3ms       4.04±0.4ms     1.44  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'max')
+      2.81±0.3ms       4.01±0.4ms     1.43  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'min')
+         156±4ms          222±5ms     1.42  sparse.SparseArrayConstructor.time_sparse_array(0.01, nan, <class 'numpy.int64'>)
+         113±2ms          161±3ms     1.42  sparse.SparseDataFrameConstructor.time_from_dict
+     2.18±0.03ms       3.10±0.2ms     1.42  rolling.Methods.time_rolling('Series', 10, 'int', 'mean')
+          10.8μs           15.2μs     1.41  ctors.SeriesDtypesConstructors.time_dtindex_from_series
+     3.02±0.02ms       4.24±0.2ms     1.40  rolling.Methods.time_rolling('Series', 10, 'int', 'min')
+         105±1ms          148±4ms     1.40  stat_ops.Correlation.time_corr('spearman')
+     2.79±0.08ms       3.90±0.8ms     1.39  stat_ops.FrameMultiIndexOps.time_op(0, 'mean')
+         432±5μs         602±50μs     1.39  indexing.MultiIndexing.time_frame_ix
+      8.60±0.8ms       11.9±0.8ms     1.38  groupby.Categories.time_groupby_nosort
+      25.8±0.3ms       35.6±0.4ms     1.38  sparse.SparseArrayConstructor.time_sparse_array(0.01, 0, <class 'object'>)
+      3.19±0.1ms       4.37±0.3ms     1.37  rolling.Methods.time_rolling('Series', 1000, 'float', 'kurt')
+      13.0±0.3ms       17.8±0.5ms     1.37  groupby.Nth.time_frame_nth('datetime')
+         157±3ms          214±3ms     1.36  sparse.SparseSeriesToFrame.time_series_to_frame
+         416±6μs         564±20μs     1.36  groupby.GroupByMethods.time_dtype_as_group('float', 'median', 'direct')
+     3.68±0.06ms       4.99±0.6ms     1.36  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'mean')
+         109±1μs         147±20μs     1.36  groupby.GroupByMethods.time_dtype_as_group('datetime', 'count', 'transformation')
+     2.98±0.06ms       4.00±0.1ms     1.34  stat_ops.Correlation.time_corr('pearson')
+        419±10μs         560±20μs     1.34  groupby.GroupByMethods.time_dtype_as_group('float', 'median', 'transformation')
+         136±2μs         182±20μs     1.33  timestamp.TimestampConstruction.time_parse_dateutil
+     3.33±0.04ms       4.44±0.2ms     1.33  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'count')
+      24.6±0.5ms       32.7±0.5ms     1.33  sparse.SparseArrayConstructor.time_sparse_array(0.01, nan, <class 'object'>)
+     3.07±0.03ms       4.06±0.2ms     1.32  rolling.Methods.time_rolling('Series', 10, 'int', 'max')
+        1.06±0ms       1.40±0.1ms     1.32  inference.NumericInferOps.time_divide(<class 'numpy.uint8'>)
+         988±8μs      1.30±0.07ms     1.32  groupby.GroupByMethods.time_dtype_as_group('float', 'value_counts', 'transformation')
+     5.42±0.07ms       7.12±0.3ms     1.31  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'min')
+         234±2ms          308±4ms     1.31  stat_ops.Correlation.time_corr('kendall')
+     1.87±0.04ms       2.46±0.1ms     1.31  timeseries.ToDatetimeCache.time_dup_string_dates(False)
+     3.71±0.08ms      4.84±0.02ms     1.31  sparse.SparseArrayConstructor.time_sparse_array(0.01, nan, <class 'numpy.float64'>)
+        91.7±5μs          120±9μs     1.30  timeseries.SortIndex.time_sort_index(True)
+      2.71±0.3ms       3.53±0.4ms     1.30  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     3.95±0.07ms       5.12±0.2ms     1.29  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'mean')
+     2.13±0.02ms      2.76±0.06ms     1.29  rolling.Methods.time_rolling('Series', 10, 'float', 'mean')
+     4.99±0.06ms       6.44±0.5ms     1.29  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'skew')
+         318±1ms         410±40ms     1.29  sparse.Arithmetic.time_make_union(0.1, nan)
+     1.15±0.01ms      1.48±0.01ms     1.29  index_object.Ops.time_subtract('float')
+        576±10μs         736±90μs     1.28  frame_methods.Quantile.time_frame_quantile(0)
+      9.37±0.2μs       12.0±0.4μs     1.28  timestamp.TimestampProperties.time_weekday_name(None, 'B')
+     2.84±0.04ms       3.62±0.3ms     1.27  timeseries.ToDatetimeCache.time_dup_string_dates_and_format(True)
+      3.92±0.4ms       4.98±0.3ms     1.27  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'count')
+      20.8±0.6ms       26.4±0.8ms     1.27  frame_methods.Repr.time_html_repr_trunc_mi
+      12.1±0.6ms       15.2±0.1ms     1.26  groupby.Categories.time_groupby_ordered_nosort
+         521±3μs          658±2μs     1.26  frame_methods.Iteration.time_iteritems_cached
+     3.63±0.06ms       4.56±0.4ms     1.26  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'sum')
+         176±3μs         221±20μs     1.26  groupby.GroupByMethods.time_dtype_as_group('float', 'first', 'direct')
+     1.97±0.04ms       2.46±0.1ms     1.25  rolling.Methods.time_rolling('Series', 10, 'float', 'sum')
+     3.01±0.06ms       3.76±0.2ms     1.25  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(True)
+      9.35±0.3μs       11.7±0.2μs     1.25  timestamp.TimestampProperties.time_weekday_name(None, None)
+     5.49±0.07ms       6.81±0.4ms     1.24  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'std')
+     3.49±0.08ms       4.32±0.4ms     1.24  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'count')
+         169±3ms          209±2ms     1.24  groupby.MultiColumn.time_lambda_sum
+      82.9±0.7μs          102±4μs     1.24  indexing.IntervalIndexing.time_loc_scalar
+         113±2ms          139±3ms     1.23  gil.ParallelFactorize.time_loop(8)
+     5.56±0.08ms       6.82±0.5ms     1.23  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'min')
+        984±10μs      1.21±0.04ms     1.23  inference.NumericInferOps.time_divide(<class 'numpy.int16'>)
+           1.10s            1.34s     1.22  groupby.GroupByMethods.time_dtype_as_group('float', 'mad', 'transformation')
+     4.42±0.09ms       5.39±0.3ms     1.22  timeseries.ToDatetimeCache.time_dup_seconds_and_unit(True)
+     4.85±0.06ms       5.91±0.3ms     1.22  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'max')
+           1.13s            1.37s     1.22  groupby.GroupByMethods.time_dtype_as_group('float', 'mad', 'direct')
+     3.34±0.04ms       4.06±0.2ms     1.22  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'count')
+         201±3ms          243±6ms     1.21  frame_methods.SortValues.time_frame_sort_values(False)
+     3.92±0.03ms       4.75±0.1ms     1.21  timeseries.ToDatetimeISO8601.time_iso8601_nosep
+         182±2ms          221±5ms     1.21  panel_ctor.DifferentIndexes.time_from_dict
+         147±4ms         178±10ms     1.21  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>)
+      78.0±0.8ms         93.6±3ms     1.20  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'median')
+     1.75±0.03μs      2.10±0.03μs     1.20  timestamp.TimestampConstruction.time_parse_today
+         237±6μs         284±30μs     1.20  reindex.Fillna.time_float_32('backfill')
+      4.97±0.1ms       5.93±0.3ms     1.19  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'std')
+           534ms            637ms     1.19  panel_methods.PanelMethods.time_pct_change('major')
+        89.9±1ms          107±8ms     1.19  frame_methods.Repr.time_frame_repr_wide
+         155±2μs          185±8μs     1.19  groupby.GroupByMethods.time_dtype_as_group('object', 'last', 'transformation')
+     3.76±0.08ms       4.48±0.1ms     1.19  inference.NumericInferOps.time_add(<class 'numpy.float64'>)
+     5.74±0.03ms       6.83±0.4ms     1.19  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'max')
+     2.59±0.02ms       3.06±0.1ms     1.18  timeseries.ResampleSeries.time_resample('datetime', '5min', 'ohlc')
+         383±3μs         452±20μs     1.18  indexing.MultiIndexing.time_series_ix
+         118±3μs          138±4μs     1.18  panel_methods.PanelMethods.time_shift('major')
+      5.97±0.1ms       7.02±0.2ms     1.18  groupby.Apply.time_scalar_function_single_col
+         181±3μs         212±10μs     1.17  groupby.GroupByMethods.time_dtype_as_group('datetime', 'bfill', 'transformation')
+         131±2μs          153±3μs     1.17  indexing.IntervalIndexing.time_loc_list
+        51.3±1ms       60.0±0.9ms     1.17  gil.ParallelGroupbyMethods.time_loop(4, 'max')
+      35.0±0.2ms         40.8±3ms     1.17  rolling.Methods.time_rolling('Series', 10, 'float', 'median')
+      35.3±0.4ms         41.1±1ms     1.17  rolling.Methods.time_rolling('Series', 10, 'int', 'median')
+      20.9±0.3ms       24.2±0.6ms     1.16  frame_methods.Repr.time_repr_tall
+         347±3μs          403±6μs     1.16  groupby.GroupByMethods.time_dtype_as_group('float', 'head', 'transformation')
+         120±2ms          139±6ms     1.16  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'median')
+     5.59±0.07ms       6.47±0.3ms     1.16  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'std')
+      10.4±0.2ms       12.0±0.2ms     1.16  inference.DateInferOps.time_add_timedeltas
+         381±8ns         440±10ns     1.16  timestamp.TimestampProperties.time_week(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
+     39.3±0.08ms         45.4±3ms     1.16  sparse.Arithmetic.time_divide(0.1, 0)
+     2.25±0.02ms       2.60±0.1ms     1.16  timeseries.ResampleSeries.time_resample('datetime', '5min', 'mean')
+      11.4±0.3ms       13.1±0.2ms     1.15  algorithms.Hashing.time_series_string
+         155±1μs         179±10μs     1.15  groupby.GroupByMethods.time_dtype_as_group('object', 'last', 'direct')
+      19.2±0.3μs       22.1±0.4μs     1.15  timestamp.TimestampProperties.time_is_quarter_start(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
+          17.0ms           19.6ms     1.15  index_object.Indexing.time_boolean_series('String')
+      4.94±0.1ms       5.66±0.1ms     1.15  inference.NumericInferOps.time_modulo(<class 'numpy.float64'>)
+         480±4ms          550±9ms     1.15  groupby.GroupByMethods.time_dtype_as_field('float', 'mad', 'transformation')
+         152±3μs          174±3μs     1.14  indexing.IntervalIndexing.time_getitem_list
+         258±4ms          295±2ms     1.14  reshape.WideToLong.time_wide_to_long_big
+      34.7±0.8ms       39.8±0.7ms     1.14  sparse.Arithmetic.time_divide(0.01, nan)
+     3.05±0.06ms      3.49±0.06ms     1.14  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+         213±2μs         244±20μs     1.14  groupby.GroupByMethods.time_dtype_as_field('float', 'cumcount', 'transformation')
+           1.02s            1.16s     1.14  groupby.Apply.time_copy_function_multi_col
+     1.86±0.05ms      2.13±0.04ms     1.14  timeseries.ToDatetimeCache.time_dup_string_dates_and_format(False)
+      4.87±0.4ms       5.55±0.2ms     1.14  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'max')
+       105±0.9ms         120±10ms     1.14  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'median')
+         169±2μs         192±10μs     1.14  groupby.GroupByMethods.time_dtype_as_field('float', 'median', 'direct')
+      83.0±0.8μs         94.6±3μs     1.14  period.PeriodProperties.time_property('M', 'end_time')
+        394±10ns          449±3ns     1.14  timestamp.TimestampProperties.time_week(None, None)
+     6.91±0.09μs       7.88±0.2μs     1.14  offset.OnOffset.time_on_offset(<MonthBegin>)
+     4.99±0.06μs       5.68±0.2μs     1.14  indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime')
+         304±5ns          347±7ns     1.14  timestamp.TimestampProperties.time_tz(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
+     2.77±0.06ms      3.15±0.05ms     1.14  sparse.Arithmetic.time_intersect(0.1, 0)
+     4.03±0.05ms       4.58±0.3ms     1.14  groupby.GroupManyLabels.time_sum(1000)
+         381±3ms         434±10ms     1.14  groupby.Groups.time_series_groups('object_large')
+      32.3±0.4ms         36.7±1ms     1.14  sparse.Arithmetic.time_divide(0.1, nan)
+     3.50±0.04ms       3.96±0.1ms     1.13  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'count')
+     1.55±0.01μs      1.75±0.01μs     1.13  timestamp.TimestampConstruction.time_fromtimestamp
+         399±6ms          450±2ms     1.13  groupby.Apply.time_copy_overhead_single_col
+         128±1μs          145±8μs     1.13  groupby.GroupByMethods.time_dtype_as_field('datetime', 'shift', 'direct')
+      14.7±0.2ms       16.5±0.7ms     1.13  frame_methods.Repr.time_html_repr_trunc_si
+         132±2μs          149±4μs     1.13  groupby.GroupByMethods.time_dtype_as_field('float', 'min', 'transformation')
+     3.94±0.08ms      4.44±0.08ms     1.12  timeseries.ToDatetimeISO8601.time_iso8601_format
+         387±2ns          435±6ns     1.12  timestamp.TimestampProperties.time_week(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
+     2.23±0.04ms      2.51±0.05ms     1.12  algorithms.Hashing.time_series_dates
+           623ms            698ms     1.12  panel_methods.PanelMethods.time_pct_change('items')
+        84.0±2μs         94.2±4μs     1.12  groupby.GroupByMethods.time_dtype_as_group('object', 'any', 'direct')
+      42.2±0.9μs         47.3±1μs     1.12  timestamp.TimestampOps.time_replace_tz('US/Eastern')
+          5.61μs           6.28μs     1.12  index_object.Indexing.time_slice_step('String')
+         395±9ns         440±10ns     1.12  timestamp.TimestampProperties.time_week(None, 'B')
+      13.7±0.4ms       15.2±0.3ms     1.11  groupby.MultiColumn.time_col_select_numpy_sum
+         126±2μs          140±4μs     1.11  groupby.GroupByMethods.time_dtype_as_field('float', 'mean', 'direct')
+     1.00±0.02μs      1.12±0.02μs     1.11  index_object.Range.time_min_trivial
+      72.5±0.4μs         80.7±2μs     1.11  indexing.NonNumericSeriesIndexing.time_getitem_label_slice('string')
+         315±2ns         350±10ns     1.11  timestamp.TimestampProperties.time_tz(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
+     2.24±0.03ms       2.49±0.1ms     1.11  algorithms.Hashing.time_series_timedeltas
+      7.96±0.1ms       8.85±0.1ms     1.11  inference.NumericInferOps.time_modulo(<class 'numpy.uint64'>)
+         274±5μs          305±9μs     1.11  groupby.GroupByMethods.time_dtype_as_group('float', 'ffill', 'direct')
+     5.80±0.03μs       6.45±0.1μs     1.11  offset.OnOffset.time_on_offset(<SemiMonthBegin: day_of_month=15>)
+      9.11±0.1ms       10.1±0.5ms     1.11  stat_ops.FrameOps.time_op('var', 'int', 1, True)
+      16.7±0.2ms       18.5±0.4ms     1.11  groupby.MultiColumn.time_cython_sum
+        83.0±2μs         92.0±2μs     1.11  groupby.GroupByMethods.time_dtype_as_group('datetime', 'size', 'direct')
+         174±3μs          193±5μs     1.11  groupby.GroupByMethods.time_dtype_as_group('datetime', 'last', 'direct')
+      5.40±0.1ms      5.98±0.07ms     1.11  groupby.CountMultiDtype.time_multi_count
+        750±10ns         829±20ns     1.11  timestamp.TimestampOps.time_to_pydatetime('US/Eastern')
+        87.2±2μs         96.4±3μs     1.11  groupby.GroupByMethods.time_dtype_as_group('datetime', 'any', 'transformation')
+      13.3±0.3μs       14.7±0.2μs     1.11  timedelta.TimedeltaConstructor.time_from_components
+      4.76±0.1ms       5.25±0.2ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'kurt')
+         175±2μs         193±20μs     1.10  groupby.GroupByMethods.time_dtype_as_group('float', 'min', 'transformation')
-        64.0±2ms       58.1±0.6ms     0.91  io.excel.Excel.time_read_excel('xlwt')
-     1.06±0.03ms         961±10μs     0.91  groupby.GroupByMethods.time_dtype_as_group('int', 'value_counts', 'transformation')
-         146±1ms          132±1ms     0.91  io.excel.Excel.time_read_excel('openpyxl')
-      2.39±0.2ms      2.17±0.02ms     0.91  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '.', 'round_trip')
-        169±10ms          154±2ms     0.91  io.json.ToJSON.time_delta_int_tstamp_lines('columns')
-     10.6±0.06ms       9.61±0.2ms     0.91  io.hdf.HDFStoreDataFrame.time_query_store_table_wide
-         296±4μs          268±3μs     0.91  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<QuarterEnd: startingMonth=3>)
-       360±0.9ms          325±2ms     0.91  io.stata.Stata.time_read_stata('ty')
-      28.9±0.4μs       26.2±0.3μs     0.90  offset.OffestDatetimeArithmetic.time_subtract(<BusinessDay>)
-     2.20±0.09ms      1.99±0.04ms     0.90  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', 'high')
-         176±6ms        159±0.9ms     0.90  inference.ToNumericDowncast.time_downcast('string-int', None)
-     1.26±0.01ms      1.13±0.01ms     0.90  index_object.Ops.time_add('float')
-        12.0±1ms      10.9±0.04ms     0.90  index_object.Ops.time_modulo('int')
-      18.5±0.6μs       16.6±0.5μs     0.90  offset.OffestDatetimeArithmetic.time_apply(<BusinessQuarterBegin: startingMonth=3>)
-         272±9μs          244±4μs     0.90  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<MonthBegin>)
-        632±30μs          569±3μs     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'cumsum', 'transformation')
-         130±5ms          117±2ms     0.90  io.json.ToJSON.time_floats_with_dt_index_lines('split')
-         184±8ms          166±3ms     0.90  join_merge.ConcatPanels.time_f_ordered(2, False)
-          6.44μs           5.79μs     0.90  index_object.Indexing.time_slice_step('Float')
-         607±3ms          546±2ms     0.90  join_merge.ConcatPanels.time_c_ordered(2, True)
-         172±4ms          154±1ms     0.90  inference.ToNumericDowncast.time_downcast('string-nint', 'unsigned')
-      44.8±0.8ms       40.2±0.2ms     0.90  io.csv.ReadCSVCategorical.time_convert_direct
-         255±8ms          229±2ms     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
-        26.4±1μs       23.6±0.3μs     0.89  offset.OffestDatetimeArithmetic.time_add_10(<MonthEnd>)
-         107±2ms         95.9±2ms     0.89  io.json.ToJSON.time_delta_int_tstamp('columns')
-           823ms            736ms     0.89  join_merge.MergeCategoricals.time_merge_object
-      30.2±0.3ms       27.0±0.3ms     0.89  io.sql.SQL.time_read_sql_query('sqlite')
-        394±10μs          351±2μs     0.89  groupby.GroupByMethods.time_dtype_as_field('object', 'ffill', 'transformation')
-           144ms            128ms     0.89  index_object.IndexAppend.time_append_obj_list
-      29.7±0.6μs       26.5±0.5μs     0.89  offset.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterBegin: startingMonth=3>)
-         171±6ms          152±2ms     0.89  io.json.ToJSON.time_float_int_str_lines('columns')
-      33.7±0.8ms       29.9±0.3ms     0.89  inference.ToNumericDowncast.time_downcast('datetime64', 'integer')
-      46.2±0.2ms         41.0±1ms     0.89  io.hdf.HDFStoreDataFrame.time_write_store_table
-         287±1ms          255±3ms     0.89  io.stata.Stata.time_write_stata('td')
-        371±10ms        329±0.4ms     0.89  io.excel.Excel.time_write_excel('xlsxwriter')
-         145±8ms          129±2ms     0.89  io.csv.ToCSV.time_frame('wide')
-        18.5±1μs       16.3±0.2μs     0.88  offset.OffestDatetimeArithmetic.time_apply(<BusinessQuarterEnd: startingMonth=3>)
-       136±0.9ms        120±0.6ms     0.88  io.excel.Excel.time_read_excel('xlsxwriter')
-        20.2±1ms      17.9±0.07ms     0.88  io.csv.ReadCSVThousands.time_thousands('|', None)
-         184±7ms        163±0.7ms     0.88  inference.ToNumericDowncast.time_downcast('string-nint', 'signed')
-           113ms            100ms     0.88  strings.Repeat.time_repeat('int')
-        64.3±1ms       56.8±0.1ms     0.88  io.csv.ReadCSVCategorical.time_convert_post
-        327±30μs          288±2μs     0.88  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<BusinessMonthEnd>)
-          11.4ms      10.0±0.02ms     0.88  io.sql.WriteSQLDtypes.time_read_sql_query_select_column('sqlalchemy', 'float')
-        20.2±1ms       17.7±0.3ms     0.88  reshape.PivotTable.time_pivot_table
-     1.75±0.06ms      1.53±0.04ms     0.87  groupby.Categories.time_groupby_sort
-         135±7ms        118±0.9ms     0.87  io.json.ToJSON.time_floats_with_int_idex_lines('index')
-     2.46±0.04ms      2.14±0.08ms     0.87  groupby.Datelike.time_sum('date_range')
-        32.9±1ms       28.7±0.8ms     0.87  io.csv.ToCSV.time_frame('mixed')
-      1.44±0.2μs      1.26±0.01μs     0.87  period.PeriodProperties.time_property('M', 'qyear')
-         373±6ms          323±1ms     0.87  io.json.ReadJSONLines.time_read_json_lines('datetime')
-      1.78±0.1ms      1.55±0.03ms     0.87  io.csv.ReadUint64Integers.time_read_uint64_neg_values
-     1.43±0.03μs      1.24±0.01μs     0.87  period.PeriodProperties.time_property('M', 'is_leap_year')
-      18.5±0.9μs       16.0±0.4μs     0.86  offset.OffestDatetimeArithmetic.time_apply(<BusinessMonthEnd>)
-           1.83s            1.58s     0.86  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'transformation')
-        960±50μs          829±5μs     0.86  groupby.GroupByMethods.time_dtype_as_field('float', 'rank', 'transformation')
-         174±6ms          150±3ms     0.86  io.json.ToJSON.time_float_int_str_lines('index')
-           290ms            250ms     0.86  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlalchemy', 'string')
-      28.8±0.9μs       24.8±0.5μs     0.86  offset.OffestDatetimeArithmetic.time_subtract(<QuarterEnd: startingMonth=3>)
-     3.40±0.01ms      2.92±0.05ms     0.86  io.hdf.HDFStoreDataFrame.time_read_store_table
-      2.34±0.1ms      2.00±0.04ms     0.85  io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', 'high')
-      13.2±0.3ms      11.3±0.02ms     0.85  index_object.SetOperations.time_operation('int', 'symmetric_difference')
-      4.41±0.2ms      3.77±0.06ms     0.85  groupby.Datelike.time_sum('date_range_tz')
-         196±4μs          168±5μs     0.85  groupby.GroupByMethods.time_dtype_as_group('int', 'last', 'transformation')
-         133±5ms          113±1ms     0.85  io.json.ReadJSON.time_read_json('split', 'int')
-      1.87±0.1ms      1.60±0.01ms     0.85  io.csv.ReadUint64Integers.time_read_uint64_na_values
-     1.32±0.02ms      1.13±0.02ms     0.85  inference.NumericInferOps.time_add(<class 'numpy.int32'>)
-        305±10ms          260±4ms     0.85  io.stata.Stata.time_write_stata('th')
-        79.3±1μs         67.6±2μs     0.85  indexing.NumericSeriesIndexing.time_iloc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>)
-        95.7±3ms         81.5±2ms     0.85  io.hdf.HDF.time_write_hdf('table')
-          3.93ms           3.34ms     0.85  index_object.Indexing.time_boolean_series('Float')
-          11.8ms           10.0ms     0.85  io.sql.WriteSQLDtypes.time_read_sql_query_select_column('sqlalchemy', 'float_with_nan')
-          15.7ms       13.3±0.1ms     0.85  io.sql.ReadSQLTableDtypes.time_read_sql_table_column('float_with_nan')
-      2.44±0.2ms      2.06±0.05ms     0.85  rolling.Quantile.time_quantile('Series', 1000, 'int', 1)
-         139±7ms          117±1ms     0.84  io.json.ToJSON.time_floats_with_int_idex_lines('columns')
-        90.9±2μs         76.8±1μs     0.84  groupby.GroupByMethods.time_dtype_as_field('datetime', 'size', 'transformation')
-      2.55±0.1ms      2.15±0.01ms     0.84  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', 'round_trip')
-        337±20μs          283±4μs     0.84  groupby.GroupByMethods.time_dtype_as_group('int', 'bfill', 'direct')
-      12.3±0.3ms           10.3ms     0.84  io.sql.WriteSQLDtypes.time_read_sql_query_select_column('sqlalchemy', 'string')
-        19.5±1ms       16.3±0.1ms     0.84  join_merge.Concat.time_concat_series(0)
-        93.4±5ms       78.2±0.3ms     0.84  rolling.Methods.time_rolling('Series', 1000, 'float', 'median')
-          16.0ms       13.4±0.1ms     0.84  io.sql.ReadSQLTableDtypes.time_read_sql_table_column('bool')
-        476±20μs          397±7μs     0.83  join_merge.Append.time_append_homogenous
-        18.7±2μs       15.6±0.2μs     0.83  offset.OffestDatetimeArithmetic.time_apply(<BusinessMonthBegin>)
-     2.43±0.07ms      2.02±0.03ms     0.83  io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', None)
-      2.61±0.1ms      2.17±0.04ms     0.83  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '_', None)
-        233±10μs          194±5μs     0.83  join_merge.Concat.time_concat_empty_right(1)
-        34.4±3μs       28.6±0.6μs     0.83  offset.OffestDatetimeArithmetic.time_add_10(<SemiMonthBegin: day_of_month=15>)
-          5.59μs           4.63μs     0.83  frame_methods.XS.time_frame_xs(1)
-         330±7ms          273±3ms     0.83  io.excel.Excel.time_write_excel('xlwt')
-        23.8±2ms       19.6±0.5ms     0.83  io.hdf.HDF.time_read_hdf('fixed')
-          54.8ms           45.2ms     0.83  io.sql.ReadSQLTable.time_read_sql_table_all
-         194±5μs          160±3μs     0.82  inference.ToNumericDowncast.time_downcast('int32', 'float')
-        193±10ms          159±2ms     0.82  inference.ToNumericDowncast.time_downcast('string-int', 'float')
-      2.46±0.1ms      2.02±0.02ms     0.82  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', 'round_trip')
-        318±10μs          260±2μs     0.82  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<MonthEnd>)
-          20.4ms           16.7ms     0.82  io.sql.ReadSQLTable.time_read_sql_table_parse_dates
-      16.0±0.3ms       13.1±0.4ms     0.82  inference.ToNumericDowncast.time_downcast('int32', 'unsigned')
-           542ms            443ms     0.82  io.stata.Stata.time_write_stata('tw')
-      25.0±0.7ms       20.3±0.2ms     0.81  io.csv.ReadCSVThousands.time_thousands('|', ',')
-      7.33±0.2ms      5.96±0.04ms     0.81  io.hdf.HDFStoreDataFrame.time_query_store_table
-      2.74±0.1ms      2.22±0.06ms     0.81  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', None)
-      4.01±0.2ms      3.24±0.05ms     0.81  rolling.Methods.time_rolling('Series', 1000, 'int', 'std')
-        69.7±2ms         56.3±2ms     0.81  binary_ops.Ops.time_frame_multi_and(True, 'default')
-      27.6±0.6ms       22.2±0.3ms     0.81  io.hdf.HDFStoreDataFrame.time_read_store_mixed
-        19.0±2ms      15.3±0.06ms     0.81  join_merge.MergeAsof.time_on_int
-      2.78±0.3ms      2.24±0.02ms     0.81  index_object.Ops.time_divide('int')
-        30.8±1μs       24.8±0.7μs     0.81  offset.OffestDatetimeArithmetic.time_add_10(<YearBegin: month=1>)
-     1.85±0.06ms      1.49±0.04ms     0.81  io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'ymd')
-         194±6ms          156±2ms     0.80  io.hdf.HDFStoreDataFrame.time_write_store_table_dc
-         304±4ms          243±2ms     0.80  io.stata.Stata.time_write_stata('ty')
-         316±5ms          253±2ms     0.80  io.stata.Stata.time_write_stata('tq')
-      2.50±0.1ms      2.00±0.02ms     0.80  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '_', 'high')
-         225±7μs          180±3μs     0.80  groupby.GroupByMethods.time_dtype_as_group('int', 'first', 'direct')
-        296±10μs          236±1μs     0.80  groupby.GroupByMethods.time_dtype_as_field('float', 'ffill', 'direct')
-        670±50μs         533±10μs     0.80  groupby.GroupByMethods.time_dtype_as_field('float', 'cumsum', 'direct')
-         137±3ms          109±2ms     0.79  join_merge.Align.time_series_align_int64_index
-     1.52±0.01ms      1.20±0.01ms     0.79  index_object.Ops.time_divide('float')
-         187±2ms        147±0.9ms     0.79  io.json.ToJSON.time_float_int_lines('index')
-           667ms            521ms     0.78  io.excel.Excel.time_write_excel('openpyxl')
-       165±0.1ms          129±1ms     0.78  index_object.SetOperations.time_operation('strings', 'union')
-           329ms            255ms     0.78  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlalchemy', 'float_with_nan')
-         114±7ms         88.1±2ms     0.77  io.sas.SAS.time_read_msgpack('sas7bdat')
-        34.2±1ms       26.4±0.3ms     0.77  io.hdf.HDF.time_write_hdf('fixed')
-        171±20μs          132±1μs     0.77  groupby.GroupByMethods.time_dtype_as_field('float', 'first', 'direct')
-        220±10ms          169±2ms     0.77  inference.ToNumericDowncast.time_downcast('string-int', 'unsigned')
-         353±8μs         269±10μs     0.76  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<BusinessMonthBegin>)
-      5.94±0.2μs      4.51±0.06μs     0.76  io.hdf.HDFStoreDataFrame.time_store_repr
-     4.10±0.06ms      3.10±0.01ms     0.75  io.hdf.HDFStoreDataFrame.time_store_info
-        50.6±3ms       38.1±0.2ms     0.75  inference.ToNumericDowncast.time_downcast('int-list', 'integer')
-         117±3ms       87.7±0.5ms     0.75  io.hdf.HDF.time_read_hdf('table')
-        59.3±4ms       44.5±0.6ms     0.75  io.hdf.HDFStoreDataFrame.time_read_store_table_mixed
-      1.22±0.2ms         914±10μs     0.75  inference.NumericInferOps.time_divide(<class 'numpy.int8'>)
-      15.9±0.5μs       11.9±0.2μs     0.75  indexing.NumericSeriesIndexing.time_iloc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>)
-      4.22±0.2ms      3.15±0.03ms     0.75  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
-     1.51±0.07ms      1.13±0.01ms     0.75  inference.NumericInferOps.time_subtract(<class 'numpy.int32'>)
-      2.96±0.2ms      2.17±0.03ms     0.73  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '.', None)
-      20.6±0.9ms       15.1±0.6ms     0.73  io.msgpack.MSGPack.time_read_msgpack
-     1.43±0.08ms      1.04±0.01ms     0.73  inference.ToNumericDowncast.time_downcast('datetime64', None)
-      3.08±0.2ms      2.24±0.06ms     0.73  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', 'round_trip')
-        740±20μs         537±20μs     0.73  groupby.GroupByMethods.time_dtype_as_field('float', 'cumprod', 'transformation')
-        313±80μs          226±5μs     0.72  inference.NumericInferOps.time_multiply(<class 'numpy.int16'>)
-      3.21±0.1ms       2.24±0.2ms     0.70  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', 'high')
-      3.11±0.2ms       2.15±0.1ms     0.69  rolling.Methods.time_rolling('Series', 1000, 'int', 'sum')
-      6.40±0.4ms      4.41±0.03ms     0.69  io.sas.SAS.time_read_msgpack('xport')
-          20.4ms           13.8ms     0.68  io.sql.ReadSQLTableDtypes.time_read_sql_table_column('float')
-      5.01±0.8ms      3.31±0.01ms     0.66  inference.NumericInferOps.time_modulo(<class 'numpy.int16'>)
-       46.4±10μs       30.0±0.5μs     0.65  inference.ToNumeric.time_from_float('ignore')
-        198±20μs          126±3μs     0.64  offset.OffestDatetimeArithmetic.time_add_10(<CustomBusinessMonthEnd>)
-      2.74±0.1ms       1.54±0.2ms     0.56  inference.NumericInferOps.time_subtract(<class 'numpy.float32'>)
-        53.4±3μs       29.4±0.8μs     0.55  inference.ToNumeric.time_from_float('coerce')
-        61.4±8ms      31.7±0.08ms     0.52  index_object.SetOperations.time_operation('strings', 'intersection')
-        503±30μs         258±10μs     0.51  inference.NumericInferOps.time_subtract(<class 'numpy.int16'>)
-        710±60μs         338±10μs     0.48  join_merge.JoinNonUnique.time_join_non_unique_equal
-      1.14±0.2ms          512±9μs     0.45  groupby.GroupByMethods.time_dtype_as_field('float', 'sem', 'transformation')
-      2.59±0.1ms      1.14±0.03ms     0.44  inference.NumericInferOps.time_multiply(<class 'numpy.float32'>)
-      7.09±0.7ms      2.99±0.08ms     0.42  inference.ToNumeric.time_from_numeric_str('ignore')
-        532±50μs          172±2μs     0.32  inference.NumericInferOps.time_multiply(<class 'numpy.int8'>)
-      1.57±0.3ms         388±10μs     0.25  inference.NumericInferOps.time_multiply(<class 'numpy.uint32'>)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@toobaz
Member

toobaz commented May 8, 2018

I ran benchmarks, but the changes are all over the place, both positive and negative. For the same kind of methods (like rolling), some go up and some go down. I am not sure how stable those benchmarks are. I had to leave them running for a few hours and could not ensure the computer stayed completely idle.

What you can do, if you suspect a given result is a fluke, is to rerun only that benchmark file, as in:

asv continuous -f 1.1 master iterators --bench groupby

This should take just a couple of minutes (assuming you just ran another asv test on the same commits), during which you can probably leave your computer idle.

Contributor

@jreback jreback left a comment

Once perf tests are run (and asv benchmarks are added if needed), this would need a whatsnew note.

@mitar mitar force-pushed the iterators branch 3 times, most recently from ad99292 to 868f8db May 29, 2018 05:32
@mitar
Contributor Author

mitar commented May 29, 2018

I think I updated everything requested in the code. I am now running benchmarks once more on an idle machine.

@mitar
Contributor Author

mitar commented May 29, 2018

There already seems to be an itertuples asv test in frame_methods.py.

@mitar
Contributor Author

mitar commented May 29, 2018

I added a few more asv tests.
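For readers unfamiliar with asv: a benchmark is a plain Python class whose `setup` method runs before each measurement, and asv times methods prefixed with `time_` and records peak memory for methods prefixed with `peakmem_`. The sketch below shows what iteration benchmarks of this kind might look like. The method names mirror benchmark names that appear in the result tables in this thread, but the bodies, the DataFrame shape, and the random data are assumptions, not the actual contents of frame_methods.py.

```python
import numpy as np
import pandas as pd


class Iteration:
    # Hypothetical sketch of asv benchmarks (not the real pandas suite).
    # asv times `time_*` methods and records peak memory of `peakmem_*` ones.

    def setup(self):
        # Assumed size and shape; the real benchmark may differ.
        n = 100_000
        self.df = pd.DataFrame(np.random.randn(n, 5), columns=list("abcde"))

    def time_itertuples_start(self):
        # Only build the iterator; cheap if iteration is lazy.
        self.df.itertuples(index=False)

    def time_itertuples_read_first(self):
        # Build the iterator and read a single row.
        next(self.df.itertuples(index=False))

    def time_itertuples(self):
        # Consume every row without keeping it.
        for row in self.df.itertuples(index=False):
            pass

    def time_itertuples_to_list(self):
        # Materialize all rows at once.
        list(self.df.itertuples(index=False))

    def time_itertuples_raw_tuples(self):
        # name=None yields plain tuples instead of namedtuples.
        for row in self.df.itertuples(index=False, name=None):
            pass

    def peakmem_itertuples(self):
        # Same work as time_itertuples, but asv records peak memory instead.
        for row in self.df.itertuples(index=False):
            pass
```

Pairing `*_start` / `*_read_first` benchmarks with full-traversal ones separates time-to-first-row (where lazy iteration should help) from total iteration cost (where it may not).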

@mitar
Contributor Author

mitar commented May 29, 2018

Updated benchmarks:

       before           after         ratio
     [1c2844ac]       [980403a3]
+       140±0.9ms          370±2ms     2.65  binary_ops.Ops.time_frame_comparison(False, 'default')
+         148±1ms          376±1ms     2.53  binary_ops.Ops.time_frame_comparison(False, 1)
+          43.9ms           97.5ms     2.22  frame_methods.Iteration.time_itertuples_raw_tuples
+          57.5ms            111ms     1.93  frame_methods.Iteration.time_itertuples_raw_tuples_to_list
+           113ms            183ms     1.61  frame_methods.Iteration.time_itertuples_to_list
+           104ms            162ms     1.56  frame_methods.Iteration.time_itertuples
+           729ms            1.10s     1.50  join_merge.ConcatPanels.time_c_ordered(2, False)
+      16.6±0.3ms       24.7±0.2ms     1.48  groupby.Categories.time_groupby_extra_cat_nosort
+      19.9±0.5ms       28.7±0.4ms     1.44  groupby.Categories.time_groupby_ordered_nosort
+      20.6±0.3ms       29.5±0.3ms     1.43  groupby.Categories.time_groupby_nosort
+      31.1±0.7ms         44.0±2ms     1.42  eval.Eval.time_add('numexpr', 1)
+     6.80±0.01ms         9.43±2ms     1.39  binary_ops.Timeseries.time_series_timestamp_compare('US/Eastern')
+          1.75ms           2.37ms     1.35  frame_methods.Iteration.time_iteritems_cached
+        32.5±1ms         42.9±2ms     1.32  eval.Eval.time_mult('numexpr', 1)
+      77.4±0.4ms       98.0±0.3ms     1.27  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'median')
+      32.0±0.6ms       40.1±0.1ms     1.25  join_merge.MergeAsof.time_on_int32
+           535ms            646ms     1.21  reindex.Reindex.time_reindex_multiindex
+       135±0.6ms          161±5ms     1.19  binary_ops.Ops.time_frame_multi_and(True, 'default')
+           1.25s            1.45s     1.16  join_merge.I8Merge.time_i8merge('left')
+      48.0±0.1ms       54.6±0.1ms     1.14  sparse.ToCoo.time_sparse_series_to_coo
+          3.49ms           3.94ms     1.13  index_object.Indexing.time_get_loc_non_unique('Float')
+     11.7±0.08μs      13.1±0.07μs     1.13  offset.OnOffset.time_on_offset(<BusinessYearBegin: month=1>)
+           721ms            812ms     1.13  join_merge.ConcatPanels.time_c_ordered(2, True)
+     6.64±0.02ms       7.40±0.2ms     1.12  categoricals.Concat.time_union
+         916±9ns      1.02±0.02μs     1.11  timestamp.TimestampProperties.time_tz(None, None)
+          20.0ms           22.3ms     1.11  eval.Query.time_query_datetime_column
+        3.75±0ms         4.13±0ms     1.10  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', 'round_trip')
-           1.21s            1.10s     0.91  join_merge.MergeCategoricals.time_merge_object
-     1.32±0.04μs      1.19±0.02μs     0.90  timestamp.TimestampProperties.time_days_in_month(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
-     1.10±0.02μs          992±5ns     0.90  timestamp.TimestampProperties.time_is_quarter_start(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
-     1.39±0.02μs      1.25±0.02μs     0.90  timestamp.TimestampProperties.time_week(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
-     1.11±0.02μs         1.00±0μs     0.90  timestamp.TimestampProperties.time_is_quarter_start(None, None)
-        4.09±0ms         3.68±0ms     0.90  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', None)
-     4.08±0.01ms      3.67±0.01ms     0.90  io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', None)
-           102ms           90.9ms     0.90  frame_ctor.FromDicts.time_list_of_dict
-           2.62s            2.30s     0.88  join_merge.JoinIndex.time_left_outer_join_index
-      76.5±0.7ms       66.9±0.1ms     0.87  frame_methods.Isnull.time_isnull_strngs
-     6.92±0.01ms      5.97±0.01ms     0.86  rolling.Methods.time_rolling('Series', 10, 'int', 'std')
-     6.94±0.04ms      5.90±0.01ms     0.85  rolling.Methods.time_rolling('Series', 1000, 'int', 'std')
-         116±3ms         96.9±1ms     0.83  io.csv.ReadCSVCategorical.time_convert_direct
-        26.3±1ms       21.4±0.7ms     0.81  stat_ops.FrameOps.time_op('mad', 'int', 0, False)
-        14.3±1ms       8.04±0.1ms     0.56  binary_ops.Ops.time_frame_comparison(True, 1)
-     18.4±0.06ms      5.27±0.01ms     0.29  offset.OffsetSeriesArithmetic.time_add_offset(<SemiMonthEnd: day_of_month=15>)
-     16.3±0.02ms         4.62±0ms     0.28  offset.OffsetSeriesArithmetic.time_add_offset(<BusinessDay>)
-     18.1±0.01ms         4.87±0ms     0.27  offset.OffsetSeriesArithmetic.time_add_offset(<SemiMonthBegin: day_of_month=15>)
-     17.6±0.03ms         4.47±0ms     0.25  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<SemiMonthEnd: day_of_month=15>)
-     15.5±0.04ms      3.81±0.02ms     0.25  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<BusinessDay>)
-     17.1±0.03ms      4.07±0.01ms     0.24  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<SemiMonthBegin: day_of_month=15>)
-         129±1ms      11.0±0.08ms     0.08  offset.ApplyIndex.time_apply_index(<BusinessDay>)
-         144±1ms      11.9±0.04ms     0.08  offset.ApplyIndex.time_apply_index(<SemiMonthEnd: day_of_month=15>)
-       145±0.7ms      11.0±0.07ms     0.08  offset.ApplyIndex.time_apply_index(<SemiMonthBegin: day_of_month=15>)
-        98.5±1ms          256±4μs     0.00  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@mitar
Contributor Author

mitar commented May 29, 2018

Just iteration, ran again:

· Running 16 total benchmarks (2 commits * 1 environments * 8 benchmarks)
[  0.00%] · For pandas commit hash 980403a3:
[  0.00%] ·· Building for virtualenv-py3.5-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking virtualenv-py3.5-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[  6.25%] ··· Running frame_methods.Iteration.time_iteritems                                                                                                                             75.0ms
[ 12.50%] ··· Running frame_methods.Iteration.time_iteritems_cached                                                                                                                      2.41ms
[ 18.75%] ··· Running frame_methods.Iteration.time_iteritems_indexing                                                                                                                     385ms
[ 25.00%] ··· Running frame_methods.Iteration.time_iterrows                                                                                                                               598ms
[ 31.25%] ··· Running frame_methods.Iteration.time_itertuples                                                                                                                             154ms
[ 37.50%] ··· Running frame_methods.Iteration.time_itertuples_raw_tuples                                                                                                                  101ms
[ 43.75%] ··· Running frame_methods.Iteration.time_itertuples_raw_tuples_to_list                                                                                                          115ms
[ 50.00%] ··· Running frame_methods.Iteration.time_itertuples_to_list                                                                                                                     187ms
[ 50.00%] · For pandas commit hash 1c2844ac:
[ 50.00%] ·· Building for virtualenv-py3.5-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking virtualenv-py3.5-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 56.25%] ··· Running frame_methods.Iteration.time_iteritems                                                                                                                             73.9ms
[ 62.50%] ··· Running frame_methods.Iteration.time_iteritems_cached                                                                                                                      1.90ms
[ 68.75%] ··· Running frame_methods.Iteration.time_iteritems_indexing                                                                                                                     379ms
[ 75.00%] ··· Running frame_methods.Iteration.time_iterrows                                                                                                                               600ms
[ 81.25%] ··· Running frame_methods.Iteration.time_itertuples                                                                                                                             104ms
[ 87.50%] ··· Running frame_methods.Iteration.time_itertuples_raw_tuples                                                                                                                 44.2ms
[ 93.75%] ··· Running frame_methods.Iteration.time_itertuples_raw_tuples_to_list                                                                                                         58.5ms
[100.00%] ··· Running frame_methods.Iteration.time_itertuples_to_list                                                                                                                     111ms
       before           after         ratio
     [1c2844ac]       [980403a3]
+          44.2ms            101ms     2.28  frame_methods.Iteration.time_itertuples_raw_tuples
+          58.5ms            115ms     1.96  frame_methods.Iteration.time_itertuples_raw_tuples_to_list
+           111ms            187ms     1.69  frame_methods.Iteration.time_itertuples_to_list
+           104ms            154ms     1.48  frame_methods.Iteration.time_itertuples
+          1.90ms           2.41ms     1.27  frame_methods.Iteration.time_iteritems_cached

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@@ -16,6 +16,8 @@ New features
~~~~~~~~~~~~

- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
- Iterating over a :class:`Series` and using :meth:`DataFrame.itertuples` now create iterators without internally
Contributor

move to 0.24.0 performance section

Contributor Author

Done.

@jreback
Contributor

jreback commented May 29, 2018

@mitar did you run this backwards? those times have increased

@mitar
Contributor Author

mitar commented May 29, 2018

@mitar did you run this backwards? those times have increased

I ran asv continuous -f 1.1 -E virtualenv upstream/master HEAD. I think they did increase. I do not have a definite explanation, but it might be costly to keep many iterators around: the new code uses less memory, but it seems to add overhead. So my guess is that this is the traditional lazy vs. strict evaluation trade-off.

I am not sure how to better determine what is happening.
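The lazy vs. strict trade-off can be illustrated without pandas. A minimal sketch, assuming a plain one-dimensional NumPy integer array stands in for a Series' values: the eager variant boxes every element with `tolist()` before yielding anything, while the lazy variant boxes one element at a time via `ndarray.item`. The helper names (`eager_iter`, `lazy_iter`, `peak_bytes_to_first`) are hypothetical, and this is not pandas' own code; `tracemalloc` only measures Python-level allocations made after it starts.

```python
import tracemalloc

import numpy as np


def eager_iter(values):
    # Box every element as a Python object up front, then iterate the list.
    return iter(values.tolist())


def lazy_iter(values):
    # Box one element at a time, only when the consumer asks for it.
    return map(values.item, range(values.size))


def peak_bytes_to_first(make_iter, values):
    """Peak traced allocation while creating an iterator and reading one item."""
    tracemalloc.start()
    try:
        it = make_iter(values)
        next(it)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


values = np.arange(100_000)
print("eager peak bytes:", peak_bytes_to_first(eager_iter, values))
print("lazy peak bytes: ", peak_bytes_to_first(lazy_iter, values))
```

The lazy variant wins on memory and on time-to-first-element, but each element then pays for a separate Python-level `item` call, which is one plausible source of the per-row overhead seen in the full-traversal benchmarks.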

(cherry picked from commit f566b46)
(cherry picked from commit eb219ac)
* Preserve sparsity
* Preserve fill value
@TomAugspurger
Contributor

@rok I split the fix for this off into #24372

@rok
Contributor

rok commented Dec 20, 2018

@TomAugspurger - Thanks! I've rebased onto your commit; I suppose we should be good now.

@rok
Contributor

rok commented Dec 21, 2018

I've rerun the benchmarks; I assume many of the changes are due to the fix.

       before           after         ratio
     [5d134ec1]       [2780c6f3]
                      <iterators>
+         313±5ms         613±10ms     1.96  frame_methods.Iteration.time_itertuples_raw_tuples
+        458±10ms         861±10ms     1.88  frame_methods.Iteration.time_itertuples_raw_tuples_to_list
+        864±30μs      1.61±0.02ms     1.86  indexing.NonNumericSeriesIndexing.time_get_value('datetime', 'nonunique_monotonic_inc')
+        773±20ms       1.33±0.03s     1.71  frame_methods.Iteration.time_itertuples_to_list
+        214±20ms        366±100ms     1.71  join_merge.ConcatPanels.time_f_ordered(1, False)
+        635±20ms       1.05±0.02s     1.65  frame_methods.Iteration.time_itertuples
+      15.2±0.5ms       24.3±0.4ms     1.59  index_object.Indexing.time_get_loc_non_unique_sorted('Float')
+        634±20μs        932±200μs     1.47  groupby.GroupByMethods.time_dtype_as_field('int', 'cumsum', 'transformation')
+      1.06±0.02s        1.52±0.1s     1.43  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'transformation')
+        859±20μs      1.23±0.07ms     1.43  frame_methods.Iteration.time_iteritems_cached
+      6.66±0.1ms       9.41±0.3ms     1.41  groupby.Categories.time_groupby_ordered_nosort
+        194±10ms         274±30ms     1.41  join_merge.ConcatPanels.time_f_ordered(1, True)
+      6.76±0.1ms      9.45±0.04ms     1.40  groupby.Categories.time_groupby_nosort
+         281±4μs         390±30μs     1.39  groupby.GroupByMethods.time_dtype_as_field('int', 'ffill', 'transformation')
+      4.41±0.4ms       6.06±0.4ms     1.37  inference.DateInferOps.time_subtract_datetimes
+        779±10μs       1.03±0.2ms     1.32  groupby.GroupByMethods.time_dtype_as_field('int', 'cumprod', 'direct')
+        553±20μs        730±100μs     1.32  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f96271bb378>, True)
+        625±10μs        807±200μs     1.29  groupby.GroupByMethods.time_dtype_as_field('int', 'cummin', 'transformation')
+         198±5μs         251±20μs     1.27  groupby.GroupByMethods.time_dtype_as_field('int', 'first', 'transformation')
+        558±10μs        708±100μs     1.27  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f96271bb0d0>, True)
+      24.6±0.8ms       31.1±0.7ms     1.27  frame_ctor.FromDicts.time_list_of_dict
+         282±3μs         356±20μs     1.26  groupby.GroupByMethods.time_dtype_as_field('int', 'ffill', 'direct')
+        524±10μs        662±100μs     1.26  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f96271bb0d0>, False)
+        557±20μs        702±100μs     1.26  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f96271bb1e0>, True)
+         196±4μs         245±20μs     1.25  groupby.GroupByMethods.time_dtype_as_field('int', 'first', 'direct')
+        629±20μs        786±100μs     1.25  groupby.GroupByMethods.time_dtype_as_field('int', 'cummin', 'direct')
+         368±5μs         460±30μs     1.25  groupby.GroupByMethods.time_dtype_as_field('int', 'head', 'transformation')
+      13.0±0.3ms       16.2±0.7ms     1.25  index_object.SetOperations.time_operation('date_string', 'union')
+        373±20μs         464±30μs     1.25  groupby.GroupByMethods.time_dtype_as_field('int', 'head', 'direct')
+         186±4μs         232±20μs     1.24  groupby.GroupByMethods.time_dtype_as_field('int', 'last', 'direct')
+        481±10ms         597±40ms     1.24  groupby.GroupByMethods.time_dtype_as_field('int', 'mad', 'direct')
+        531±20μs         658±90μs     1.24  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f96271bb2f0>, False)
+      78.1±0.7μs        96.4±10μs     1.23  groupby.GroupByMethods.time_dtype_as_group('int', 'all', 'direct')
+        44.9±2ms         55.1±4ms     1.23  frame_methods.Isnull.time_isnull_obj
+     1.23±0.01ms      1.48±0.09ms     1.20  indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+      13.9±0.1ms       16.1±0.2ms     1.16  reindex.DropDuplicates.time_frame_drop_dups_bool(False)
+     9.89±0.05ms       11.5±0.7ms     1.16  groupby.AggFunctions.time_different_numpy_functions
+     2.03±0.01ms       2.34±0.2ms     1.15  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '_', None)
+        916±10μs       1.04±0.1ms     1.14  groupby.GroupByMethods.time_dtype_as_field('datetime', 'value_counts', 'transformation')
+        538±20μs         611±30μs     1.14  frame_methods.Isnull.time_isnull_floats_no_null
+        284±10μs         322±10μs     1.13  categoricals.Constructor.time_from_codes_all_int8
+      80.8±0.5μs         90.9±8μs     1.12  groupby.GroupByMethods.time_dtype_as_group('int', 'any', 'direct')
+       130±0.8μs          146±6μs     1.12  groupby.GroupByMethods.time_dtype_as_group('int', 'count', 'direct')
+        91.1±1μs          102±5μs     1.12  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+        79.8±1ms         89.2±2ms     1.12  frame_ctor.FromDicts.time_nested_dict_int64
+      24.9±0.5ms       27.7±0.6ms     1.11  frame_ctor.FromDicts.time_nested_dict_index_columns
+         278±4μs         308±20μs     1.11  groupby.GroupByMethods.time_dtype_as_group('int', 'ffill', 'direct')
+         458±4ms         506±10ms     1.11  indexing.NonNumericSeriesIndexing.time_getitem_list_like('datetime', 'nonunique_monotonic_inc')
+         104±2μs          115±5μs     1.10  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
+         632±2μs         696±50μs     1.10  groupby.GroupByMethods.time_dtype_as_field('int', 'cummax', 'direct')
-            812M             732M     0.90  frame_methods.Iteration.peakmem_itertuples_to_list
-            758M             677M     0.89  frame_methods.Iteration.peakmem_itertuples_raw_to_list
-        439±20μs         391±10μs     0.89  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-        621±20μs          553±7μs     0.89  timeseries.DatetimeIndex.time_unique('repeated')
-        577±40μs          513±4μs     0.89  timeseries.DatetimeIndex.time_to_date('dst')
-        15.8±1ms      14.0±0.07ms     0.89  timeseries.DatetimeIndex.time_to_date('tz_naive')
-      16.9±0.9ms      14.9±0.02ms     0.88  timeseries.DatetimeIndex.time_to_time('tz_naive')
-        15.8±1ms       13.9±0.1ms     0.88  timeseries.DatetimeIndex.time_to_date('repeated')
-        980±40μs         813±80μs     0.83  groupby.GroupByMethods.time_dtype_as_group('int', 'sem', 'direct')
-     1.19±0.08ms        987±100μs     0.83  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'transformation')
-        289±10μs         234±30μs     0.81  groupby.GroupByMethods.time_dtype_as_group('int', 'min', 'direct')
-      2.45±0.2ms       1.97±0.1ms     0.81  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-        151±20μs        120±0.9μs     0.80  period.PeriodUnaryMethods.time_to_timestamp('min')
-        178±40μs          132±2μs     0.74  period.PeriodProperties.time_property('M', 'end_time')
-       56.2±20μs       39.1±0.3μs     0.70  period.Indexing.time_series_loc
-       467±100μs          320±7μs     0.69  period.Indexing.time_intersection
-        199±60μs          136±2μs     0.68  period.PeriodIndexConstructor.time_from_date_range('D')
-       96.3±10μs       60.0±0.2μs     0.62  index_object.Indexing.time_get_loc_sorted('Float')
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw_read_first
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw_start
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw
-            654M             297M     0.45  frame_methods.Iteration.peakmem_itertuples_start
-            654M             297M     0.45  frame_methods.Iteration.peakmem_itertuples
-         285±6ms       1.41±0.1ms     0.00  frame_methods.Iteration.time_itertuples_start
-         295±8ms       1.35±0.1ms     0.00  frame_methods.Iteration.time_itertuples_read_first
-         261±6ms          711±8μs     0.00  frame_methods.Iteration.time_itertuples_raw_read_first
-        267±10ms         718±40μs     0.00  frame_methods.Iteration.time_itertuples_raw_start

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
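How to read the tables in this thread: the ratio column is the new timing divided by the old one. Values below 1 are improvements and values above 1 are regressions. The time_itertuples_* rows show 0.00 only because their ratio rounds to zero at two decimals. The sketch below shows that arithmetic. parse_asv_time and asv_ratio are hypothetical helpers, and they only understand the μs, ms and s suffixes used above. asv itself computes ratios from unrounded timings, so a ratio recomputed from the printed values can differ in the last digit.

```python
import re

# Hypothetical helper (not part of asv): converts timing strings such as
# "285±6ms" or "711±8μs" into seconds. Only the units seen above are handled.
_UNITS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}

def parse_asv_time(text: str) -> float:
    match = re.fullmatch(r"([\d.]+)(?:±[\d.]+)?(μs|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognised asv timing: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]

def asv_ratio(before: str, after: str) -> float:
    # Ratio is after / before: below 1 is faster, above 1 is slower.
    return parse_asv_time(after) / parse_asv_time(before)

# frame_methods.Iteration.time_itertuples_start from the table above
ratio = asv_ratio("285±6ms", "1.41±0.1ms")
print(f"{ratio:.4f} -> shown as {ratio:.2f}")  # 0.0049 -> shown as 0.00
```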

@jreback
Contributor

jreback commented Dec 23, 2018

can you merge master

@jreback jreback added this to the 0.24.0 milestone Dec 23, 2018
@jreback
Contributor

jreback commented Dec 23, 2018

can you add whatsnew note in perf section

@jreback
Contributor

jreback commented Dec 24, 2018

why are these 3 increasing?

+         313±5ms         613±10ms     1.96  frame_methods.Iteration.time_itertuples_raw_tuples
+        458±10ms         861±10ms     1.88  frame_methods.Iteration.time_itertuples_raw_tuples_to_list
+        773±20ms       1.33±0.03s     1.71  frame_methods.Iteration.time_itertuples_to_list

@rok
Contributor

rok commented Dec 24, 2018

The previous implementation loaded the entire dataset into Python memory and then iterated over it. Now we iterate over the underlying array, which reduces read speed.
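The difference described above can be sketched with a plain NumPy array. This is a simplified illustration, not the actual pandas code. It assumes that the old path boxed every element up front, as tolist() does, and that the new path boxes each element on demand. ndarray.item(i) returns the i-th element as a Python scalar.

```python
import numpy as np

def iter_eager(values: np.ndarray):
    # Eager: tolist() converts every element to a Python object before the
    # first value is returned, so memory grows with the array size.
    return iter(values.tolist())

def iter_lazy(values: np.ndarray):
    # Lazy: map() calls values.item(i) only when the next element is
    # requested, so each step boxes a single element.
    return map(values.item, range(values.size))

arr = np.arange(5)
lazy = iter_lazy(arr)
print(next(lazy))  # 0, available without converting the rest of the array
print(list(lazy))  # [1, 2, 3, 4]
```

Per element, the lazy version makes one Python-level item() call instead of relying on a single bulk conversion in C. That is consistent with the slower totals in the _to_list benchmarks.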

I think this is in line with the previous discussion: we are trying to optimize for (initialization) memory use and initialization time rather than total execution time.
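The trade-off between initialization time and total time can be measured directly. The sketch below reuses the eager and lazy strategies from the simplified NumPy model, which is an assumption rather than the pandas implementation. For each strategy it times how long the first element takes to arrive and how long the whole iterator takes to exhaust. The array size of 200,000 is arbitrary.

```python
import time

import numpy as np

def time_first_and_total(make_iter, values):
    # Returns the first element, the element count, the seconds until the
    # first element was available, and the seconds to exhaust the iterator.
    start = time.perf_counter()
    it = make_iter(values)
    first = next(it)
    t_first = time.perf_counter() - start
    count = 1 + sum(1 for _ in it)
    t_total = time.perf_counter() - start
    return first, count, t_first, t_total

def make_eager(values):
    # Box every element up front, then iterate over the Python list.
    return iter(values.tolist())

def make_lazy(values):
    # Box one element per next() call.
    return map(values.item, range(values.size))

values = np.arange(200_000)
for name, make_iter in [("eager", make_eager), ("lazy", make_lazy)]:
    first, count, t_first, t_total = time_first_and_total(make_iter, values)
    print(f"{name}: first value after {t_first * 1e3:.3f} ms, "
          f"all {count} values after {t_total * 1e3:.1f} ms")
```

Exact numbers depend on the machine. The expected pattern, mirroring the read_first and to_list rows above, is a much smaller time to first element for the lazy version and a larger total time.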

The to_list benchmarks run list(df.itertuples()), which is probably not how one would want to use this method.
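Usage that benefits from lazy itertuples consumes rows one at a time instead of materializing them with list(). The sketch below shows two such patterns, assuming pandas 0.24 or later, the milestone this change landed in. It stops at the first matching row, and it aggregates while streaming. first_row_above and running_total are hypothetical helper names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(100_000)})

def first_row_above(frame: pd.DataFrame, threshold: int):
    # Consume itertuples() one row at a time and stop at the first match;
    # with lazy iteration, rows after the match are never turned into tuples.
    for row in frame.itertuples(index=False):
        if row.A > threshold:
            return row
    return None

def running_total(frame: pd.DataFrame) -> int:
    # Aggregate while streaming instead of building list(frame.itertuples()).
    total = 0
    for row in frame.itertuples(index=False):
        total += row.A
    return total

print(first_row_above(df, 10))
print(running_total(df))
```

For whole-column work like running_total, a vectorized df["A"].sum() is usually the better choice. The loop only illustrates row-at-a-time consumption.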

@rok
Contributor

rok commented Dec 24, 2018

asv continuous -f 1.1  --no-only-changed upstream/master iterators -b ^frame_methods.Iteration
       before           after         ratio
     [fc7bc3f7]       [766ba8f2]
     <iterators^2>       <iterators>
         323±20ms         603±30ms    ~1.87  frame_methods.Iteration.time_itertuples_raw_tuples
        466±100ms         845±30ms    ~1.82  frame_methods.Iteration.time_itertuples_raw_tuples_to_list
+        787±40ms       1.22±0.02s     1.55  frame_methods.Iteration.time_itertuples_to_list
+        660±30ms         989±30ms     1.50  frame_methods.Iteration.time_itertuples
         843±30μs      1.12±0.05ms    ~1.33  frame_methods.Iteration.time_iteritems_cached
         98.5±4ms         116±20ms    ~1.18  frame_methods.Iteration.time_iteritems_indexing
         19.7±1ms         20.8±1ms     1.06  frame_methods.Iteration.time_iteritems
               64               64     1.00  frame_methods.Iteration.mem_itertuples_raw_start
               8M               8M     1.00  frame_methods.Iteration.mem_itertuples_raw_to_list
              136              136     1.00  frame_methods.Iteration.mem_itertuples_read_first
               56               56     1.00  frame_methods.Iteration.mem_itertuples_start
               8M               8M     1.00  frame_methods.Iteration.mem_itertuples_to_list
         283±10ms         278±10ms     0.98  frame_methods.Iteration.time_iterrows
-            812M             732M     0.90  frame_methods.Iteration.peakmem_itertuples_to_list
-            757M             677M     0.89  frame_methods.Iteration.peakmem_itertuples_raw_to_list
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw_read_first
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw_start
-            613M             289M     0.47  frame_methods.Iteration.peakmem_itertuples_raw
-            654M             297M     0.45  frame_methods.Iteration.peakmem_itertuples
-            654M             297M     0.45  frame_methods.Iteration.peakmem_itertuples_start
-        294±10ms      1.34±0.07ms     0.00  frame_methods.Iteration.time_itertuples_read_first
-        280±20ms      1.18±0.02ms     0.00  frame_methods.Iteration.time_itertuples_start
-       267±100ms         697±30μs     0.00  frame_methods.Iteration.time_itertuples_raw_read_first
-        268±80ms         692±30μs     0.00  frame_methods.Iteration.time_itertuples_raw_start

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
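The benchmark names above follow asv's prefix conventions. asv times time_* methods, reports the size of the object returned by mem_* methods, and records peak process memory during peakmem_* methods. The class below is a hypothetical sketch of how start, read_first and to_list variants can be told apart. It is not the real frame_methods.Iteration benchmark, and the 1,000-row frame is an arbitrary size. It can be run as plain Python without asv.

```python
import numpy as np
import pandas as pd

class IterationSketch:
    # Hypothetical asv-style benchmark class; not pandas' frame_methods.py.

    def setup(self):
        # asv calls setup() before each benchmark method.
        self.df = pd.DataFrame(np.random.randn(1000, 5), columns=list("abcde"))

    def time_itertuples_start(self):
        # Only creates the iterator; cheap when iteration is lazy.
        self.df.itertuples()

    def time_itertuples_read_first(self):
        # Creates the iterator and pulls a single row.
        next(self.df.itertuples())

    def time_itertuples_to_list(self):
        # Materializes every row, so total iteration cost dominates.
        list(self.df.itertuples())

    def mem_itertuples_start(self):
        # asv measures the size of the returned object.
        return self.df.itertuples()

    def peakmem_itertuples_start(self):
        # asv records the peak memory of the process while this runs.
        self.df.itertuples()

bench = IterationSketch()
bench.setup()
bench.time_itertuples_read_first()
```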

@jreback jreback merged commit b2b877c into pandas-dev:master Dec 25, 2018
@jreback
Contributor

jreback commented Dec 25, 2018

thanks @mitar and @rok

@mitar mitar deleted the iterators branch December 25, 2018 18:10
@rok
Contributor

rok commented Dec 25, 2018

Thanks @jreback, @mitar and @TomAugspurger ! :)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Performance Memory or execution speed performance
Development

Successfully merging this pull request may close these issues.

Make itertuples really an iterator/generator in implementation, not just return type
8 participants