ENH: Implemented lazy iteration #20796
Conversation
Looks like the 2.7 failures are relevant. This would need a release note. How's the performance when actually iterating? E.g.:

df = pd.DataFrame({"A": np.arange(100000)})
list(iter(df.itertuples()))
Yea, I think I just managed to fix those. I had to import
Where does the release note go? Any instructions anywhere?
Memory-wise: great. It just has to hold one row at a time in memory. CPU-wise: it looks around 2-3x slower than the old version. Oh, and of course, this version immediately starts returning values, while before you had to wait for everything to finish. So latency is much lower, while overall time is around 2-3x.
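To make the latency point concrete, here is a minimal, self-contained sketch (not pandas code; `eager_rows` and `lazy_rows` are hypothetical stand-ins) contrasting an iterator that materializes everything up front with one that yields rows on demand:

```python
import time

def eager_rows(n):
    # Stand-in for the old itertuples: pay the full materialization
    # cost before the first row is available.
    time.sleep(0.1)  # simulate building all n rows up front
    return iter([(i,) for i in range(n)])

def lazy_rows(n):
    # Stand-in for the lazy itertuples: each row is produced on demand,
    # so the first row is available almost immediately.
    for i in range(n):
        yield (i,)

start = time.perf_counter()
next(eager_rows(1000))
eager_first = time.perf_counter() - start

start = time.perf_counter()
next(lazy_rows(1000))
lazy_first = time.perf_counter() - start

# First-row latency is much lower for the lazy version, even if the
# total time over all rows can be somewhat higher.
assert lazy_first < eager_first
```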
Codecov Report
@@            Coverage Diff            @@
##           master    #20796    +/-   ##
==========================================
+ Coverage    92.3%     92.3%    +<.01%
==========================================
  Files         163       163
  Lines       51943     51947        +4
==========================================
+ Hits        47946     47951        +5
+ Misses       3997      3996        -1

Continue to review full report at Codecov.
Correction. I was testing before just iterating directly on
I used this to test how long it takes to create a list of all tuples, how long it takes to get the first tuple, and how long it takes to get only simple tuples:

import numpy as np
import pandas as pd
import time

print(pd.__path__)

def perf(f):
    start = time.perf_counter()
    f()
    end = time.perf_counter()
    return end - start

def a():
    list(iter(df.itertuples()))

def b():
    next(iter(df.itertuples()))

def c():
    list(iter(df.itertuples(index=False, name=None)))

df = pd.DataFrame({"A": np.arange(10000000)})
print(sum(perf(a) for i in range(10)) / 10)
print(sum(perf(b) for i in range(10)) / 10)
print(sum(perf(c) for i in range(10)) / 10)

Old version:
New version:
It is interesting. It seems this even makes the regular (Series) iteration slightly faster, while the simple-tuple version is a bit slower. I could reproduce these results even after trying a few more times.
If you want to measure performance, we also have an asv performance benchmark for this method here: pandas/asv_bench/benchmarks/frame_methods.py, lines 106 to 108 in add3fbf.
Here's a guide on how to run the asv benchmarks to evaluate performance changes.
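For reference, asv benchmarks are plain Python classes whose `time_*` methods get timed; a hypothetical sketch in the style of `frame_methods.py` (the class and method names here are illustrative, not the actual pandas ones) might look like:

```python
import numpy as np
import pandas as pd

class Itertuples:
    # asv calls setup() before timing each time_* method.
    def setup(self):
        self.df = pd.DataFrame(np.random.randn(1000, 10))

    def time_itertuples(self):
        # total time to consume the whole iterator
        for row in self.df.itertuples():
            pass

    def time_itertuples_first(self):
        # latency until the first row is available
        next(self.df.itertuples())
```

asv then reports each `time_*` method separately, which lets the list-everything cost and the first-row latency be tracked independently.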
pandas/core/frame.py (Outdated)

      fields.append("Index")

      # use integer indexing because of possible duplicate column names
-     arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
+     iterators.extend(self.iloc[:, k] for k in range(len(self.columns)))
I believe iterators can also be evaluated lazily by using itertools.chain. Something along the lines of:

iterators = itertools.chain([self.index], (self.iloc[:, k] for k in range(len(self.columns))))
Yea, but I do not think this is necessary, because those are per-column and have to be evaluated immediately afterwards anyway, even for the first row. So you would just move forcing the execution from one line in the function to another line in the same function. Later on we do zip(*iterators), which forces iterators to be made into a list. If we zipped things manually, then we could avoid that, but passing *iterators forces the list. It is also better that any exception from evaluating columns is thrown here and not later inside that other try/except.
So I do not think this is necessary.
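The point about `*iterators` forcing evaluation can be seen in a small sketch (the `make_column` helper is hypothetical, standing in for building one column's iterator): the chain itself is lazy, but star-unpacking it into `zip` consumes it all before the first row is produced.

```python
import itertools

made = []

def make_column(name):
    # record when each "column" iterable actually gets built
    made.append(name)
    return iter(range(3))

# Lazily chained per-column iterators, as in the suggestion above.
cols = itertools.chain((make_column(n) for n in ("a", "b")))

assert made == []  # the chain has not built anything yet

rows = zip(*cols)  # star-unpacking consumes the chain immediately...
assert made == ["a", "b"]  # ...so every column was built before the first row

assert next(rows) == (0, 0)
```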
Tests are failing for some unrelated reason.
I ran benchmarks, but the changes are all over the place, both positive and negative. For the same kind of methods (like rolling), some go up and some go down. I am not sure how stable those benchmarks are. I had to leave it running for a few hours and could not assure complete idleness of the computer. Also, absolute times are just a few ms for many of them. I think this is hard to measure well.
What you can do, if you suspect a given result is a fluke, is to rerun only that benchmarks file, as in:
This should take just a couple of minutes (assuming you just ran another asv test on the same commits), during which you can probably leave your computer idle.
once perf tests are run (and asv are added if needed), would need a whatsnew note.
Force-pushed from ad99292 to 868f8db (compare).
I think I updated everything requested in the code. I am now running benchmarks once more on an idle machine.
There already seems to be an
I added a few more asv tests.
Updated benchmarks:
Just iteration, ran again:
doc/source/whatsnew/v0.23.1.txt (Outdated)

@@ -16,6 +16,8 @@ New features
~~~~~~~~~~~~

- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
- Iterating over a :class:`Series` and using :meth:`DataFrame.itertuples` now create iterators without internally
move to 0.24.0 performance section
Done.
@mitar did you run this backwards? Those times have increased.
I ran
I am not sure how to better determine what is happening.
Fixes GH20783.
Revert "…1d_sparse benchmark". This reverts commit e2f892a.
@TomAugspurger - Thanks! I've rebased to your commit, I suppose we should be good now.
I've rerun the benchmark, I assume a lot of the changes are due to the fix.
can you merge master
can you add a whatsnew note in the perf section
why are these 3 increasing?
The previous implementation would load the entire dataset into Python memory and then iterate over it. Now we iterate over the underlying array, which reduces read speed. I think this is in line with the previous discussion: we are trying to optimize for (initialization) memory use and initialization time rather than total execution time. The to_list benchmarks run
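A rough illustration of that trade-off (generic Python, not pandas internals): materializing all row tuples up front costs memory proportional to the number of rows, while a lazy `zip` holds only one row at a time at the cost of some per-row overhead.

```python
import sys

n = 100_000
cols = [range(n), range(n)]

# Eager: the full list of row tuples exists before iteration starts.
eager = list(zip(*cols))
assert sys.getsizeof(eager) > 100_000  # pointer array alone is ~0.8 MB

# Lazy: only the zip object and the current row are held in memory;
# each step pays a little more overhead, which can raise total runtime.
lazy = zip(*cols)
assert sys.getsizeof(lazy) < 1_000

assert next(lazy) == (0, 0)
assert eager[0] == (0, 0)
```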
Thanks @jreback, @mitar and @TomAugspurger! :)
Fixes GH20783.
git diff upstream/master -u -- "*.py" | flake8 --diff