implementation of scale using jumps #1452

mstoykov · 2020-05-16T15:07:06Z

This implementation saves the offset from the start of the cycle instead
of from the previous step; these are called jumps below. This favors
segments with big numerators, as can be seen in the benchmarks.

I decided to drop the offsets and just use the jumps to derive the offsets,
as the most CPU-intensive part of this is calculating the jumps.
Obviously, this means that if most use cases require the offsets, this
does not perform better.

I would expect that in the cases where this gets worse, it gets worse
insignificantly compared to the cases where the new "jumps" give much
better performance. See the absolute values as well as the differences.

The sort.Search is inlined because this gives something like a 30-70%
boost, which I would argue is worth the few extra lines here.

name                                                                                                                  old time/op        new time/op        delta
pkg:go.k6.io/k6/lib goos:linux goarch:amd64
GetStripedOffsets/length10,seed777-8                                                                                        36.0µs ±29%        34.1µs ±36%      ~     (p=0.796 n=10+10)
GetStripedOffsets/length100,seed777-8                                                                                       1.57ms ± 9%        1.36ms ±16%   -13.47%  (p=0.001 n=9+10)
GetStripedOffsetsEven/length10-8                                                                                            5.74µs ± 5%        5.01µs ± 6%   -12.78%  (p=0.000 n=10+10)
GetStripedOffsetsEven/length100-8                                                                                           68.9µs ±10%        57.7µs ± 6%   -16.28%  (p=0.000 n=10+10)
GetStripedOffsetsEven/length1000-8                                                                                          3.16ms ±12%        3.03ms ± 7%      ~     (p=0.089 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5)-8                                                                      2.52ns ± 5%        2.56ns ± 4%      ~     (p=0.184 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)-8                                                                           3.50µs ± 5%        3.14µs ±14%   -10.47%  (p=0.001 n=9+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)_prefilled-8                                                                 0.65ns ± 8%        2.53ns ± 1%  +289.75%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5523)-8                                                                   2.48ns ± 5%        2.58ns ± 5%    +4.30%  (p=0.009 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)-8                                                                        3.71µs ± 9%        3.14µs ± 8%   -15.54%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)_prefilled-8                                                              0.61ns ± 7%        2.49ns ± 2%  +304.56%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5000000)-8                                                                2.33ns ± 6%        2.56ns ± 4%    +9.78%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)-8                                                                     3.84µs ± 5%        3.13µs ± 3%   -18.57%  (p=0.000 n=9+9)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)_prefilled-8                                                           0.63ns ± 7%        2.52ns ± 3%  +297.39%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(67280421310721)-8                                                         2.36ns ± 5%        2.55ns ± 4%    +8.09%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)-8                                                              3.78µs ± 6%        3.18µs ± 6%   -15.98%  (p=0.000 n=9+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)_prefilled-8                                                    0.62ns ± 6%        2.51ns ± 4%  +302.29%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5)-8                                                                   2.22µs ± 7%        1.94µs ± 8%   -12.42%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)-8                                                                        4.03µs ± 7%        3.35µs ± 5%   -17.03%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)_prefilled-8                                                              0.65ns ± 7%        2.49ns ± 3%  +283.93%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5523)-8                                                                2.24µs ± 7%        1.94µs ± 8%   -13.50%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)-8                                                                     3.94µs ± 5%        3.45µs ± 7%   -12.47%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)_prefilled-8                                                           0.63ns ± 8%        2.49ns ± 5%  +297.91%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5000000)-8                                                             2.31µs ± 8%        1.95µs ± 9%   -15.43%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)-8                                                                  3.89µs ± 8%        3.32µs ± 6%   -14.62%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)_prefilled-8                                                        0.62ns ± 5%        2.53ns ± 4%  +309.71%  (p=0.000 n=9+10)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(67280421310721)-8                                                      2.26µs ± 2%        1.87µs ± 5%   -17.13%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)-8                                                           3.88µs ± 7%        3.48µs ± 7%   -10.17%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)_prefilled-8                                                 0.61ns ± 6%        2.51ns ± 3%  +309.80%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5)-8                                      2.93µs ± 6%        2.55µs ± 4%   -13.11%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)-8                                           4.65µs ± 5%        4.03µs ± 6%   -13.50%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)_prefilled-8                                 12.1ns ± 3%         6.3ns ± 3%   -47.73%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5523)-8                                   2.80µs ± 5%        2.35µs ± 5%   -16.12%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)-8                                        4.66µs ± 5%        4.04µs ± 8%   -13.21%  (p=0.000 n=9+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)_prefilled-8                              10.3ns ± 3%         6.3ns ± 3%   -39.20%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5000000)-8                                2.43µs ± 8%        2.05µs ± 8%   -15.65%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)-8                                     4.80µs ± 6%        4.04µs ± 9%   -15.74%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)_prefilled-8                           6.61ns ± 5%        7.67ns ± 7%   +15.93%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(67280421310721)-8                         2.63µs ± 6%        2.25µs ± 8%   -14.41%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)-8                              4.70µs ± 7%        3.98µs ±14%   -15.29%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)_prefilled-8                    20.2ns ± 3%        10.6ns ± 5%   -47.60%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5)-8                          2.79µs ± 6%        2.35µs ± 4%   -15.79%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)-8                               5.49µs ± 7%        4.65µs ± 8%   -15.32%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)_prefilled-8                     8.07ns ± 3%        4.96ns ± 5%   -38.60%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5523)-8                       2.58µs ± 8%        2.23µs ± 4%   -13.49%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5523)-8                            5.42µs ± 4%        4.51µs ± 4%   -16.76%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5523)_prefilled-8                  10.3ns ± 3%         5.7ns ± 4%   -45.04%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5000000)-8                    2.41µs ± 9%        2.04µs ± 9%   -15.37%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5000000)-8                         5.48µs ± 8%        4.55µs ±11%   -17.08%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5000000)_prefilled-8               6.70ns ± 1%        7.03ns ± 2%    +4.88%  (p=0.000 n=9+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(67280421310721)-8             2.60µs ± 7%        2.20µs ±10%   -15.28%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(67280421310721)-8                  5.49µs ± 5%        4.51µs ± 6%   -17.94%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(67280421310721)_prefilled-8        20.2ns ± 4%        10.5ns ± 4%   -47.85%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5)-8                                                               3.41µs ±11%        2.94µs ± 7%   -13.93%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5)-8                                                                    6.62µs ± 5%        5.93µs ± 7%   -10.44%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5)_prefilled-8                                                          4.03ns ± 3%        4.40ns ± 2%    +9.16%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5523)-8                                                            3.60µs ± 8%        3.09µs ± 8%   -14.02%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5523)-8                                                                 6.79µs ± 6%        6.13µs ± 6%    -9.77%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5523)_prefilled-8                                                       11.0ns ± 3%         6.4ns ± 7%   -41.82%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5000000)-8                                                         3.37µs ± 6%        2.89µs ±10%   -14.11%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5000000)-8                                                              6.94µs ± 2%        5.93µs ± 3%   -14.59%  (p=0.000 n=9+8)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5000000)_prefilled-8                                                    6.98ns ± 3%        7.38ns ± 3%    +5.86%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(67280421310721)-8                                                  3.86µs ± 3%        3.29µs ± 5%   -14.81%  (p=0.000 n=8+9)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(67280421310721)-8                                                       6.75µs ± 6%        6.07µs ± 7%   -10.05%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(67280421310721)_prefilled-8                                             10.0ns ± 3%        10.5ns ± 2%    +5.63%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5)-8                                                         3.51µs ± 8%        3.03µs ± 8%   -13.52%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5)-8                                                               809µs ± 4%         635µs ± 7%   -21.45%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5)_prefilled-8                                                    12.0ns ± 2%        18.8ns ± 5%   +55.93%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5523)-8                                                      3.71µs ± 7%        3.20µs ± 4%   -13.69%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5523)-8                                                            828µs ± 5%         635µs ± 6%   -23.35%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5523)_prefilled-8                                                 8.11µs ± 3%        0.02µs ± 5%   -99.78%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5000000)-8                                                   3.63µs ± 3%        3.04µs ± 7%   -16.06%  (p=0.000 n=8+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5000000)-8                                                         929µs ± 6%         642µs ± 5%   -30.91%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5000000)_prefilled-8                                              41.2µs ± 2%         0.0µs ± 4%   -99.96%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(67280421310721)-8                                            3.94µs ± 4%        3.40µs ± 5%   -13.77%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(67280421310721)-8                                                  908µs ± 9%         630µs ± 8%   -30.63%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(67280421310721)_prefilled-8                                       41.4µs ± 2%         0.0µs ± 6%   -99.95%  (p=0.000 n=9+10)
pkg:go.k6.io/k6/lib/executor goos:linux goarch:amd64
Cal/1s-8                                                                                                                    4.25µs ±14%        5.24µs ±13%   +23.21%  (p=0.000 n=10+10)
Cal/1m0s-8                                                                                                                   273µs ± 3%         309µs ± 8%   +13.09%  (p=0.000 n=9+9)
CalRat/1s-8                                                                                                                 12.1ms ± 2%        14.4ms ± 2%   +18.80%  (p=0.000 n=8+8)
CalRat/1m0s-8                                                                                                                8.12s ± 2%         8.06s ± 1%      ~     (p=0.408 n=10+8)
RampingVUsGetRawExecutionSteps/seq:;segment:/normal-8                                                                        390µs ± 5%         312µs ± 6%   -19.97%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:/rollercoaster-8                                                                3.99ms ± 7%        3.21ms ± 9%   -19.58%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/normal-8                                                                     386µs ± 5%         304µs ± 5%   -21.39%  (p=0.000 n=9+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/rollercoaster-8                                                             3.95ms ± 5%        3.21ms ± 6%   -18.81%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/normal-8                                        114µs ± 5%          91µs ± 9%   -19.58%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/rollercoaster-8                                1.25ms ± 6%        1.00ms ± 9%   -19.82%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/normal-8                           38.5µs ± 5%        32.8µs ± 5%   -14.80%  (p=0.000 n=10+9)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/rollercoaster-8                     425µs ± 8%         324µs ± 5%   -23.90%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/normal-8                                                                 152µs ±10%         121µs ±10%   -20.07%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/rollercoaster-8                                                         1.60ms ± 6%        1.28ms ± 5%   -20.02%  (p=0.000 n=10+9)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/normal-8                                                           148µs ± 5%         138µs ± 7%    -6.56%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/rollercoaster-8                                                   1.42ms ±11%        1.20ms ± 6%   -15.34%  (p=0.000 n=10+10)
VUHandleIterations-8                                                                                                         1.00s ± 0%         1.00s ± 0%      ~     (p=0.529 n=10+10)

name                                                                                                                  old iterations/s   new iterations/s   delta
pkg:go.k6.io/k6/lib/executor goos:linux goarch:amd64
RampingArrivalRateRun/VUs10-8                                                                                                 251k ± 5%          262k ± 6%    +4.33%  (p=0.043 n=10+10)
RampingArrivalRateRun/VUs100-8                                                                                                315k ± 2%          321k ± 2%    +2.07%  (p=0.002 n=10+10)
RampingArrivalRateRun/VUs1000-8                                                                                               291k ± 2%          306k ± 1%    +5.02%  (p=0.000 n=10+10)
RampingArrivalRateRun/VUs10000-8                                                                                              266k ± 2%          286k ± 2%    +7.84%  (p=0.000 n=10+10)
VUHandleIterations-8                                                                                                          0.09 ± 6%          0.08 ± 6%   -14.06%  (p=0.000 n=10+10)

imiric

The scaling looks a bit hairy to me, and the fact it's repeated in VLV doesn't help, so not sure if this is worth the minor optimization, but I'll leave approval to @na--. :)

imiric · 2020-05-18T08:56:50Z

lib/execution_segment.go

+	jumps := essw.jumps[segmentIndex]
+	endValue := (value / essw.lcd) * int64(len(jumps))
+	remaining := value % essw.lcd
+	if jumps[0] <= remaining {
+		i, j := 0, len(jumps)
+		for i < j {
+			h := int(uint(i+j) >> 1) // avoid overflow when computing h
+			// i ≤ h < j
+			if jumps[h] < remaining {
+				i = h + 1 // preserves f(i-1) == false
+			} else {
+				j = h // preserves f(j) == true
+			}
+		}
+		endValue += int64(i)


This is impossible to follow by reading for me, will have to step through it with an example. The comments don't help much either... :-/

Probably no action needed on your part, just pointing it out. :)

I haven't (really) documented anything yet; I wanted to see whether it's worth it first, and also to rebase it (as I did a month or so ago).

As an explanation to help you understand it:

The jumps are the actual indexes within each cycle, instead of the differences between consecutive indexes.
Example: the LCD (cycle size) is 7 and jumps = {2, 4}. Previously that would have been start=2, offsets={2, 5}: the 2 is the difference between the two indexes, and the 5 is how much you need to add to 4 to loop back to the start (2), since (4 + 5) % 7 == 2.
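
To make the two representations concrete, here is a minimal sketch of converting between them. The function names `offsetsToJumps` and `jumpsToOffsets` are hypothetical and are not taken from the PR; the sketch assumes a non-empty list and that the offsets sum to the cycle size.

```go
package main

import "fmt"

// offsetsToJumps converts the older representation (a start index plus the
// distances between consecutive indexes, the last one wrapping around the
// cycle) into absolute positions inside one cycle.
func offsetsToJumps(start int64, offsets []int64) []int64 {
	jumps := make([]int64, len(offsets))
	pos := start
	for k, off := range offsets {
		jumps[k] = pos
		pos += off
	}
	return jumps
}

// jumpsToOffsets goes the other way; lcd is the cycle size.
// It assumes jumps is non-empty and sorted in increasing order.
func jumpsToOffsets(jumps []int64, lcd int64) (int64, []int64) {
	last := len(jumps) - 1
	offsets := make([]int64, len(jumps))
	for k := 0; k < last; k++ {
		offsets[k] = jumps[k+1] - jumps[k]
	}
	// The final offset wraps from the last index back to the first one.
	offsets[last] = lcd - jumps[last] + jumps[0]
	return jumps[0], offsets
}

func main() {
	fmt.Println(offsetsToJumps(2, []int64{2, 5})) // [2 4]
	start, offsets := jumpsToOffsets([]int64{2, 4}, 7)
	fmt.Println(start, offsets) // 2 [2 5]
}
```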

Previously, to find how many elements fall in the last, NOT full cycle, we iterated over the offsets and added 1 until we passed the mark, which is more or less a linear search through the offsets.
The new approach searches for the "jump" up to which we will go (jump to :P). So the first three lines above just calculate the number of elements in the full cycles and find the remainder, which is then searched for in the list of jumps. The jumps are sorted in increasing order by definition, as they go from smallest to biggest :D.
There is one small tricky thing: the index actually needs to be 1 bigger (as the start is indexed as 0 ;) ), which is why I just search for the first jump that is not smaller than the remaining, not something else. This is trickier in the VLV, unfortunately, but I would argue it might have gotten more readable :rofl:

For the record, literally the whole part between i, j := ... and the end of the if is a copy of sort.Search. As I mentioned in the commit message, I saw 30-70% better performance this way, which IMO is significant enough to justify these 6 lines (we can probably add comments around them).
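
As a rough illustration of the search described here, the sketch below counts the jumps smaller than the remainder in two ways: with a call to `sort.Search`, and with the same binary search written out by hand. The names `countWithSearch`, `countInlined`, and `scale` are hypothetical. The sketch also omits the `jumps[0] <= remaining` guard from the diff, since the search returns 0 in that case anyway.

```go
package main

import (
	"fmt"
	"sort"
)

// countWithSearch returns how many jumps are strictly smaller than remaining,
// using the standard library binary search.
func countWithSearch(jumps []int64, remaining int64) int {
	return sort.Search(len(jumps), func(i int) bool { return jumps[i] >= remaining })
}

// countInlined is the same binary search written out by hand, which avoids
// the closure call on every probe.
func countInlined(jumps []int64, remaining int64) int {
	i, j := 0, len(jumps)
	for i < j {
		h := int(uint(i+j) >> 1) // midpoint without overflow
		if jumps[h] < remaining {
			i = h + 1 // everything up to h is smaller than remaining
		} else {
			j = h // jumps[h] is already >= remaining
		}
	}
	return i
}

// scale counts how many of the first `value` positions belong to the segment:
// all elements of the full cycles plus the jumps below the remainder.
func scale(jumps []int64, lcd, value int64) int64 {
	full := (value / lcd) * int64(len(jumps))
	return full + int64(countInlined(jumps, value%lcd))
}

func main() {
	jumps, lcd := []int64{2, 4}, int64(7)
	for _, v := range []int64{2, 3, 5, 7, 10} {
		same := countWithSearch(jumps, v%lcd) == countInlined(jumps, v%lcd)
		fmt.Println(v, scale(jumps, lcd, v), same)
	}
}
```

With jumps {2, 4} and a cycle size of 7, scaling 10 gives 3: the segment owns positions 2, 4, and 9 among the first ten.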

Now, as you can imagine, one is O(n) and the other is O(log(n)), which, while great, is not ideal for small n, as I now need to do a whole search instead of just iterating over 1, 2, 3, or 4 offsets :). Luckily, even for small inputs the difference is apparently negligible, while for inputs with a lot of offsets the difference is 99%+. Given that such inputs are not entirely out of the question, this is probably a good idea.
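
For contrast with the binary search, the linear walk over offsets could look roughly like the sketch below. It is modeled loosely on the description in this thread, not copied from the removed code, and `scaleLinear` is a hypothetical name. It assumes the offsets sum to the cycle size, which guarantees the loop stops before running past the end of the slice.

```go
package main

import "fmt"

// scaleLinear counts the segment's elements among the first `value` positions
// by stepping through the offsets one at a time, so the cost of the partial
// cycle grows linearly with the number of offsets.
func scaleLinear(start int64, offsets []int64, lcd, value int64) int64 {
	result := (value / lcd) * int64(len(offsets))
	remaining := value % lcd
	pos := start
	for gi := 0; pos < remaining; gi++ {
		result++
		pos += offsets[gi] // move to the next owned position
	}
	return result
}

func main() {
	// Same example as in the discussion: cycle size 7, start=2, offsets={2, 5}.
	for _, v := range []int64{2, 5, 10, 14} {
		fmt.Println(v, scaleLinear(2, []int64{2, 5}, 7, v))
	}
}
```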

I was debating whether to add an if that falls back to linear search when there are fewer jumps than some chosen number, but decided against it, as that would make the code even longer, and this way there is still a chance it will be inlined (I hope; I probably need to check).

imiric · 2020-05-18T09:02:28Z

lib/execution_segment.go

+	return jumps[0], offsets, essw.lcd
+}
+
+// GetStripedJumps returns the stripped jumps for the given segment


Suggested change

// GetStripedJumps returns the stripped jumps for the given segment

// GetStripedJumps returns the striped jumps for the given segment

(Copied from GetStripedOffsets so might as well fix it there too.)


mstoykov · 2022-01-31T08:59:00Z

I have updated this PR

imiric · 2022-11-14T17:31:54Z

@mstoykov Is this still relevant and should we merge it in this cycle?

mstoykov · 2022-11-15T08:31:03Z

I moved it to draft so people don't come looking at it.

IMO it is still relevant, but arguably not useful in a lot of cases. It is mostly waiting for us to either hit problems with the current implementation, or for someone to have enough time to write better benchmarks and rerun them so we can be certain this actually improves things.

Either might happen the next time we refactor the executors or need to expand or implement a new one 🤷

But for now it just stays here.

imiric reviewed May 18, 2020

mstoykov modified the milestones: v0.27.0, v0.28.0 Jun 30, 2020

mstoykov changed the base branch from new-schedulers to master July 7, 2020 08:31

na-- removed this from the v0.28.0 milestone Sep 8, 2020

mstoykov force-pushed the experimentalSegmentedJumps branch from a2da192 to 6f01129 Compare January 31, 2022 08:53

mstoykov changed the title ~~WIP implementation of scale using jumps~~ implementation of scale using jumps Jan 31, 2022

mstoykov marked this pull request as draft November 15, 2022 08:28


	// GetStripedJumps returns the stripped jumps for the given segment
	// GetStripedJumps returns the striped jumps for the given segment
