implementation of scale using jumps #1452
The scaling looks a bit hairy to me, and the fact it's repeated in VLV doesn't help, so not sure if this is worth the minor optimization, but I'll leave approval to @na--. :)
jumps := essw.jumps[segmentIndex]
endValue := (value / essw.lcd) * int64(len(jumps))
remaining := value % essw.lcd
if jumps[0] <= remaining {
	i, j := 0, len(jumps)
	for i < j {
		h := int(uint(i+j) >> 1) // avoid overflow when computing h
		// i ≤ h < j
		if jumps[h] < remaining {
			i = h + 1 // preserves f(i-1) == false
		} else {
			j = h // preserves f(j) == true
		}
	}
	endValue += int64(i)
This is impossible to follow by reading for me, will have to step through it with an example. The comments don't help much either... :-/
Probably no action needed from your part, just pointing it out. :)
I haven't (really) documented anything ... I wanted to see if it's worth it ... also to rebase it (as I previously did a month ago or so).
Here is an explanation to help you understand it.
The jumps are the actual indexes within each cycle, instead of the differences between consecutive indexes.
Example: the LCD (cycle size) is 7 and jumps = {2, 4}. Previously that would've been start=2, offsets={2, 5}: the 2 is the difference between the two indexes, and the 5 is how much you need to add to 4 to loop back to the start (2), since (4 + 5) % 7 == 2.
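To make the example concrete, here is a minimal sketch of how the old representation maps onto the new one. The helper name offsetsToJumps is hypothetical and is not part of the PR; it only illustrates the relationship described above.

```go
package main

import "fmt"

// offsetsToJumps is a hypothetical helper (not from the PR). It converts the
// old representation (a start index plus per-step offsets) into jumps, i.e.
// the absolute indexes inside one cycle.
func offsetsToJumps(start int64, offsets []int64) []int64 {
	jumps := make([]int64, len(offsets))
	pos := start
	for i, off := range offsets {
		jumps[i] = pos
		pos += off // the last offset wraps around into the next cycle
	}
	return jumps
}

func main() {
	// start=2, offsets={2, 5} with a cycle size of 7 gives jumps {2, 4}.
	fmt.Println(offsetsToJumps(2, []int64{2, 5})) // [2 4]
}
```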
Previously, to find how many elements fall into the last, partial cycle, we iterated over the offsets and added 1 until we passed the mark, which is more or less a linear search through the offsets.
The new approach searches for the "jump" up to which we will go (jump to :P). So the first three lines above just calculate the full cycles and the remainder; what is left is to search in the list of jumps, which is sorted in increasing order ... because they go from smallest to biggest, by definition :D.
There is one small tricky thing: the index actually needs to be 1 bigger (as the start is indexed as 0 ;) ), which is why I search for the first jump that is bigger than or equal to the remainder, not something else. This is trickier in the VLV unfortunately, but I would argue it might've gotten more readable :rofl:
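The counting described above can be sketched with the standard sort.Search. This is an illustration under assumptions, not the PR's code: the function name scale and its flat parameter list are made up here, while the PR keeps lcd and jumps on a wrapper struct.

```go
package main

import (
	"fmt"
	"sort"
)

// scale is a hypothetical standalone version of the idea: it counts how many
// indexes in [0, value) belong to the segment, given the cycle size lcd and
// the sorted jumps inside one cycle.
func scale(value, lcd int64, jumps []int64) int64 {
	// Every full cycle contributes len(jumps) elements.
	full := (value / lcd) * int64(len(jumps))
	remaining := value % lcd
	// Number of jumps strictly below remaining, found by binary search for
	// the first jump that is >= remaining.
	partial := sort.Search(len(jumps), func(i int) bool {
		return jumps[i] >= remaining
	})
	return full + int64(partial)
}

func main() {
	jumps := []int64{2, 4} // cycle size 7, as in the example above
	for _, v := range []int64{2, 3, 5, 7, 10} {
		fmt.Println(v, scale(v, 7, jumps))
	}
}
```

With jumps = {2, 4} and a cycle of 7, the segment owns indexes 2, 4, 9, 11, ..., so scale(10, 7, jumps) counts 2, 4 and 9.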
For the record, literally the whole part between i, j := ... and the end of the if is a copy of sort.Search ... as I mentioned in the commit message, I saw 30-70% better performance this way, which IMO is significant enough to justify these 6 lines (we can probably add comments around them).
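As a sanity check of the "copy of sort.Search" claim, the sketch below puts the hand-inlined loop next to the library call and compares the two. The function name searchInlined and the sample data are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// searchInlined is a hypothetical extraction of the loop from the diff: it is
// sort.Search with the predicate jumps[h] >= remaining written out by hand.
func searchInlined(jumps []int64, remaining int64) int {
	i, j := 0, len(jumps)
	for i < j {
		h := int(uint(i+j) >> 1) // midpoint without signed overflow
		if jumps[h] < remaining {
			i = h + 1 // everything up to h is below remaining
		} else {
			j = h // jumps[h] is a candidate answer
		}
	}
	return i
}

func main() {
	jumps := []int64{2, 4, 9, 11} // made-up sorted sample
	for r := int64(0); r < 13; r++ {
		want := sort.Search(len(jumps), func(k int) bool { return jumps[k] >= r })
		fmt.Println(r, searchInlined(jumps, r), searchInlined(jumps, r) == want)
	}
}
```

The inlined form avoids the closure call on every probe, which is one plausible source of the speedup mentioned above; the benchmarks in the PR are the actual evidence.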
Now, as you can imagine, the one is O(n) and the other is O(log(n)), which, while great, is not awesome for small n, as I now need to do a whole search instead of just iterating over ... 1, 2, 3, 4 offsets :). Luckily, even for small inputs the difference is apparently negligible, but for inputs with a lot of offsets the difference is 99%+, and since such inputs are not entirely out of the question, this is probably a good idea.
I was debating whether to add an if to fall back to linear search if there are fewer jumps than ... choose a number ... but decided against it, as that would make it even longer, and now there is still a chance this will be inlined (I hope, I probably need to check).
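For comparison, the old O(n) walk over the offsets can be sketched roughly as follows. This is a reconstruction from the description in this comment, not the code that was replaced; countLinear and its parameters are hypothetical.

```go
package main

import "fmt"

// countLinear is a hypothetical reconstruction of the old approach: starting
// at start, step through the offsets and count every position that lies
// below remaining. The cost grows linearly with the number of offsets.
func countLinear(start int64, offsets []int64, remaining int64) int64 {
	var count int64
	pos := start
	for i := 0; i < len(offsets) && pos < remaining; i++ {
		count++
		pos += offsets[i]
	}
	return count
}

func main() {
	// start=2, offsets={2, 5} describe the same segment as jumps {2, 4}.
	for r := int64(0); r < 7; r++ {
		fmt.Println(r, countLinear(2, []int64{2, 5}, r))
	}
}
```

For a handful of offsets this loop is about as cheap as it gets, which matches the remark that the binary search only pays off clearly when there are many jumps.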
	return jumps[0], offsets, essw.lcd
}

// GetStripedJumps returns the stripped jumps for the given segment
Suggested change:
- // GetStripedJumps returns the stripped jumps for the given segment
+ // GetStripedJumps returns the striped jumps for the given segment
(Copied from GetStripedOffsets, so might as well fix it there too.)
This implementation saves the offset from the start of the cycle instead of from the previous step; these are called jumps later. This favors big numerator segments, as can be seen in the benchmarks.
I decided to drop the offsets and just use the jumps to get the offsets, as the most CPU-intensive part of this is calculating the jumps. Obviously this means that if most use cases require the offsets, this does not perform better.
I would expect that in the cases where this gets worse, it gets worse insignificantly compared to the cases where the new "jumps" give much better performance. See the absolute values as well as the difference.
The sort.Search is inlined because this gives something like a 30-70% boost, which I would argue is worth the few extra lines here.

name old time/op new time/op delta
pkg:go.k6.io/k6/lib goos:linux goarch:amd64
GetStripedOffsets/length10,seed777-8 36.0µs ±29% 34.1µs ±36% ~ (p=0.796 n=10+10)
GetStripedOffsets/length100,seed777-8 1.57ms ± 9% 1.36ms ±16% -13.47% (p=0.001 n=9+10)
GetStripedOffsetsEven/length10-8 5.74µs ± 5% 5.01µs ± 6% -12.78% (p=0.000 n=10+10)
GetStripedOffsetsEven/length100-8 68.9µs ±10% 57.7µs ± 6% -16.28% (p=0.000 n=10+10)
GetStripedOffsetsEven/length1000-8 3.16ms ±12% 3.03ms ± 7% ~ (p=0.089 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5)-8 2.52ns ± 5% 2.56ns ± 4% ~ (p=0.184 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)-8 3.50µs ± 5% 3.14µs ±14% -10.47% (p=0.001 n=9+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)_prefilled-8 0.65ns ± 8% 2.53ns ± 1% +289.75% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5523)-8 2.48ns ± 5% 2.58ns ± 5% +4.30% (p=0.009 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)-8 3.71µs ± 9% 3.14µs ± 8% -15.54% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)_prefilled-8 0.61ns ± 7% 2.49ns ± 2% +304.56% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5000000)-8 2.33ns ± 6% 2.56ns ± 4% +9.78% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)-8 3.84µs ± 5% 3.13µs ± 3% -18.57% (p=0.000 n=9+9)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)_prefilled-8 0.63ns ± 7% 2.52ns ± 3% +297.39% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(67280421310721)-8 2.36ns ± 5% 2.55ns ± 4% +8.09% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)-8 3.78µs ± 6% 3.18µs ± 6% -15.98% (p=0.000 n=9+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)_prefilled-8 0.62ns ± 6% 2.51ns ± 4% +302.29% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5)-8 2.22µs ± 7% 1.94µs ± 8% -12.42% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)-8 4.03µs ± 7% 3.35µs ± 5% -17.03% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)_prefilled-8 0.65ns ± 7% 2.49ns ± 3% +283.93% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5523)-8 2.24µs ± 7% 1.94µs ± 8% -13.50% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)-8 3.94µs ± 5% 3.45µs ± 7% -12.47% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)_prefilled-8 0.63ns ± 8% 2.49ns ± 5% +297.91% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5000000)-8 2.31µs ± 8% 1.95µs ± 9% -15.43% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)-8 3.89µs ± 8% 3.32µs ± 6% -14.62% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)_prefilled-8 0.62ns ± 5% 2.53ns ± 4% +309.71% (p=0.000 n=9+10)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(67280421310721)-8 2.26µs ± 2% 1.87µs ± 5% -17.13% (p=0.000 n=10+8)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)-8 3.88µs ± 7% 3.48µs ± 7% -10.17% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)_prefilled-8 0.61ns ± 6% 2.51ns ± 3% +309.80% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5)-8 2.93µs ± 6% 2.55µs ± 4% -13.11% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)-8 4.65µs ± 5% 4.03µs ± 6% -13.50% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)_prefilled-8 12.1ns ± 3% 6.3ns ± 3% -47.73% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5523)-8 2.80µs ± 5% 2.35µs ± 5% -16.12% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)-8 4.66µs ± 5% 4.04µs ± 8% -13.21% (p=0.000 n=9+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)_prefilled-8 10.3ns ± 3% 6.3ns ± 3% -39.20% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5000000)-8 2.43µs ± 8% 2.05µs ± 8% -15.65% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)-8 4.80µs ± 6% 4.04µs ± 9% -15.74% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)_prefilled-8 6.61ns ± 5% 7.67ns ± 7% +15.93% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(67280421310721)-8 2.63µs ± 6% 2.25µs ± 8% -14.41% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)-8 4.70µs ± 7% 3.98µs ±14% -15.29% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)_prefilled-8 20.2ns ± 3% 10.6ns ± 5% -47.60% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5)-8 2.79µs ± 6% 2.35µs ± 4% -15.79% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)-8 5.49µs ± 7% 4.65µs ± 8% -15.32% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)_prefilled-8 8.07ns ± 3% 4.96ns ± 5% -38.60% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5523)-8 2.58µs ± 8% 2.23µs ± 4% -13.49% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5523)-8 5.42µs ± 4% 4.51µs ± 4% -16.76% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5523)_prefilled-8 10.3ns ± 3% 5.7ns ± 4% -45.04% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5000000)-8 2.41µs ± 9% 2.04µs ± 9% -15.37% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5000000)-8 5.48µs ± 8% 4.55µs ±11% -17.08% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5000000)_prefilled-8 6.70ns ± 1% 7.03ns ± 2% +4.88% (p=0.000 n=9+9)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(67280421310721)-8 2.60µs ± 7% 2.20µs ±10% -15.28% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(67280421310721)-8 5.49µs ± 5% 4.51µs ± 6% -17.94% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(67280421310721)_prefilled-8 20.2ns ± 4% 10.5ns ± 4% -47.85% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5)-8 3.41µs ±11% 2.94µs ± 7% -13.93% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5)-8 6.62µs ± 5% 5.93µs ± 7% -10.44% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5)_prefilled-8 4.03ns ± 3% 4.40ns ± 2% +9.16% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5523)-8 3.60µs ± 8% 3.09µs ± 8% -14.02% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5523)-8 6.79µs ± 6% 6.13µs ± 6% -9.77% (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5523)_prefilled-8 11.0ns ± 3% 6.4ns ± 7% -41.82% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(5000000)-8 3.37µs ± 6% 2.89µs ±10% -14.11% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5000000)-8 6.94µs ± 2% 5.93µs ± 3% -14.59% (p=0.000 n=9+8)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(5000000)_prefilled-8 6.98ns ± 3% 7.38ns ± 3% +5.86% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/segment.Scale(67280421310721)-8 3.86µs ± 3% 3.29µs ± 5% -14.81% (p=0.000 n=8+9)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(67280421310721)-8 6.75µs ± 6% 6.07µs ± 7% -10.05% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2/5:4/5/et.Scale(67280421310721)_prefilled-8 10.0ns ± 3% 10.5ns ± 2% +5.63% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5)-8 3.51µs ± 8% 3.03µs ± 8% -13.52% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5)-8 809µs ± 4% 635µs ± 7% -21.45% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5)_prefilled-8 12.0ns ± 2% 18.8ns ± 5% +55.93% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5523)-8 3.71µs ± 7% 3.20µs ± 4% -13.69% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5523)-8 828µs ± 5% 635µs ± 6% -23.35% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5523)_prefilled-8 8.11µs ± 3% 0.02µs ± 5% -99.78% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(5000000)-8 3.63µs ± 3% 3.04µs ± 7% -16.06% (p=0.000 n=8+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5000000)-8 929µs ± 6% 642µs ± 5% -30.91% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(5000000)_prefilled-8 41.2µs ± 2% 0.0µs ± 4% -99.96% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/segment.Scale(67280421310721)-8 3.94µs ± 4% 3.40µs ± 5% -13.77% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(67280421310721)-8 908µs ± 9% 630µs ± 8% -30.63% (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:2235/5213:4/5/et.Scale(67280421310721)_prefilled-8 41.4µs ± 2% 0.0µs ± 6% -99.95% (p=0.000 n=9+10)
pkg:go.k6.io/k6/lib/executor goos:linux goarch:amd64
Cal/1s-8 4.25µs ±14% 5.24µs ±13% +23.21% (p=0.000 n=10+10)
Cal/1m0s-8 273µs ± 3% 309µs ± 8% +13.09% (p=0.000 n=9+9)
CalRat/1s-8 12.1ms ± 2% 14.4ms ± 2% +18.80% (p=0.000 n=8+8)
CalRat/1m0s-8 8.12s ± 2% 8.06s ± 1% ~ (p=0.408 n=10+8)
RampingVUsGetRawExecutionSteps/seq:;segment:/normal-8 390µs ± 5% 312µs ± 6% -19.97% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:/rollercoaster-8 3.99ms ± 7% 3.21ms ± 9% -19.58% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/normal-8 386µs ± 5% 304µs ± 5% -21.39% (p=0.000 n=9+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/rollercoaster-8 3.95ms ± 5% 3.21ms ± 6% -18.81% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/normal-8 114µs ± 5% 91µs ± 9% -19.58% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/rollercoaster-8 1.25ms ± 6% 1.00ms ± 9% -19.82% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/normal-8 38.5µs ± 5% 32.8µs ± 5% -14.80% (p=0.000 n=10+9)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/rollercoaster-8 425µs ± 8% 324µs ± 5% -23.90% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/normal-8 152µs ±10% 121µs ±10% -20.07% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/rollercoaster-8 1.60ms ± 6% 1.28ms ± 5% -20.02% (p=0.000 n=10+9)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/normal-8 148µs ± 5% 138µs ± 7% -6.56% (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/rollercoaster-8 1.42ms ±11% 1.20ms ± 6% -15.34% (p=0.000 n=10+10)
VUHandleIterations-8 1.00s ± 0% 1.00s ± 0% ~ (p=0.529 n=10+10)

name old iterations/s new iterations/s delta
pkg:go.k6.io/k6/lib/executor goos:linux goarch:amd64
RampingArrivalRateRun/VUs10-8 251k ± 5% 262k ± 6% +4.33% (p=0.043 n=10+10)
RampingArrivalRateRun/VUs100-8 315k ± 2% 321k ± 2% +2.07% (p=0.002 n=10+10)
RampingArrivalRateRun/VUs1000-8 291k ± 2% 306k ± 1% +5.02% (p=0.000 n=10+10)
RampingArrivalRateRun/VUs10000-8 266k ± 2% 286k ± 2% +7.84% (p=0.000 n=10+10)
VUHandleIterations-8 0.09 ± 6% 0.08 ± 6% -14.06% (p=0.000 n=10+10)
I have updated this PR
@mstoykov Is this still relevant and should we merge it in this cycle?
I moved it to draft so people don't come looking at it. IMO it is still relevant, but arguably not useful in a lot of cases. It is mostly waiting for us to either hit problems with the current implementation or for someone to have enough time to write better benchmarks and rerun them, so we can be certain this actually improves things. Both might happen the next time we refactor the executors or need to expand/implement a new one 🤷 But for now it just stays here.