Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrival-rate distribution bugs #1299

Closed
na-- opened this issue Jan 8, 2020 · 4 comments
Closed

arrival-rate distribution bugs #1299

na-- opened this issue Jan 8, 2020 · 4 comments
Assignees

Comments

@na--
Copy link
Member

na-- commented Jan 8, 2020

This comment describes how execution segments don't work correctly with the arrival-rate executors: #1007 (comment)

And the other known bug is specifically with the variable-arrival-rate executor, described and somewhat fixed in #1285 and demonstrated in this screencast: https://asciinema.org/a/xZca9sVUpmFfqBU0IFVCPRYZO

@mstoykov
Copy link
Contributor

mstoykov commented Jan 17, 2020

After much thought and even more relearning of the basics of calculus (mostly through this youtube series). I came up with a math-based solution for how we calculate when an iteration should happen during the ramp-up or ramp-down, which also works for the non-ramp parts but is not required :D.

This is based on one of the original reasons for developing calculus and calculating the integral of a function. If we have iteration per second - a and a number of seconds b and we multiply them this will be the number iterations that will be done in that amount of time which is a*b. This is also the area of a rectangle with sides a and b. As such if we now have a function f which for any given time t will return us how many iterations per second we need to do f(t). And under that graph of that function from t =0 to t=10(for example) we fit rectangles that are nonoverlapping and under that curve then the sum of the area of those rectangles will be some number (smaller) close the number of iterations that will be done if we're doing as many iterations as the function f told us to do. Here you can see a bunch of functions with rectangles fitted under them as well as interactive simulation.

And "obviously"(don't tell my math teacher I've used this) if we increase the number rectangles we get closer and closer to the actual area under the graph of the function. And this informally is called integral, which calculus can help us calculate ... accurately :)

Now for the actual application, we have in k6 .. we don't care about every possible function (although we can implement many more in the future) we care about a linear function that has value A at time 0 and value B at time t where that will represent a given stage that goes from A iterations per second to B iterations per second over the period of t. This can also be done with the same but instead from 0 from d which is the start of the stage till time t+d but ... it will give the same result and complicates some parts :D (as an exercise the reader can prove it ;)). Also, we don't want to calculate the integral for the whole ... we want to find x for which the integral between 0 and x is equal to 1 and then 2 and 3 ... This way we find at which time we need to make the first iteration, then the second iteration and so on ...

The linear function f(x) is of the form f(x) = at + b for some a and b which we can find out easy by the fact that at x=0 we have f(0)=a*0 +b = b=A where is the begining value of the stage. In the same way at x=t we have f(t)= at + b = at + A = B ... which means that a = \frac{B-A}{t} and so the equation is f(x)=\frac{B-A}{t}x + A 🎊 .

Note: maybe I will reedit this in the future to use ... different letters as it gets a little confusing with the big and small letters.

Now then ... we need to find the x for which integrating from 0 to x of f(x) = \frac{B-A}{t}x +A is equal to 1 ... or more accurately some number d and if we can only know a formula for this .... well we do but it's somewhat harder to get so I am just going to link to wolfram alpha query that gives us the answer(s) directly (wait a few seconds as the part "solutions" take a bit longer to generate.

We mostly care about the one where if A-B != 0 and t!=0 and specifically the \frac{A t - \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B}, and not \frac{A t + \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B} this is because it's always the positive answer smaller then t, here is ... prove:

All variables A,B,t,d are positive

If A<B (ramp-up): A-B (the denominator in both cases) is negative, so if we want a positive answer we need the numerator to be negative as well, but in the case of \frac{A t + \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B} there are two positive numbers being added up so there is no way that we will get a negative number

If A>B (ramp-down) A-B (the denominator) is positive so we want a positive real number above but we also want one smaller then t because if it's bigger we are getting time outside the one the stage is long.
We can do some math that will prove both that both are possitve (if real) and that the second one produces a result bigger then t
\frac{A t \pm \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B} =\frac{A t \pm A t\sqrt{1 + 2d\frac{B-A}{A^2t^2}}}{A - B} = \frac{A t \pm A t\sqrt{1 - 2d\frac{A-B}{A^2t^2}}}{A - B} = A t \frac{1 \pm \sqrt{1 - 2d\frac{A-B}{A^2t^2}}}{A - B}

Because everything is positive the only way to get a negative number is for 1 - \sqrt{\frac{\sqrt1 - 2d(A-B)}{A^2 t^2}} to be negative, but this means that 1 - 2d\frac{A-B}{A^2 t^2} should be bigger than 1 which is impossible when A,B,d,t are positive and A>B, so when A>B \frac{A t - \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B} is always positive.

on the other hand we will prove that A t \frac{1 + \sqrt{1 - 2d\frac{A-B}{A^2t^2}}}{A - B} \leq t can't be true:

A t \frac{1 + \sqrt{1 - 2d\frac{A-B}{A^2t^2}}}{A - B} \leq t # divide by t on both sides
A \frac{1 + \sqrt{1 - 2d\frac{A-B}{A^2t^2}}}{A - B} \leq 1 # move A inside the brackets
\frac{A}{A-B} + \frac{A \sqrt{1 - 2 d \frac{A-B}{A^2 t^2}}}{A-B} \leq 1

but A>B so A-B is a positive number equal or smaller then A, so \frac{A}{A-B} \geq 1, in which case A \sqrt{1-2d\frac{A-B}{A^2 t^2}}\leq 0, but the only possible case is that it is0 and it happens only, when B is also 0, but if \frac{A}{A-B} \geq 1, in which case A \sqrt{1-2d\frac{A-B}{A^2 t^2}}\leq 0 then At\frac{1 \pm \sqrt{1 -2d\frac{A-B}{A^2 t^2}}}{A-B} = \frac{At}{A-B} regardless of whether it + or - which means that there is a single solution equal to the case with the -.
So the only solution we care about in the case of k6 is \frac{A t - \sqrt{t (-2 A d + 2 B d + A^2 t)}}{A - B}.

A similar method can be used to do another ramp-up/downs or even to simulate sinusoidal traffic for example. But I propose exploring this after this has been implemented, tried and proven to work well :D.

I will be posting PR soon with this after I figure out how to better split the work with execution segments.

@na--
Copy link
Member Author

na-- commented Jan 17, 2020

Hmm after thinking that the integral is an overkill and attempting to arrive at the same solution geometrically (since, after all, we're dealing with simple rectangles and triangles), I now view the integral, if there are no mistakes in your calculations, as the simpler approach... 😅 Maybe I missed an optimization somewhere, or maybe it's because of the extra constant I added [1], but I ended up with a nastier quadratic equation at the end...

In any case, and regardless of the exact mathematical details that are used, I can see how the approach of having a function F(i) that can return the precise time offset of an arbitrary iteration with number i would probably work...

However:

  1. I have some concerns about the performance of this approach, or rather, the CPU we're going to use, if we have to calculate even a quadratic equation hundreds of times per second... Even with float64s this won't be very efficient when we have to do it 200 times per second. And considering we're doing all of this just to set some timers, I'm wondering if we shouldn't sacrifice some precision for the sake of less CPU usage. We probably don't need absolute nanosecond precision when the sleep timers could skew things by milliseconds...
  2. Even if we use this approach, it would have to be more complicated. What you've done accounts only for a single stage in the variable-arrival-rate executor. Because the ramping-up and ramping-down can't be expressed as a contiguous function, you'd sort of have to do add some nasty custom logic when the result of the F(i) function for the next iteration returns a time greater than the stage boundary.
    2.1. When you cross that stage boundary, you'd sort of be left with some amount of "unfinished" iterations that you'd have to keep as a constant (that's the [1] constant I mentioned above) and include in the integral for the next stage.
  3. It still doesn't solve what in my mind is probably the bigger issue, how to distribute work fairly when there are multiple execution segments...

@na--
Copy link
Member Author

na-- commented Jan 20, 2020

I wrote up on an idea/proposal how to fix point 3 from above, how to independently and evenly distribute arrival-rate iterations across instances: #997 (comment)

@mstoykov
Copy link
Contributor

closed by #1285

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants