
Reliably targeting some RPS, and in stages? #120

Closed
shibrady opened this issue Jun 8, 2022 · 7 comments
Labels
documentation Improvements or additions to documentation evaluation needed

Comments

@shibrady

shibrady commented Jun 8, 2022

Reading through grafana/k6#140 and seeing grafana/k6#2438 (though I haven't fully grokked the changes), this might be a duplicate, but here goes...

I'd like to run distributed k6 instances to achieve some particular RPS (like arrival rate executors) given some iteration function (e.g. the default export). Ideally I'd like to be able to adjust this RPS target after certain durations (sort of like having multiple k6 Scenarios with different targets), and I'd like to be accurate (able to adapt to the SUT, like constant arrival rate executors).

It isn't clear to me how to reliably achieve this with k6 without knowing how many runners/VUs I will need ahead of time. We do currently run k6 on k8s distributed across several pods, but if I were to use this operator it seems like I would need to know how many runners I'd need ahead of time (parallelism), and how many VUs I should realistically be allocating per runner (although I'm admittedly not quite sure how --execution-segments work under the hood with constant arrival rate executors).

Say for example my SUT's request latency is a constant 100ms and I wanted to test at 100k QPS - I presume I'd need some ~10000 VUs, which I may not be able to reliably run on 1 k6 instance due to lack of resources (we have some somewhat memory intensive k6 extensions going on). So if I run 1000 VUs per pod, I imagine I'd be able to reach 100k QPS with 10 pods. But if my SUT's performance degrades, I imagine I'd need more VUs; I could imagine giving some headroom per pod with constant arrival rate executors (I'm assuming this works with --execution-segments), but realistically there's only so much headroom I can give per pod, and I'd just need more k6 runners.

(and realistically I may not know the expected latency ahead of time, and it may grow arbitrarily high mid-test)
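The sizing arithmetic in the example above is an application of Little's law (required concurrency = rate × latency). A minimal sketch with the thread's illustrative numbers; the `headroom` factor and both helper names are my own additions, not k6 parameters:

```python
import math

# Capacity sizing via Little's law: concurrency = rate * latency.
# Numbers match the example above; `headroom` is a hypothetical knob,
# not an actual k6 option.

def required_vus(target_rps: float, latency_s: float, headroom: float = 1.0) -> int:
    """VUs needed to sustain target_rps when each request takes latency_s."""
    return math.ceil(target_rps * latency_s * headroom)

def required_pods(total_vus: int, vus_per_pod: int) -> int:
    """Runner pods needed if each pod can host at most vus_per_pod VUs."""
    return math.ceil(total_vus / vus_per_pod)

vus = required_vus(100_000, 0.100)   # 100k QPS at a constant 100 ms latency
pods = required_pods(vus, 1_000)     # capped at 1000 VUs per pod
print(vus, pods)                     # 10000 VUs across 10 pods

# If latency degrades to 250 ms mid-test, the same rate needs 2.5x the VUs,
# which is exactly the "I'd just need more k6 runners" problem:
print(required_vus(100_000, 0.250))  # 25000
```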

This is fine when just thinking of one-off load tests where I'm able to go through trial and error, but seems less than ideal when wanting to automate this more generally. For now we have some tooling on our end to try and handle our use case (essentially a controller goes through trial and error, monitoring actual runner RPS and adjusting runner counts to meet a target), but this has had its flaws and we're looking into alternatives.

I'm wondering if there may be something I'm missing about k6/the k6 operator's features that make this easy, but if not, what the k6's teams thoughts are on this (or maybe grafana/k6#2438 is meant to address this?).

@yorugac
Collaborator

yorugac commented Jun 8, 2022

Hi @shibrady,
Thanks for the issue! I'm afraid I'm not fully clear on what the problem is, so I'd like to clarify a couple of things. Do you have a non-distributed working test that is acceptable for your use case and that should be translated to a distributed version? If so, the non-distributed version can safely be used as a baseline: adding execution segments won't impact the logic of the executors; they'd just be distributed among runners. E.g. in the case of constant-arrival-rate, this means the maxVUs parameter is spread evenly among runners: each runner "gets" maxVUs / N VUs. The rate is split similarly: rate / N.
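An illustrative sketch (not k6's actual code) of that even split: with N runners, each gets the execution segment [i/N, (i+1)/N), so both rate and maxVUs are divided by N, and the per-runner shares add back up to the full scenario:

```python
from fractions import Fraction

# Hedged sketch of the even split that --execution-segment performs.
# `split_scenario` is a hypothetical helper; k6 computes this internally
# from the segment fractions passed to each runner.

def split_scenario(rate: int, max_vus: int, runners: int):
    """Per-runner (rate, maxVUs) shares under an even execution-segment split."""
    seg = Fraction(1, runners)
    return [(rate * seg, max_vus * seg) for _ in range(runners)]

shares = split_scenario(rate=1000, max_vus=200, runners=4)
print(shares[0])                           # each runner: rate/N and maxVUs/N
assert sum(r for r, _ in shares) == 1000   # shares reassemble the total rate
```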

But if my SUT's performance degrades, I imagine I'd need more VUs

How do you come to this conclusion? constant-arrival-rate should keep trying to match the requested rate as long as maxVUs is sufficient. At the moment, I fail to see why the number of runners would need to be adjusted dynamically... I.e. if the SUT degrades, isn't that a matter of the limits of the SUT's performance, which arguably is the whole point of the test?
Also, I wonder, do you use thresholds on e.g. response duration?

@shibrady
Author

shibrady commented Jun 8, 2022

I haven't actually tried these executors out, but here's an example (that I think gets the point across):

import http from 'k6/http';

export const options = {
  scenarios: {
    stageOne: {
      executor: 'constant-arrival-rate',
      duration: '5m',
      rate: 1000000, // 1mil QPS for 5 min
      timeUnit: '1s',
      preAllocatedVUs: 10000, // constant-arrival-rate requires VUs to be pre-allocated up front
      maxVUs: 12500, // Some number with headroom where I try to judge, given some hypothetical SUT latency, how I can reach 1mil QPS?
    },
    stageTwo: {
      executor: 'constant-arrival-rate',
      duration: '5m',
      startTime: '5m',
      rate: 10000000, // 10mil QPS for 5 min
      timeUnit: '1s',
      preAllocatedVUs: 100000, // Ditto
      maxVUs: 125000, // Ditto
    },
    // ...
  },
};

export default function () {
  http.get('https://myservice');
}

If my goal is to output some n QPS for some time t, assuming that maxVUs has to have some upper bound per pod (and that I can't just select infinity), I would like to reliably hit myservice with 1 million, 10 million, ... QPS. It seems like to do this effectively via the k6 operator, I would need to somehow calculate both the maxVUs I'd need to support some rate and the total number of runners I'd want to partition those VUs across at the peak rate (where each pod can realistically only support so many VUs).

If I understood my SUT well enough to e.g. define SLOs, then I think this might be ok - e.g. I could say that I expect p99 to be < 500ms, provision maxVUs to be able to hit that given some rate, and then determine the number of pods given some VU cap per pod; and if we undershoot, we can blame the SLO not being met. But (aside from that sounding like it'd be prone to overprovisioning) I'm more so wondering whether I can target some RPS/rate adaptively without knowing about the service beforehand. 🤔

Also, I wonder, do you use thresholds on e.g. response duration?

We don't use thresholds at the moment and just aggregate metrics through another independent service.

EDIT: Clarified some of my concerns after better understanding that the maxVUs were split as well

@yorugac
Collaborator

yorugac commented Jun 9, 2022

Interesting. Neither k6 nor k6-operator are smart enough to find the best parameters for a given SUT on their own ("adaptively") and at least at the moment, I'd say they shouldn't be expected to. I.e. yes, you'll need to try out different values of parameters until you find what's most suitable for you.

Your configuration makes sense, though it's not clear how much one pod can handle. If you know the maximum number of VUs possible per pod and the desired rate, then calculating both maxVUs and the number of runners should be trivial (you had similar calculations in your initial post). E.g. if you need a 1mil rate within timeUnit = 1s with an expected 100 ms response time, and one pod cannot handle more than 1K VUs, then maxVUs is ~100K and the number of pods is 100.

As for the SUT's latency not being known well enough: as a suggestion, perhaps it makes sense to try a more exploratory executor, e.g. ramping-arrival-rate, with one pod for simplicity (if possible), to understand how the given SUT behaves under different loads, and then use those observations to decide on the parameters that fit the chosen testing and deployment strategies.
I believe ramping-arrival-rate splits both rate and maxVUs among N runners just as constant-arrival-rate does, so the logic is the same.

TBH, this sounds to me like the general problem of finding a fitting k6 script implementation for a given case. If you know what load one pod can reliably handle and what kind of load you want to achieve, it's straightforward to calculate the numbers with k6-operator.

IMHO, it'd be really nice to have a system that could self-adapt and determine its own parameters, but that's quite a broad and complex problem which is currently outside of k6's or k6-operator's scope, AFAIK 😄

@BarthV

BarthV commented Jun 11, 2022

In theory, this kind of distributed feature can be easily achieved with mechanisms from the "leaky bucket"/"rate limiting" family. These could be either managed by a third party or handled directly inside the workers using gossiping.

Technically, the simplest implementation would be based on a really tiny Redis or memcached instance dedicated to this purpose. This instance would be filled by all parallel workers until the bucket is full... each worker then waits for the next leak to open a new spot for another VU.

https://www.mikeperham.com/2020/11/09/the-leaky-bucket-rate-limiter/

I saw two different styles of implementation with Redis:

  • Using ZSETs to store a timestamp of each successful call
  • Using GET/SET with a counter and timestamp

The first suffers from a major issue: it’s O(n) with the bucket size. If you allow 10,000 calls every hour, the ZSET can hold up to 10,000 elements. I knew I didn’t want to go down that path.

The second is much better; it’s O(1) with the bucket size but unnecessarily old school Redis. Redis provides newer Hash commands which can perform all of the necessary logic while also keeping each limiter as one distinct, logical Hash element within Redis.

Another way to solve this would be to have the operator handle such leaking/rate-limiting calls through its own API (but that API would have to be really resilient to load and QPS levels that Redis or memcached could handle easily).
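A minimal in-memory sketch of the leaky-bucket idea (in Python for brevity; in the proposal above the bucket state would live in a shared Redis/memcached instance that all parallel runners drain, and `LeakyBucket`/`try_acquire` are hypothetical names):

```python
import time

# In-memory leaky-bucket sketch: the bucket fills by 1 per admitted request
# and leaks at `rate_per_s`; a full bucket means the caller must back off.
# In the distributed version described above, this state would live in a
# shared Redis/memcached instance rather than in-process.

class LeakyBucket:
    def __init__(self, rate_per_s: float, capacity: int):
        self.rate = rate_per_s        # leak rate = allowed sustained QPS
        self.capacity = capacity      # maximum burst size
        self.level = 0.0              # current fill level
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        """A VU calls this before an iteration; False means wait and retry."""
        now = time.monotonic()
        # Drain whatever leaked out since the last call, then try to add 1.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

bucket = LeakyBucket(rate_per_s=5, capacity=5)
granted = sum(bucket.try_acquire() for _ in range(20))
print(granted)  # a tight burst gets at most `capacity` grants
```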

@shibrady
Author

Interesting. Neither k6 nor k6-operator are smart enough to find the best parameters for a given SUT on their own ("adaptively") [...]

That makes sense to me. I agree life would be simpler knowing more about the SUT in this case (and maybe that's the direction we go), but it's good to clear up what's possible in the context of k6!

@yorugac
Collaborator

yorugac commented Jun 20, 2022

Sorry for the delay! @shibrady I'm glad that our discussion cleared up the situation 🎉 From my side, it also gave me an indication that it would probably be good to add some clarity in the docs about the different executors in the context of k6-operator test runs.

@BarthV I'm not sure this fits the use case of finding the RPS, but it's definitely an interesting idea, thanks for sharing! Leaky bucket sounds very similar to what constant-arrival-rate does, just phrased in different terms: k6 doesn't use Redis or memcached, but it holds a queue-like object in memory when scheduling iterations for *-arrival-rate executors, AFAIK.
I think that in the distributed case of k6-operator, constant-arrival-rate becomes harder to reason about because of the additional number-of-pods parameter; but that is somewhat simplified by the fact that the distribution here is uniform. E.g. if we were to implement #95, I expect it'll become more complicated to come up with the correct parameters.

Another interesting question here is whether it makes sense to mess with k6's executor-level logic, e.g. implement a separate, ~distributed constant-arrival-rate, i.e. pull that queue out of k6 and centralize the distribution of iterations. But at the moment, I think not, unless there is very good reasoning to do so: centralizing the queue would have a performance and network impact which we should try to avoid. Besides, the executors' logic is rather complex on its own, and adding to it or changing it would have repercussions in terms of cognitive complexity.

@yorugac
Collaborator

yorugac commented Aug 22, 2024

Closing this issue as the initial question appears to have been resolved. There is now a separate issue for documenting this topic:
#451

yorugac closed this as completed Aug 22, 2024