Clustering and distributed execution #140
Comments
WRT #202 / request for comments: I don't have direct experience with etcd so I don't have an opinion there, but the notion of using existing, proven software is clearly sound so I'd welcome that. My initial thought is, how would someone know how many VUs a machine can handle? If there's an automated estimation based on system resources (though obv. e.g. a CPU core on AWS is definitely not === a CPU core on dedicated hardware) then that'd help to at least provide consistency, which would be good. I'm aware though that it'd be quite easy to overload a load generator with work and thus skew its output, as it would lack the system resources to measure accurately. However it's implemented, I would imagine it's going to be necessary to allow users to set/amend the VU capability, and a good user guide would help a lot - i.e. defining a way in which users can calculate/estimate - that might be the best way to start actually, keeping it simple and iterating/adding from there. |
Honestly, your best shot is probably trial and error. The limiting factor is not typically CPU power, but rather local socket usage, and to some lesser extent, RAM usage, both of which vary slightly between scripts. A good start would be to just split your desired number of VUs across as many hosts as you want and see if it flies or not. |
Linux has a maximum of 64K ephemeral ports. That's your connection limit. I'd also tune kernel settings such as TIME_WAIT handling, etc. |
Thanks for this, I am very interested! I was wondering if there would be a way to avoid adding a new service to the pool. The need for extra shared metadata is going to be there, because you would probably want to direct the load test output to a single InfluxDB instance, so maybe we can save this kind of metadata directly there? I know that this would force folks to stick with InfluxDB, but if we'd use etcd people would in any case need to custom tailor something to collect and aggregate results. |
@arichiardi I think it's more important that we look at how we can best implement this, using all available tools, rather than looking at how to minimise dependencies right off the bat. I'm not saying we should introduce dependencies for the sake of it, but we want this done right. The current requirements for this to be implemented are as follows:
Prerequisite: Leader assignment
Most of the below requirements have one prerequisite: we need a central point to make all decisions from. The leader doesn't need a lot of processing power, it just needs to keep an eye on things, so to speak.
Spreading VUs across instances.
The algorithm for this could simply be to spread VUs evenly across all available instances, respecting their caps. We could possibly do some weighting, e.g. between an instance with max 1000 VUs and one with max 2000 VUs, the latter could get 2x as many VUs allocated to it. Possible implementations I can see:
Central execution of thresholds, from a data source.
This would be fairly simple using something like InfluxDB; we can parse threshold snippets for the variables they refer to (there's some code for that already), then query them out of the database, using the starting timestamp of the test as delimiter. We could do something with shipping samples back to the master, but that feels... a little silly.
Distributed rate limiting.
The
Distributed data storage.
We need to be able to store two kinds of things:
This can be anything that can store keys and values of arbitrary size. |
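A minimal Go sketch of the weighted spreading idea from the comment above: VUs are split in proportion to each instance's cap and never exceed it. The `Node` type, its field names and `SpreadVUs` are hypothetical names for illustration, not part of k6.

```go
package main

import "fmt"

// Node is a hypothetical record of one instance and its self-reported VU cap.
type Node struct {
	Name   string
	MaxVUs int
}

// SpreadVUs splits total VUs across nodes in proportion to each node's cap.
// It returns an error if the cluster cannot hold that many VUs.
func SpreadVUs(nodes []Node, total int) (map[string]int, error) {
	capacity := 0
	for _, n := range nodes {
		capacity += n.MaxVUs
	}
	if total < 0 || total > capacity {
		return nil, fmt.Errorf("cannot place %d VUs, cluster cap is %d", total, capacity)
	}
	out := make(map[string]int, len(nodes))
	if capacity == 0 {
		return out, nil // nothing to place, and avoids dividing by zero
	}
	assigned := 0
	for _, n := range nodes {
		share := total * n.MaxVUs / capacity // floor of the proportional share
		out[n.Name] = share
		assigned += share
	}
	// Hand out the rounding remainder one VU at a time to nodes with headroom.
	for _, n := range nodes {
		if assigned == total {
			break
		}
		if out[n.Name] < n.MaxVUs {
			out[n.Name]++
			assigned++
		}
	}
	return out, nil
}

func main() {
	nodes := []Node{{"small", 1000}, {"big", 2000}}
	alloc, err := SpreadVUs(nodes, 900)
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Printf("%s: %d VUs\n", n.Name, alloc[n.Name])
	}
}
```

With caps of 1000 and 2000, the second instance receives twice the share of the first, matching the 2x example in the comment.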
Have you considered implementing something similar to what was done for Locust? They have a master/slave architecture where the synchronization happens via ZMQ (TCP), which is lightweight enough. One advantage, in this case, is that there is no need for introducing a hard dependency. The synchronization master/slave can be implemented via ZMQ, HTTP or whatever network protocol you might consider. IMHO, the only disadvantage that Locust implementation has is the fact that it is a stateful system, where the master must always be started first and can't recover if a slave disappears and then comes back. I would rather see a stateless system that can handle connectivity issues gracefully. |
@coderlifter, thanks, we still haven't finalized the k6 distributed execution design yet, so we'll definitely consider this approach when we get to it. We'll post a final design/RFC here when we start implementing this, so it can be discussed by any interested people. |
Any progress on internal discussions? This is a killer feature which is pushing users more towards python at the moment. |
@GTB3NW, sorry, we still haven't started specifically implementing this functionality yet, the next major feature we're currently working on is the arrival-rate based execution support, i.e. being able to schedule the execution in terms of iterations (requests) per second. As a part of the refactoring we're doing for that, we'll also improve some k6 internals in a way that would facilitate the easier implementation of the distributed execution as well, so we're slowly getting to that point, but we're not there yet. |
You should be able to set up n(ip) * ~65K(ports) sockets, since a source of a packet is src ip + src port. |
Any updates on this issue? Very keen to be able to run K6 in a distributed manner |
We are currently finishing a big refactoring of the core of how script execution works in k6, which would be foundational for native distributed execution, among other things. You can see some details in #1007 (specifically, the execution segments part of it, #997). This would allow you to partition a load test among however many instances you require, without any synchronization between them. For example, if you want to split a test between 3 instances, you would be able to do something like this when #1007 is merged:
Each instance will execute only its own piece of the puzzle. To start the tests synchronously, you can use the |
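To illustrate why such a partition needs no synchronization, here is a rough Go sketch (not the actual algorithm from #997, and `SegmentVUs` is a made-up name): each instance derives its own share purely from the total, the number of instances and its own index.

```go
package main

import "fmt"

// SegmentVUs returns how many of totalVUs fall into segment idx when the test
// is cut into n equal segments, i.e. the half-open slice [idx/n, (idx+1)/n).
// Every instance can compute this locally from the same three numbers.
func SegmentVUs(totalVUs, n, idx int) int {
	start := totalVUs * idx / n
	end := totalVUs * (idx + 1) / n
	return end - start
}

func main() {
	total, n := 100, 3
	sum := 0
	for i := 0; i < n; i++ {
		vus := SegmentVUs(total, n, i)
		sum += vus
		fmt.Printf("instance %d runs %d VUs\n", i, vus)
	}
	fmt.Println("total:", sum)
}
```

Because consecutive segments share their boundaries, the per-instance counts always add up to the total, even when the total is not divisible by the number of instances.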
@ghost @efology
https://tools.ietf.org/html/bcp156#section-2.2 The local IP address is likely singular, and the remote port is likely constant. e.g. Just looking at the public hostname of one of my services hosted by AWS, the DNS resolves to 4 IP addresses. In theory one k6 client machine configured to allow 64,511 ephemeral ports per remote IP would be able to create 258,044 concurrent connections to it. As you say @efology you could also configure a client with multiple IPs and multiply the potential connections that way. Particularly if you want to load test a web socket back end, where you have large numbers of idle connections this could be useful. You might start to hit other limitations like CPU and memory at those higher connection counts too, depending on the internals of how K6 is managing connections. |
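The arithmetic in this comment can be written down as a tiny Go helper. It is only an upper bound under the stated assumptions (constant remote port, every connection tuple unique); `MaxConcurrentConns` is a hypothetical name.

```go
package main

import "fmt"

// MaxConcurrentConns estimates the upper bound on simultaneous TCP connections
// from one load generator. A connection is identified by (local IP, local
// port, remote IP, remote port); with a constant remote port, the bound is the
// product of the other three dimensions.
func MaxConcurrentConns(localIPs, remoteIPs, ephemeralPorts int) int {
	return localIPs * remoteIPs * ephemeralPorts
}

func main() {
	// One local IP, four remote IPs, 64,511 usable ephemeral ports per remote IP.
	fmt.Println(MaxConcurrentConns(1, 4, 64511))
	// The same target reached from a client configured with two local IPs.
	fmt.Println(MaxConcurrentConns(2, 4, 64511))
}
```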
Side note to this: As a short-term stopgap solution, you can squeeze more sockets out of a machine by load balancing across multiple IPs, subject to your network topology allowing this. Go actually makes this fairly trivial to implement, just set This is trivial in IPv6 setups that assign prefixes to machines, eg. my laptop currently has a /64 subset of this flat's /48. For IPv4 networks, it more commonly takes some fiddling with your interface configuration and adding discrete IPs - but you can query it all the same, with (* Neither this nor distributed execution helps if you're trying to load test through a NAT with only a single public IP between you and your destination host; that's your bottleneck in that case.) |
@liclac, This is what is used in all the multi NIC implementations, one of which will hopefully get in v0.28.0 ;). And it really does work and if you have multiple IPs/NICs |
I just saw this mentioned on the road map (https://github.com/orgs/grafana/projects/443/views/1) and wanted to make an annoying comment :-) as this must surely be one of the oldest open tickets around? For six years now it has been about "a year away". Isn't it better to just remove it from the road map until we know we're ready to start working on it? I consider it to be the holy grail for k6. When it is done, in the way @liclac envisioned it, when it is as easy to run a distributed test in k6 as it is in Locust, then it is going to be hard for most people to justify using any other load testing tool. Not to take away from the awesome work that's been done, and is being done - you guys rock! But this old killer feature is bugging me with its tendency to never get built but still appear on the road map all the time! /a grumpy old user |
@ragnarlonn, very happy to hear from you! The good news is that we have been low-key working towards this goal for a while 😅 There is an open PR (#2816) with a simple proof-of-concept implementation of distributed execution and even very rudimentary support for test suites (#1342) in a separate commit 🎉 That PoC is not polished at all, it basically only works for the happy path, but it works and I can run a k6 test on multiple instances, with a (mostly) working end-of-test summary and thresholds. Guided by that PoC, so far we have been clearing the obstacles and prerequisites before we can have a robust distributed execution implementation. A couple of versions ago we released k6 v0.43.0 that included a series of refactoring efforts spanning 10+ PRs and culminating in #2815, the removal of the As you can see from the first commit of #2816, the only remaining obstacle before we can implement distributed execution is the lack of HDR/sparse histograms (#763), to efficiently move The architecture of that PoC is somewhat different than what was described at the start of the issue. Some of the original ideas were not feasible because k6 doesn't just care about endlessly looping VUs anymore, we now have arrival-rate executors and multiple scenarios in the same test. It also introduced 2 new sub-commands ( In any case, one big reason we haven't pushed distributed execution forward more swiftly has been the fact that it is competing with a lot of other features with high priorities, while we already have a reasonably OK distributed execution solution for a lot of use cases. And I am not talking about just k6 cloud, but mostly about k6-operator. It's not perfect and native k6 distributed execution support will plug some of the current gaps, but if you want to run distributed k6 tests in Kubernetes, it already works pretty well!
So yeah, including native distributed k6 execution in the new roadmap is a bit more realistic this time compared to before, but it's still competing with a lot of other priorities. |
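As a rough illustration of why mergeable histograms matter here, the Go sketch below shows a simplified sparse histogram whose per-instance copies can be combined by adding bucket counts, so raw samples never have to leave an instance. This is not the design from #763; the type, the bucket formula and `bucketsPerDoubling` are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// bucketsPerDoubling is a hypothetical resolution knob: how many buckets
// cover each doubling of the measured value.
const bucketsPerDoubling = 8

// SparseHist counts samples in logarithmic buckets. Only non-empty buckets
// are stored, which keeps the structure small enough to ship around.
type SparseHist map[int]uint64

// bucketOf maps a value to its bucket index; values below 1 share bucket 0.
func bucketOf(v float64) int {
	if v < 1 {
		return 0
	}
	return 1 + int(math.Floor(math.Log2(v)*bucketsPerDoubling))
}

// Add records one sample.
func (h SparseHist) Add(v float64) { h[bucketOf(v)]++ }

// Merge folds another instance's histogram into this one.
func (h SparseHist) Merge(other SparseHist) {
	for bucket, count := range other {
		h[bucket] += count
	}
}

// Count returns the total number of recorded samples.
func (h SparseHist) Count() uint64 {
	var total uint64
	for _, count := range h {
		total += count
	}
	return total
}

func main() {
	a, b := SparseHist{}, SparseHist{}
	a.Add(120) // e.g. request durations in ms seen by instance A
	b.Add(120)
	b.Add(3)
	a.Merge(b)
	fmt.Println("samples after merge:", a.Count())
}
```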
@na-- So you're saying we really are getting close to being able to release it? That would be super cool. Will we bump the version to 1.0 also then, or what's the criteria for that? I might try to test that PR! |
Not quite. We have a proof of concept and we are close to actually starting work on a full solution (HDR histograms are the only remaining prerequisite). But we haven't actually started on that final push, e.g. no work except some design docs is planned for it during the current release cycle. Follow the roadmap and milestones to track when that work actually starts. And after work has actually started, releasing it publicly is another matter entirely... I am not sure how long a fully-featured version of my PoC would take to deliver, or if we won't need to make some major architectural deviations from that PoC... 😅 If everything goes well, it shouldn't be too long, given that we have a PoC to follow and that the whole feature is backwards compatible. So it can probably be released as an "experimental" feature initially, allowing us to iterate and make some breaking changes for a few versions to fine-tune things 🤔 But no promises or even guesstimates at this point, sorry 😅 Regarding k6 v1.0, there are other concerns. It hasn't been released not just due to the lack of this feature (or other major ones), but mostly due to the fact that xk6 extensions use parts of k6 as Go dependencies. And Go module import paths change based on the major version of the dependency repo... See #2640 (comment) for more details, but the TLDR version is that before we release k6 v1.0.0, we probably need to refactor the k6 codebase and split it into 2 different Go modules:
|
@na-- I get it. And I know you're making the right calls, I just can't help fretting over that distributed execution ;) When it does get released, there should be a big party thrown somewhere. |
I think it would also be important to support streaming the aggregated results to Grafana for visualising the cluster job as it proceeds. |
I'm up for a party and/or lending a hand if needed (especially if I can do so as a freelancer ;D), not getting to finish this before I left and didn't have time anymore is one of my big regrets. |
@ragnarlonn, somewhat prompted by your questions, I've worked on pushing this issue forward as much as I can before I go on a long vacation/sabbatical for the next few months 😮‍💨 I'll also likely change teams a few months after I come back, so, like @liclac, I might not be the person who finishes this... 😅 😞 As you might see from the many issues and PRs linked above, I've enhanced, refactored and split up my original distributed execution PoC from #2438 and #2816 into a few hopefully merge-able PRs. They are still very much proofs of concept - very rough and not ready for any serious usage! The rest of the team also hasn't reviewed them or even approved the overall architecture. Nor has the work of reviewing, approving, merging and finishing up any remaining related tasks been prioritized. So this is still far from ready for production use... 😞 However, the distributed execution changes have been refactored into a few small, atomic and self-sufficient commits/PRs that should be safe to merge even as they currently are, since they should now be completely backwards compatible! 🤞 🎉 They no longer disable any unit tests or linters, or rely on the HDR histograms changes to be merged first! Support for HDR/sparse histograms still needs to be added before this feature can be used for big tests, but this can now be done after distributed execution has been merged! 🎉 Moreover, this distributed execution implementation doesn't affect the To wrap up, if anyone is interested in my proposal, I created a new issue to track all of its details and sub-tasks, #3218. I also wrote down my thoughts and ideas on the topic in a new design document, #3217. It is a long read, but I tried to provide the maximum amount of context so that everyone can grok the overall architecture and the reasons behind the PoC code. I've tried to make what exists as code and ideas in my head as easy as possible for someone else to adopt and build upon, so 🤞 😅 |
This is something that isn't top priority at the moment, but it's going to take a lot of design work, so I'd like to get the ball rolling on the actual planning.
My current idea is to have a k6 hive command (name subject to change), which hooks up to etcd - a lovely piece of software that can handle most of the heavy lifting around clustering, and is also the backbone that makes, among other things, Kubernetes tick.
Each instance registers itself, along with how many VUs it can handle, exposes an API to talk to the cluster, and triggers a leader election. The information registered in etcd might have a structure like:
/loadimpact/k6/nodes/nodeA - node data (JSON)
/loadimpact/k6/nodes/nodeB - node data (JSON)
Running a test on the cluster is a matter of calling k6 run --remote=https://a.node.address/ script.js. This causes it to, instead of running the test locally, roll up all the data and files needed and push them to the cluster, where they're stored in etcd - available from each node.
/loadimpact/k6/test - test data (JSON)
/loadimpact/k6/test/src/... - root for the pushed filesystem
When test data is loaded, each node instantiates an engine and its maximum number of VUs right away, and watches its own registry information for changes. The elected leader then takes care of patching other nodes to distribute VUs evenly across the cluster.