Provide a better way to declare nodes/clusters/cluster formation during the build #30904
Comments
Pinging @elastic/es-core-infra
Another possibly interesting use case of managing the clusters in this way is that we could fingerprint the config (distro, plugins, config, etc.) and, instead of spinning one up for each test, clean up and re-use running instances.
I don't think we have the exact same config often enough, nor do I think the potential savings would be worth the headache of non-reproducibility (i.e. if the cluster is altered in some way by a different test runner that has a side effect on tests run later).
It is desirable to have the ability to configure individual nodes in a cluster with node-specific settings. For example, given a 3 node cluster the ability to enable machine learning ( I would imagine this ability is also useful for testing ingest nodes, dedicated masters and data-only nodes. @atorok can you please consider this request when designing the DSL?
@davidkyle I am considering this. I'm focusing on single-node clusters first, but I see multi-node as a composite, so the DSL could be applied both at the cluster level for common config and at the node level for customization in a similar manner. I think that will address the needs you describe. Randomization support in the build is a different topic, orthogonal to cluster formation. I can imagine other uses for it, and I think it would be useful to add support for it at some point.
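To make the composite idea a bit more concrete, here is a minimal sketch in plain Java. It is not the actual build code: ClusterSpec, NodeSpec and the example.* setting keys are hypothetical names invented for illustration. It only shows one way common config could live on the cluster while individual nodes override it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the real build code: settings declared on the
// cluster apply to every node unless a node overrides them.
final class ClusterSpec {
    private final Map<String, String> defaults = new LinkedHashMap<>();
    private final Map<String, NodeSpec> nodes = new LinkedHashMap<>();

    // Cluster-level setting shared by all nodes.
    ClusterSpec setting(String key, String value) {
        defaults.put(key, value);
        return this;
    }

    // Returns the named node, creating it on first use.
    NodeSpec node(String name) {
        return nodes.computeIfAbsent(name, n -> new NodeSpec(defaults));
    }
}

final class NodeSpec {
    private final Map<String, String> clusterDefaults;
    private final Map<String, String> overrides = new LinkedHashMap<>();

    NodeSpec(Map<String, String> clusterDefaults) {
        this.clusterDefaults = clusterDefaults;
    }

    // Node-level setting; takes precedence over the cluster default.
    NodeSpec setting(String key, String value) {
        overrides.put(key, value);
        return this;
    }

    // Cluster defaults first, then node overrides on top.
    Map<String, String> effectiveSettings() {
        Map<String, String> merged = new LinkedHashMap<>(clusterDefaults);
        merged.putAll(overrides);
        return merged;
    }
}
```

With this shape, calling setting on the cluster configures every node, while cluster.node("node-0").setting(...) changes only that node, so a single node in a 3 node cluster can carry a setting the others do not have.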
We now have testClusters and TestClustersPlugin.
Todo:
ditching ClusterFormation
DSL Glimpse
Produces this output:
Initial Description
The current cluster formation has the following limitations:
It does not work with --parallel, and as such has no support for parallelism (note that test.jvm doesn't help here; these tests always run in sequence).
The main reason --parallel does not work is that Gradle's finalizedBy does not offer any guarantees about when the task will be run. We use this for stopping clusters, but when running in parallel Gradle puts that off, so one can end up running 40+ ES nodes (512 MB * 40 ≈ 20 GB) before running out of memory, at which point the build starts to fail. There is no easy fix for this other than setting up a bunch of mustRunAfter rules for the different tasks. Some tests run across clusters, upgrade and restart nodes, etc., so we can't make any assumptions about when the stop task is safe to run. We can't really enforce a "stop after the test runner for this cluster has completed" rule, as the test runners of other clusters might still need this cluster.
Even after doing some hacks to bring down the nodes sooner and not run out of memory, --parallel uncovered some missing ordering relations between tasks that were causing failures.
From some limited testing, I estimate build time could be reduced by at least 30% by being able to run integ tests in parallel (based on running :qa:check on my 6 physical core CPU with 32 GB RAM).
From what I can see, this is the only thing preventing us from simply running builds with clean check --parallel without having to pick and choose what works in parallel and what doesn't.
I think we should create a cluster formation DSL that does not rely on Gradle tasks to perform its operations. We would still use Gradle to fetch and set up distributions, but everything else would be externalized. The DSL would provide configuration for the cluster and expose methods to alter its state (start/stop the cluster or individual nodes, change configuration, etc.).
There would be methods for high-level operations like starting and stopping the cluster and running tests, as well as lower-level operations that act on individual nodes.
No operation would be carried out by default; a task would have to be set up that calls these operations from the task action (or as doLast). We can also provide a task, with an option to control whether it's created, to cover the common case of setting up the cluster, running tests and terminating.
Of course we would need a way to run tests outside of Gradle, but since we don't use its infrastructure to do it anyway, it shouldn't be that hard.
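As a rough illustration of the proposal, the sketch below models a cluster handle whose operations are plain method calls rather than Gradle tasks. Everything in it is an assumption made for the example: SketchCluster, SketchNode and runWith are hypothetical names, and starting a node only flips a flag where a real implementation would launch an Elasticsearch process. The point is that nothing runs until something calls these methods, and that the high-level operation stops the cluster in a finally block instead of relying on finalizedBy ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a single node; start/stop only track state here.
final class SketchNode {
    private final String name;
    private boolean running;

    SketchNode(String name) { this.name = name; }

    String name() { return name; }
    boolean isRunning() { return running; }
    void start() { running = true; }   // a real node would launch a process
    void stop() { running = false; }   // a real node would terminate it
}

// Hypothetical cluster handle exposing both cluster-level and node-level operations.
final class SketchCluster {
    private final List<SketchNode> nodes = new ArrayList<>();

    SketchCluster(String name, int numberOfNodes) {
        for (int i = 0; i < numberOfNodes; i++) {
            nodes.add(new SketchNode(name + "-" + i));
        }
    }

    // Lower-level access for restarting or reconfiguring one node.
    SketchNode node(int index) { return nodes.get(index); }

    void start() { nodes.forEach(SketchNode::start); }
    void stop() { nodes.forEach(SketchNode::stop); }

    long runningNodes() {
        return nodes.stream().filter(SketchNode::isRunning).count();
    }

    // High-level operation: start, run the tests, always stop afterwards,
    // even if the tests throw.
    void runWith(Runnable tests) {
        start();
        try {
            tests.run();
        } finally {
            stop();
        }
    }
}
```

A task action (or doLast block) could then call something like cluster.runWith(...) directly, so the nodes are brought down as soon as that test run finishes, whether it passes or fails.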
The custom DSL can make use of Gradle's NamedDomainObjectCollection so plugins can change defaults for different sections of the build when a new cluster is defined.
Related: #30874, #30903