Enable caching of rest tests which use integ-test distribution #43782

mark-vieira · 2019-06-29T01:46:59Z

This pull request is the initial attempt at making our REST integration tests cacheable. We limit caching at this point only to those tests that leverage our integ-test distribution, primarily because cacheability of tests running against our default distribution would likely be very poor, since almost any change to the codebase would manifest in some way in the default distribution.

While not a tremendous amount of change, we've done a couple of fundamental things here:

We've introduced a RestTestRunnerTask which simply extends Gradle's Test task and defines a property such that we can track the configured test clusters as an input. We have to do this through subclassing because the Gradle runtime input API (i.e. task.inputs.*) doesn't have an analog to the @Nested annotation. Our test cluster configurations are complex and need to allow for nested properties.

We've also decorated ElasticsearchNode with all the requisite input annotations such that all configuration options exposed by TestClusterConfiguration are tracked as inputs. We don't need to track everything about the setup of these clusters, as any change to the test clusters plugin implementation code would also cause a change in cache key due to them sharing the classpath with RestTestRunnerTask. We mainly need to track two things: a) all the options that you can tweak about a cluster in the build script (i.e. TestClusterConfiguration) and b) the distribution we are testing against itself.

All the options are tracked by adding various getters to ElasticsearchNode that return flattened and normalized versions of those properties. To simplify a lot of the complexity that existed there around dealing with lazy property evaluation and null-checking, we've introduced LazyPropertyList and LazyPropertyMap to do the legwork here. These behave like a regular List or Map but internally delegate to a data structure that supports lazy evaluation (by means of a Supplier) and configurable normalization strategies, so that some properties can be ignored, or otherwise appropriately normalized for the purposes of input snapshotting (e.g. a Map<String, String> should use @Input for snapshotting entry values but a Map<String, File> should use @InputFile).
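To make the idea concrete, here is a minimal sketch of a lazily evaluated list with a per-entry normalization strategy. It is not the code in this PR: the class name LazyListSketch, the Normalization enum and the shape of getNormalizedCollection() here are assumptions made for illustration only.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch, not the PR's LazyPropertyList: values are held as
// Suppliers and only resolved when they are read.
class LazyListSketch<T> extends AbstractList<T> {

    // Hypothetical normalization strategies for input snapshotting.
    enum Normalization { DEFAULT, IGNORE_VALUE }

    private static final class Entry<T> {
        final Supplier<T> supplier;
        final Normalization normalization;

        Entry(Supplier<T> supplier, Normalization normalization) {
            this.supplier = supplier;
            this.normalization = normalization;
        }
    }

    private final String name;
    private final List<Entry<T>> delegate = new ArrayList<>();

    LazyListSketch(String name) {
        this.name = name;
    }

    // Register a value that is computed when read, not when registered.
    void add(Supplier<T> supplier, Normalization normalization) {
        delegate.add(new Entry<>(supplier, normalization));
    }

    @Override
    public T get(int index) {
        T value = delegate.get(index).supplier.get();
        if (value == null) {
            // Centralized null-checking, so callers do not have to repeat it.
            throw new IllegalStateException(name + " entry " + index + " resolved to null");
        }
        return value;
    }

    @Override
    public int size() {
        return delegate.size();
    }

    // The view a build tool would snapshot: ignored entries are left out.
    List<Object> getNormalizedCollection() {
        return delegate.stream()
            .filter(entry -> entry.normalization != Normalization.IGNORE_VALUE)
            .map(entry -> (Object) entry.supplier.get())
            .collect(Collectors.toList());
    }
}
```

In this sketch an entry registered with IGNORE_VALUE is still returned by get(), so the cluster would still receive it, but it does not appear in the normalized view and so would not influence a cache key.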

For snapshotting of the distribution itself we've split it into two parts: all the JARs in the distribution, which get treated as a runtime classpath, and everything else. By separating the JARs we can use a runtime classpath normalization strategy on them which ignores things like changes in JAR entry order, timestamps and any other classpath resources we've configured to be ignored such as manifests. Without this we'd never get a cache hit because two JARs built in two different builds would never be byte-for-byte identical. We treat all the other contents of the distribution as a normal collection of files. For any installed plugins or modules we take a similar approach, splitting the plugin bundle JAR files from the rest of the content and snapshotting each with the appropriate normalization strategy.
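The effect of that normalization can be approximated with a small stand-alone sketch. This is not Gradle's algorithm, only an illustration of the principle under simplified assumptions: the fingerprint below depends on entry names and contents alone, so entry order, timestamps and the manifest do not affect it. The class JarFingerprintSketch and its helpers are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of order- and timestamp-insensitive JAR fingerprinting.
class JarFingerprintSketch {

    static String fingerprint(byte[] jarBytes) {
        // TreeMap sorts entries by name, which removes any dependence on entry order.
        Map<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Skip directories and the manifest; timestamps are never read.
                if (entry.isDirectory() || entry.getName().equals("META-INF/MANIFEST.MF")) {
                    continue;
                }
                entries.put(entry.getName(), readEntry(in));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0);
            // Length prefix keeps adjacent entries from blending into each other.
            digest.update(ByteBuffer.allocate(4).putInt(e.getValue().length).array());
            digest.update(e.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Reads the remaining bytes of the current zip entry.
    private static byte[] readEntry(ZipInputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Helper for experiments: builds an in-memory archive from name/content pairs.
    static byte[] jar(long timestamp, String... namesAndContents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i < namesAndContents.length; i += 2) {
                ZipEntry entry = new ZipEntry(namesAndContents[i]);
                entry.setTime(timestamp);
                out.putNextEntry(entry);
                out.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Two archives built with different timestamps and entry order get the same fingerprint here even though their bytes differ, while changing the content of any entry changes it. That is the property that makes cache hits possible across builds.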

The net result here is that any project whose test cluster is configured to use the integ-test distribution should have its integration tests cached unless either the subproject itself, or something core upstream (e.g. :server or :distribution), has changed. As an example, here's a build scan showing a build running the check task for all :plugins projects when a single-line change was made to :plugins:ingest-attachment.

elasticmachine · 2019-06-29T01:47:01Z

Pinging @elastic/es-core-infra

alpar-t · 2019-07-01T15:32:50Z

I like the idea of RestTestRunnerTask. Having Java types for the different types of integration tests we run makes for a straightforward and type-safe configuration.

alpar-t

Without going into any of the details of the PR, I'm wondering if we could avoid the repetition and have a higher level construct that we could adopt across the board for anything we do not want caching on. Something like:

   ignoredForCaching {
        setting '...', {}
   }

We could have the same for task inputs like system properties as well.

alpar-t

Looks good Mark! I'm really looking forward to seeing this live in CI.
Just a few comments.

alpar-t · 2019-07-01T15:37:21Z

buildSrc/src/main/java/org/elasticsearch/gradle/AbstractLazyPropertyCollection.java

+
+import java.util.List;
+
+public abstract class AbstractLazyPropertyCollection {


Might be wrong, but it looks like we could use generics instead of Objects here?

Do you mean for getNormalizedCollection()? If so we cannot because implementations might map the value type to any other type for the purposes of input snapshotting. Since this is only used by Gradle for input snapshotting, we only really care about the type at runtime.

alpar-t · 2019-07-01T15:42:24Z

buildSrc/src/main/java/org/elasticsearch/gradle/LazyPropertyList.java

+
+    @Override
+    public Iterator<T> iterator() {
+        return delegate.stream().peek(this::validate).map(PropertyListEntry::getValue).collect(Collectors.toList()).iterator();


There's no need to collect, streams have an .iterator() method.
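A small stand-alone comparison of the two forms, using hypothetical names (StreamIteratorSketch, a log list standing in for the validation step). One behavioral difference is worth keeping in mind when making this change: collecting first runs the peek step on every element up front, whereas Stream.iterator() pulls elements through the pipeline only as the iterator is consumed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch comparing the two ways of obtaining an iterator from a stream.
class StreamIteratorSketch {

    // Collects first: every element passes through peek before the iterator is returned.
    static Iterator<String> eager(List<String> values, List<String> log) {
        return values.stream().peek(log::add).map(String::toUpperCase).collect(Collectors.toList()).iterator();
    }

    // Uses Stream.iterator(): elements pass through peek only as they are pulled.
    static Iterator<String> lazy(List<String> values, List<String> log) {
        return values.stream().peek(log::add).map(String::toUpperCase).iterator();
    }
}
```

Both iterators yield the same elements in the same order; the difference is only in when the side effect in peek happens, which matters if validation is expected to fail fast for the whole collection.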

alpar-t · 2019-07-01T15:46:41Z

buildSrc/src/main/java/org/elasticsearch/gradle/test/RestTestRunnerTask.java

+@CacheableTask
+public class RestTestRunnerTask extends Test {
+
+    private Collection<ElasticsearchCluster> clusters = new ArrayList<>();


The TestClustersPlugin tracks this, we shouldn't add another place that keeps track.
We should come up with an API in the plugin that would allow the task to ask for all the clusters it uses so we keep a single record of truth.

That gets a little awkward as then the task would have to keep a reference back to the extension or similar. I think that's bad practice. It's the job of the plugin to do this wiring. The task should not have any knowledge of the test clusters plugin.

The task already cares about the clusters so does know a lot about the plugin.
We talked about being able to use clusters with specific types of tasks only in the past with @rjernst and agreed it's a good idea, it just never got done, so another way to think about it is to consider the task part of testClusters so that it's the only one that can use them. useCluster could become a method on this task instead of an extension, possibly saving us a bit at config time. All the house-keeping (claims, running cluster etc) could live in an extension so the task can reference it via the Project. That would make the plugin code more readable too. We have a handful of Test tasks using testclusters but those could and should be RestTestRunnerTasks.

It's still not clear exactly what the proposal here is. If the suggestion is to move logic that exists currently in TestClustersPlugin into this task then frankly that's a flat 👎 for me. Tasks should be dumb. They should not contain complex logic for managing state or inputs aside from their intended purpose. That code should be, and is, in the plugin. If that means exposing more properties on the task for the plugin to talk to it then so be it. At some point this becomes a philosophical argument but I'm going to remain stubborn here. RestIntegTestTask is the extreme example of what I'm talking about. A bunch of complex build logic that got sucked up into a task instead of a plugin and it's a nightmare.

I simply don't see the issue with this method. This task implementation is dead simple. All it does is expose a new bean property, and that's its only intended purpose. The task knows nothing about TestClustersPlugin. All it's aware of is that some collection of ElasticsearchClusters are an input to the task. It doesn't care where they come from, how long they live for, etc.

Going back and reading your comment I think I might better understand. If the idea is "if you want to use a test cluster you have to configure a RestTestRunnerTask" I think that's moving in the right direction. In the end what we want is something to replace RestIntegTestTask, that is, some central configurable DSL element that sets up all this stuff for us (test task, dependencies, test cluster, etc). What that thing might be I think is still up for discussion, but it needs to be higher level than a task as eventually I think we'd like this thing to potentially create source sets, etc. The end goal here is that for REST tests a user should simply need to dump YAML files in a folder, tell us what cluster configuration they need and be on their way.

In terms of this PR, RestTestRunnerTask is meant to solve a very specific problem which is to track an additional input. There's no implied intention for this task to solve the problems I state above, those need to be addressed elsewhere. I think it's worth discussing this more in our weekly tomorrow though as I do think we share the same high-level goals here.

alpar-t · 2019-07-01T15:49:14Z

buildSrc/src/main/java/org/elasticsearch/gradle/test/RestTestRunnerTask.java

+    }
+
+    @Nested
+    @Optional


Not sure if we really need this as it will never be null, or does it treat an empty collection like a null because of @Nested? The doc only reads that it allows for the value to be "not specified".

I think I initially put this in place during testing. It can probably be removed.

alpar-t · 2019-07-01T15:56:30Z

plugins/repository-azure/qa/microsoft-azure-storage/build.gradle

        String firstPartOfSeed = project.rootProject.testSeed.tokenize(':').get(0)
-        setting 'thread_pool.repository_azure.max', (Math.abs(Long.parseUnsignedLong(firstPartOfSeed, 16) % 10) + 1).toString()
+        setting 'thread_pool.repository_azure.max', (Math.abs(Long.parseUnsignedLong(firstPartOfSeed, 16) % 10) + 1).toString(), System.getProperty('ignore.tests.seed') == null ? DEFAULT : IGNORE_VALUE


Why a new property here?

Not sure I follow. There's no new property. We are just conditionally ignoring this when we've instructed the build to ignore the test seed, since this property value is generated from the test seed itself.

My bad, I didn't look carefully enough and thought we started using a new system property.
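For reference, the arithmetic behind that setting can be reproduced in isolation. This is a sketch only: SeedDerivedSettingSketch is a hypothetical name, and split(":") stands in for Groovy's tokenize(':') used in the build script.

```java
// Hypothetical stand-alone version of the expression in the build script above.
class SeedDerivedSettingSketch {

    // Maps the first part of the hexadecimal test seed to a value between 1 and 10.
    static String maxThreads(String testSeed) {
        String firstPartOfSeed = testSeed.split(":")[0];
        // parseUnsignedLong can produce a negative long for large seeds, so the
        // remainder may be negative; Math.abs brings it back into 0..9.
        return Long.toString(Math.abs(Long.parseUnsignedLong(firstPartOfSeed, 16) % 10) + 1);
    }
}
```

For example, the seed DEADBEEF:1234 gives 10 and FFFFFFFFFFFFFFFF gives 2. Since the value varies with the seed, it would change the task inputs from one run to the next unless it is excluded from snapshotting, which is what passing IGNORE_VALUE does when ignore.tests.seed is set.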

alpar-t · 2019-07-01T16:00:49Z

buildSrc/src/main/java/org/elasticsearch/gradle/testclusters/ElasticsearchNode.java

+
+    @InputFiles
+    @PathSensitive(PathSensitivity.RELATIVE)
+    private FileCollection getDistributionFiles() {


This will run before we start clusters, so it won't have any of the files we create as part of startup (like the config file). That's why we need to track the extra config files.
This all works as intended, just wanted to leave it here for other reviewers.

I think this actually opens up another issue, which is the installed plugins and modules won't be there either. I thought I tested this, but I need to verify.

Yep, I totally missed the fact that these get installed after the node starts. So my test build scan above "works" because the plugin under test is also on the test classpath, but this won't work in all cases as we may install plugins/modules that aren't on the test runtime classpath. I'm looking into a solution for this.

I've pushed a solution here. Essentially, we snapshot the bundle archive for each plugin/module similarly to how we do for the distribution itself, by separating JARs from other contents.

I totally missed the fact that these get installed after the node starts

Can you explain this more? We should not be touching plugins or modules after the node starts, and it would not work (ES only loads plugins at startup).

That was poorly worded. More accurately, these get installed inside the ElasticsearchNode.start() method, but before we actually launch the ES process. So no, we don't actually muck with the installation after it's started. The key bit here is that this happens after the testing task begins execution, so input snapshotting has already happened.

This PR is getting large, but we should probably improve this right after this is merged by having a configuration that inherits from testRuntime track all the plugins and modules that we need to install in the test cluster. That would make this tracking more straightforward as well.

That would make this tracking more straightforward as well

I'm not convinced it would. A Configuration is nothing more than a FileCollection in this case as these are all unmanaged file dependencies. Also they are .zip files so we still have to deal with unpacking them and dealing with their contents intelligently. A Configuration would also not support any notion of a remote plugin configured via a URI so we'd have to manage that as well.

alpar-t · 2019-07-02T06:16:22Z

Without going into any of the details of the PR, I'm wondering if we could avoid the repetition and have a higher level construct that we could adopt across the board for anything we do not want caching on. Something like:
   ignoredForCaching {
        setting '...', {}
   }
We could have the same for task inputs like system properties as well.

@mark-vieira you might have missed this one

mark-vieira · 2019-07-02T23:01:01Z

@elasticmachine run elasticsearch-ci/packaging-sample

mark-vieira · 2019-07-03T00:23:37Z

I'm pulling the trigger on this so we can get some data in Gradle Enterprise. 🚢

mark-vieira added :Delivery/Build Build or test infrastructure v8.0.0 labels Jun 29, 2019

mark-vieira marked this pull request as ready for review June 29, 2019 15:33

mark-vieira requested review from alpar-t and rjernst June 29, 2019 15:33

alpar-t reviewed Jul 1, 2019

mark-vieira force-pushed the integ-test-caching branch from 83747b7 to 89180d2 Compare July 1, 2019 17:17

mark-vieira added 5 commits July 2, 2019 08:56

Enable caching of rest tests which use integ-test distribution

3fb5468

Don't explicit track plugins and modules since they are part of the c…

f5f4ecf

…lasspath

Ensure we use relative path sensitivity for test cluster distribution

1c92f5f

Address PR feedback

cb82aa9

Properly handle snapshotting installed plugins and modules

def871f

mark-vieira force-pushed the integ-test-caching branch from 3742cfd to def871f Compare July 2, 2019 15:56

mark-vieira merged commit bfd8754 into elastic:master Jul 3, 2019

mark-vieira added the backport pending label Jul 3, 2019

mark-vieira added a commit to mark-vieira/elasticsearch that referenced this pull request Jul 10, 2019

Enable caching of rest tests which use integ-test distribution (elast…

5276797

…ic#43782) (cherry picked from commit bfd8754)

mark-vieira mentioned this pull request Jul 10, 2019

[Backport] Enable caching of rest tests which use integ-test distribution #44181

Merged

mark-vieira added v7.4.0 and removed backport pending labels Jul 10, 2019

jkakavas mentioned this pull request Jul 23, 2019

:plugins:repository-azure:qa:microsoft-azure-storage failures #44740

Closed

mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

