Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable caching of rest tests which use integ-test distribution #43782

Merged
merged 5 commits into from
Jul 3, 2019

Conversation

mark-vieira
Copy link
Contributor

@mark-vieira mark-vieira commented Jun 29, 2019

This pull request is the initial attempt and making our REST integration tests cacheable. We limit caching at this point only to those tests that leverage our integ-test distribution, primarily because cacheability of of tests running against our default distribution would likely be very poor since almost any change to the codebase would manifest in some way in the default distribution.

While not a tremendous amount of change we've done a couple of fundamental things here:

We've introduced a RestTestRunnerTask which simply extends Gradle's Test task and defines a property such that we can track the configured test clusters as an input. We have to do this through subclassing because the Gradle runtime input API (i.e. task.inputs.*) doesn't have an analog to the @Nested annotation. Our test cluster configurations are complex and need to allow for nested properties.

We've also decorated ElasticsearchNode with all the requisite input annotations such that all configuration options exposed by TestClusterConfiguration are tracked as inputs. We don't need to track everything about the setup of these clusters as any change to the test clusters plugin implementation code would also cause a change in cache key due to them sharing the classpath with RestTestRunnerTask. We mainly need to track two things a) all the options that you can tweak about a cluster in the build script (i.e. TestClusterConfiguration) and b) the distribution we are testing against itself.

All the options are tracked by adding various getters to ElasticsearchNode that return flattened and normalized version of those properties. To simplify a lot of the complexity that existed there around dealing with lazy property evaluation and null-checking, we've introduced LazyPropertyList and LazyPropertyMap to do the legwork here. These behave like a regular List or Map but internally delegate to a data structure that support lazy evaluation (by means of a Supplier), and configurable normalization strategies, so that some properties can be ignored, or otherwise appropriately normalized for the purposes of input snapshotting (e.x. a Map<String, String> should use @Input for snapshotting entry values but a Map<String, File> should use @InputFile).

For snapshotting of the distribution itself we've split it into two parts. All the JARs in the distribution which get treated as a runtime classpath, and everything else. By separating the JARs we can use a runtime classpath normalization strategy on them which ignores things like changes in JAR entry order, timestamps and any other classpath resources we've configured to be ignored such as manifests. Without this we'd never get a cache hit because two JARs built in two different builds would never by byte-for-byte identical. For all the other contents of the distribution we just treat as a normal collection of files. For any in installed plugins or modules we take a similar approach, spitting the plugin bundle JAR files from the rest of the content and snapshotting each with the appropriate normalization strategy.

The net result here is that any project whose test cluster is configured to use the integ-test distribution should have it's integration tests cached unless either the subproject itself, or something core upstream (e.x. :server or :distribution) has changed. As an example, here's a build scan showing a build running the check task for all :plugins projects when a single-line change was made to :plugins:ingest-attachment.

@mark-vieira mark-vieira added :Delivery/Build Build or test infrastructure v8.0.0 labels Jun 29, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@mark-vieira mark-vieira marked this pull request as ready for review June 29, 2019 15:33
@mark-vieira mark-vieira requested review from alpar-t and rjernst June 29, 2019 15:33
@alpar-t
Copy link
Contributor

alpar-t commented Jul 1, 2019

I like the idea of RestTestRunnerTask. Having java types for the different types of integration tests we run makes for a straight forward and type sage configuration.

Copy link
Contributor

@alpar-t alpar-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without going into any of the details of the PR, I'm wondering if we could avoid the repetition and have a higher level construct that we could adopt to across the board for anything we do not want caching on. Something like:

   ignoredForCaching {
        setting '...', {}
   }

We could have the same for task inputs like system properties as well.

Copy link
Contributor

@alpar-t alpar-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good Mark! I'm really looking forward to see this live in CI.
Just a few comments.


import java.util.List;

public abstract class AbstractLazyPropertyCollection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be wrong, but it looks like we could use generics instead of Objects here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean for getNormalizedCollection()? If so we cannot because implementations might map the value type to any other type for the purposes of input snapshotting. Since this is only used by Gradle for input snapshotting, we only really care about the type at runtime.


@Override
public Iterator<T> iterator() {
return delegate.stream().peek(this::validate).map(PropertyListEntry::getValue).collect(Collectors.toList()).iterator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to collect, streams have an .iterator() method.

@CacheableTask
public class RestTestRunnerTask extends Test {

private Collection<ElasticsearchCluster> clusters = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestClustersPlugin tracks this, we shouldn't add another place that keeps track.
We should come up with an API in the plugin that would allow the task to ask for all the clusters it uses so we keep a single record of truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That get's a little awkward as then the task would have to keep a reference back to the extension or similar. I think that's bad practice. It's the job of the plugin to do this wiring. The task should not have any knowledge of the test clusters plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task already cares about the clusters so does know a lot about the plugin.
We talked about being able to use clusters with specific types of tasks only in the past with @rjernst and agreed it's a good idea, it just never got done, so another way to think about it is to consider the task part of testClusters so that it's the only one that can use them. useCluster could become a method on this task instead of an extension, possibly saving us a bit at config time. All the house-keeping (claims, running cluster etc) could live in an extension so the task can reference it via the Project. That would make the plugin code more readable too. We have a handful of
Test tasks using testclusters but those could and should be RestTestRunnerTasks.

Copy link
Contributor Author

@mark-vieira mark-vieira Jul 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not clear exactly what the proposal here is. If the suggestion is to move logic that exists currently in TestClustersPlugin into this task then frankly that's a flat 👎 for me. Tasks should be dumb. They should not contain complex logic for managing state or inputs aside from their intended purpose. That code should and is in the plugin. If that means exposing more properties on the task for the plugin to talk to it then so be it. At some point this becomes a philosophical argument but I'm going to remain stubborn here. RestIntegTestTask is the extreme example of what I'm talking about. A bunch of complex build logic that got sucked up into a task instead of a plugin and it's a nightmare.

I simply don't see the issue with this method. This task implementation is dead simple. All it does is expose a new bean property, and that's its only intended purpose. The task knows nothign about TestClustersPlugin. All it's aware of is that some collection of ElasticsearhClusters are an input to the task. It doesn't care where they come from, how long they live for, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back and reading your comment I think I might better understand. If the idea is "if you want to use a test cluster you have to configure a RestTestRunnerTask" I think that's moving in the right direction. In the end what we want is something to replace RestIntegTestTask, that is, some central configurable DSL element that setups up all this stuff for us (test task, dependencies, test cluster, etc). What thing thing might be I think is still up for a discussion, but it needs to be higher level than a task as eventually I think we'd like this thing to potentially create source sets, etc. I end goal here is that for REST tests a user should simply need to dump YAML files in a folder, tell us what cluster configuration they need and be on their way.

In terms of this PR, RestTestRunnerTask is meant to solve a very specific problem which is to track an additional input. There's no implied intention for this task to solve the problems I state above, those need to be addressed elsewhere. I think it's worth discussing this more in our weekly tomorrow though as I do think we share the same high-level goals here.

}

@Nested
@Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we really need this as it will never be null, or does it treat an empty collection like a null because of @Nested ? The doc only reads that it allows for the value to be "not specified" .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I initially put this in place during testing. It can probably be removed.

String firstPartOfSeed = project.rootProject.testSeed.tokenize(':').get(0)
setting 'thread_pool.repository_azure.max', (Math.abs(Long.parseUnsignedLong(firstPartOfSeed, 16) % 10) + 1).toString()
setting 'thread_pool.repository_azure.max', (Math.abs(Long.parseUnsignedLong(firstPartOfSeed, 16) % 10) + 1).toString(), System.getProperty('ignore.tests.seed') == null ? DEFAULT : IGNORE_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a new property here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. There's not new property. We are just conditionally ignoring this when we've instructed the build to ignore the test seed, since this property value is generated from the test seed itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, didn't look careful enough and taught we started using a new system property.


@InputFiles
@PathSensitive(PathSensitivity.RELATIVE)
private FileCollection getDistributionFiles() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will run before we start clusters, so won't have any of the files we create as part of startup ( like the config file ). That's why we need to track the extra config files.
This all works as intended, just wanted to leave it here for other reviewers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually opens up another issue, which is the installed plugins and modules won't be there either. I thought I tested this, but I need to verify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I totally missed the fact that these get installed after the node starts. So my test build scan above "works" because the plugin under test is also on the test classpath but this won't work in all cases as we may install plugins/modules that aren't on the test runtime classpath. I'm lookin into a solution for this.

Copy link
Contributor Author

@mark-vieira mark-vieira Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a solution here. Essentially, we snapshot the bundle archives for each plugin/module and snapshot it similarly to how we do for the distribution itself by separating JARs from other contents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally missed the fact that these get installed after the node starts

Can you explain this more? We should not be touching plugins or modules after the node starts, and it would not work (ES only loads plugins at startup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was poorly worded. More accurately, these get installed inside the ElasticsearchNode.start() method, but before we actually launch the ES process. So no, we don't actually muck with the installation after it's started. The key bit here is that this happens after the testing task begins execution, so input snapshotting has already happened.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Pr is getting large, but we should probably improve this right after this is merged by having a configuration that inherits from testRuntime track all the plugins and modules that we need to install in the test cluster. That would make this tracking more straight forward as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make this tracking more straight forward as well

I'm not convinced it would. A Configuration is nothing more than a FileCollection in this case as these are all unmanaged file dependencies. Also they are .zip files so we still have to deal with unpacking them and dealing with their contents intelligently. A Configuraiton would also not support any notion of a remote plugin configured via a URI so we'd have to manage that as well.

@mark-vieira mark-vieira force-pushed the integ-test-caching branch from 83747b7 to 89180d2 Compare July 1, 2019 17:17
@alpar-t
Copy link
Contributor

alpar-t commented Jul 2, 2019

Without going into any of the details of the PR, I'm wondering if we could avoid the repetition and have a higher level construct that we could adopt to across the board for anything we do not want caching on. Something like:

   ignoredForCaching {
        setting '...', {}
   }

We could have the same for task inputs like system properties as well.

@mark-vieira you might have missed this one

@mark-vieira mark-vieira force-pushed the integ-test-caching branch from 3742cfd to def871f Compare July 2, 2019 15:56
@mark-vieira
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample

@mark-vieira
Copy link
Contributor Author

I'm pulling the trigger on this so we can get some data in Gradle Enterprise. 🚢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants