Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce port range re-use in tests #85777

Merged
merged 6 commits into from
Apr 13, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 60 additions & 28 deletions test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,6 @@ public abstract class ESTestCase extends LuceneTestCase {
private static final AtomicInteger portGenerator = new AtomicInteger();

private static final Collection<String> loggedLeaks = new ArrayList<>();
public static final int MIN_PRIVATE_PORT = 13301;

private HeaderWarningAppender headerWarningAppender;

Expand Down Expand Up @@ -1643,38 +1642,71 @@ public static boolean inFipsJvm() {
return Boolean.parseBoolean(System.getProperty(FIPS_SYSPROP));
}

/*
* [NOTE: Port ranges for tests]
*
* Some tests involve interactions over the localhost interface of the machine running the tests. The tests run concurrently in multiple
* JVMs, but all have access to the same network, so there's a risk that different tests will interact with each other in unexpected
* ways and trigger spurious failures. Gradle numbers its workers sequentially starting at 1 and each worker can determine its own
* identity from the {@link #TEST_WORKER_SYS_PROPERTY} system property. We use this to try and assign disjoint port ranges to each test
* worker, avoiding any unexpected interactions, although if we spawn enough test workers then we will wrap around to the beginning
* again.
*/

/**
* Defines the size of the port range assigned to each worker, which must be large enough to supply enough ports to run the tests, but
* not so large that we run out of ports. See also [NOTE: Port ranges for tests].
*/
private static final int PORTS_PER_WORKER = 30;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposing 30 here because empirically it works in CI, whereas 40 didn't.


/**
* Defines the minimum port that test workers should use. See also [NOTE: Port ranges for tests].
*/
protected static final int MIN_PRIVATE_PORT = 13301;

/**
* Defines the maximum port that test workers should use. See also [NOTE: Port ranges for tests].
*/
private static final int MAX_PRIVATE_PORT = 36600;

/**
* Wrap around after reaching this worker ID.
*/
private static final int MAX_EFFECTIVE_WORKER_ID = (MAX_PRIVATE_PORT - MIN_PRIVATE_PORT - PORTS_PER_WORKER + 1) / PORTS_PER_WORKER - 1;

static {
assert getWorkerBasePort(MAX_EFFECTIVE_WORKER_ID) + PORTS_PER_WORKER - 1 <= MAX_PRIVATE_PORT;
}

/**
* Returns a unique port range for this JVM starting from the computed base port
* Returns a port range for this JVM according to its Gradle worker ID. See also [NOTE: Port ranges for tests].
*/
public static String getPortRange() {
return getBasePort() + "-" + (getBasePort() + 99); // upper bound is inclusive
}

protected static int getBasePort() {
// some tests use MockTransportService to do network based testing. Yet, we run tests in multiple JVMs that means
// concurrent tests could claim port that another JVM just released and if that test tries to simulate a disconnect it might
// be smart enough to re-connect depending on what is tested. To reduce the risk, since this is very hard to debug we use
// a different default port range per JVM unless the incoming settings override it
// use a non-default base port otherwise some cluster in this JVM might reuse a port

// We rely on Gradle implementation details here, the worker IDs are long values incremented by one for the
// lifespan of the daemon this means that they can get larger than the allowed port range.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this old comment is still true, I think this change would no longer work when running gradle check multiple times locally without restarting the gradle daemon? I think that could be annoying. Perhaps it works differently now?

I am not sure if CI restarts the daemon either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn it, yes. I've asked @elastic/es-delivery for some input here. I would hope we don't use the daemon in CI, at least we didn't seem to in the runs this PR has had so far, but it'd be good to confirm this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henningandersen is right. The id generator we rely on here is created as part of a shared service in gradle. (see https://github.com/gradle/gradle/blob/394c3176354b8bb6565a0c92f1ea93175608f163/subprojects/core/src/main/java/org/gradle/internal/service/scopes/GradleUserHomeScopeServices.java#L185) These services are reused across multiple builds and operations scoped to a particular gradle user home directory.
My understanding is that we use the daemon locally AND on CI these days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we capture an offset very early in the build? I.e., capture the initial_worker_id and use worker_id - initial_worker_id for the port range?

// Ephemeral ports on Linux start at 32768 so we modulo to make sure that we don't exceed that.
// This is safe as long as we have fewer than 224 Gradle workers running in parallel
// See also: https://github.com/elastic/elasticsearch/issues/44134
final String workerIdStr = System.getProperty(ESTestCase.TEST_WORKER_SYS_PROPERTY);
final int startAt;
final var firstPort = getWorkerBasePort();
final var lastPort = firstPort + PORTS_PER_WORKER - 1; // upper bound is inclusive
assert MIN_PRIVATE_PORT <= firstPort && lastPort <= MAX_PRIVATE_PORT;
return firstPort + "-" + lastPort;
}

/**
* Returns the start of the port range for this JVM according to its Gradle worker ID. See also [NOTE: Port ranges for tests].
*/
protected static int getWorkerBasePort() {
final var workerIdStr = System.getProperty(ESTestCase.TEST_WORKER_SYS_PROPERTY);
if (workerIdStr == null) {
startAt = 0; // IDE
} else {
// we adjust the gradle worker id with mod so as to not go over the ephemoral port ranges, but gradle continually
// increases this value, so the mod can eventually become zero, thus we shift on both sides by 1
final long workerId = Long.valueOf(workerIdStr);
assert workerId >= 1 : "Non positive gradle worker id: " + workerIdStr;
startAt = Math.floorMod(workerId - 1, 223) + 1;
// running in IDE
return MIN_PRIVATE_PORT;
}
assert startAt >= 0 : "Unexpected test worker Id, resulting port range would be negative";
return MIN_PRIVATE_PORT + (startAt * 100);

final var workerId = Integer.parseInt(workerIdStr);
assert workerId >= 1 : "Non positive gradle worker id: " + workerIdStr;
return getWorkerBasePort(workerId % (MAX_EFFECTIVE_WORKER_ID + 1));
}

private static int getWorkerBasePort(int effectiveWorkerId) {
assert 0 <= effectiveWorkerId && effectiveWorkerId <= MAX_EFFECTIVE_WORKER_ID;
// the range [MIN_PRIVATE_PORT, MIN_PRIVATE_PORT+PORTS_PER_WORKER) is only for running outside of Gradle
return MIN_PRIVATE_PORT + PORTS_PER_WORKER + effectiveWorkerId * PORTS_PER_WORKER;
}

protected static InetAddress randomIp(boolean v4) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -183,12 +183,12 @@ public void testWorkerSystemProperty() {
public void testBasePortGradle() {
assumeTrue("requires running tests with Gradle", System.getProperty("tests.gradle") != null);
// Gradle worker IDs are 1 based
assertNotEquals(10300, ESTestCase.getBasePort());
assertNotEquals(ESTestCase.MIN_PRIVATE_PORT, ESTestCase.getWorkerBasePort());
}

public void testBasePortIDE() {
assumeTrue("requires running tests without Gradle", System.getProperty("tests.gradle") == null);
assertEquals(10300, ESTestCase.getBasePort());
assertEquals(ESTestCase.MIN_PRIVATE_PORT, ESTestCase.getWorkerBasePort());
}

public void testRandomDateFormatterPattern() {
Expand Down