Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix origins reload thread leak #290

Merged
merged 5 commits into from
Sep 25, 2018
Merged

Fix origins reload thread leak #290

merged 5 commits into from
Sep 25, 2018

Conversation

kvosper
Copy link
Contributor

@kvosper kvosper commented Sep 24, 2018

We have a bug in which thread pools are created each time origins are reloaded and never shut down.
This PR fixes the bug and adds a couple of small utility classes in the e2e-suite that may help with debugging & testing in the future.

@@ -68,7 +68,7 @@ private StyxHttpClient(NettyConnectionFactory connectionFactory, Builder paramet
/**
* Indicates that a request should be sent using secure {@code https} protocol.
*
* @return a {@HttpClient.Transaction} instance that allows fluent method chaining
* @return a {@link HttpClient.Transaction} instance that allows fluent method chaining
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had missed this one: I don't think it's common practice to include links in the 'returns' line: doesn't the line with the signature already contain a link to the returned object ? Maybe we can simplify the javadoc for this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are no changes related to the PR anyway, I'll remove them and we can resolve this another time.

/**
* Check the health of an origin.
*
* @param client TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment.

@@ -37,10 +38,11 @@
*
* @param id the backend service id
* @param healthCheckConfig configuration of the health check
* @param supplier of the actual function which decided if the origin is healthy or not
* @param healthCheckFunction of the actual function which decided if the origin is healthy or not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we improve the description and remove the TODO?

* @return the monitor of the origin
*/
OriginHealthStatusMonitor create(Id id, HealthCheckConfig healthCheckConfig, Supplier<OriginHealthCheckFunction> supplier);
OriginHealthStatusMonitor create(Id id, HealthCheckConfig healthCheckConfig, Supplier<OriginHealthCheckFunction> healthCheckFunction, HttpClient client);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have moved the HttpClient creation outside of healthCheckFunction(), do we still need to use a supplier for the latter? I'm guessing it was used to lazily create the HttpClient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing the Supplier, but this caused tests to fail. I put it back since the change was out of the scope of this task.

@@ -154,20 +145,49 @@ public void onChange(Registry.Changes<BackendService> changes) {
.hostClientFactory(hostClientFactory)
.build();

pipeline = new ProxyToClientPipeline(newClientHandler(backendService, inventory, originStatsFactory), inventory);
pipeline = new ProxyToClientPipeline(newClientHandler(backendService, inventory, originStatsFactory), () -> {
inventory.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems here (and above in return runAsync(() -> {... ) we might be not following the principle that says we should keep statements in a function at the same level of abstraction. Following up on that, it also seems the pipeline could contain the logic to close itself instead of transferring the responsibility to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to avoid bloating the ProxyToClientPipeline constructor signature by providing a kind of listener that can react to close events, but I am open to alternatives.

try {
hostHealthMonitorExecutor.awaitTermination(10, SECONDS);
} catch (InterruptedException e) {
throw propagate(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set the thread interrupted flag before propagating?
Also IIRC Throwables.propagate is deprecated from recent Guava versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the ForkJoinPool clears the thread interruption status (in awaitWork) so it's not a huge issue if we don't clear the status, but the path execution is different and we should never discard the interruption status as a principle.
However, maybe we should not use the JoinFork pool for blocking tasks?

Copy link
Contributor

@mikkokar mikkokar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good but few little details needs addressing before this can be approved.


private StyxHttpClient healthCheckClient(BackendService backendService) {
ConnectionSettings connectionSettings = new ConnectionSettings(
backendService.connectionPoolConfig().connectTimeoutMillis());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectionSettings class in created unnecessarily. The StyxHttpClient.connectTimeout takes an `int.

* <p>
* Two {@link ItemsDump} objects can be diffed to quickly see the difference between two collections.
*/
public class ItemsDump {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file should go into styx-e2e-testsupport.

/**
* Helper methods for looking at threads.
*/
public final class Threads {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too probably belongs to e2e-testsupport module.

}

describe("Reload Origins Endpoint") {
it("should not leak threads") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Be careful with wording of the test case name:

describe("Styx health checking") ...

it ("cleans up resources after origins reload")

  1. Also please raise a GitHub issue for this problem, and add the issue number in a comment to this test case. The GitHub issue will become handy if this ever needs to be double committed to 0.7 branch in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created: #291

@mikkokar mikkokar merged commit fe8e0ad into ExpediaGroup:master Sep 25, 2018
@kvosper kvosper deleted the fixOriginsReloadThreadLeak branch October 10, 2018 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants