Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NATS client leaves dangling threads after close, if NATS server was down when drain & close were invoked #324

Closed
adamkotwasinski opened this issue Jul 22, 2020 · 3 comments

Comments

@adamkotwasinski
Copy link

Hey, first and foremost thanks for creating NATS Java client.

I found that NATS connection leaves dangling non-daemon threads after .drain & .close methods have been invoked, potentially leaking resources.

Very simple example (run it with NATS server running):

    public static void main(final String[] args) throws Exception {

        // We connect without exceptions.
        final Connection connection = Nats.connect();

        // Now kill the server and kick off shutdown.
        System.in.read(new byte[2]);
        LOG.info("shutting down (server is dead now)");
        try {
            final CompletableFuture<Boolean> drainFuture = connection.drain(Duration.ofSeconds(5));
            LOG.info("drain: {}", drainFuture.get()); // Throws, as expected.
        }
        catch (final Exception e) {
            LOG.error("drain", e);
        }
        finally {
            connection.close(); // Returns okay.
        }

        // Main thread finishes.
        LOG.info("finished");
    }

what results in main thread finishing, but us having non-daemon NATS Connection Timer thread and one of (potentially daemon) worker threads:
Screen Shot 2020-07-22 at 13 50 48

After some time, the worker thread finally finishes, but it turns out that Connection Timer was never cleaned up:
Screen Shot 2020-07-22 at 13 50 18

I think the cause is that it's possible for close() to end early in https://github.com/nats-io/nats.java/blob/2.6.8/src/main/java/io/nats/client/impl/NatsConnection.java#L624 (after all, we wanted to drain, but just failed).

Changing worker threads to daemons would not help, as the timer itself is not a daemon (https://github.com/nats-io/nats.java/blob/2.6.8/src/main/java/io/nats/client/impl/NatsConnection.java#L462). And anyways, that would not be viable for users that want to initialize NATS connection multiple times over lifetime of VM (e.g. in application containers).

@matthiashanel
Copy link
Contributor

@adamkotwasinski , thanks for submitting this issue. I have verified it and am now weighing my options for a fix.

@matthiashanel
Copy link
Contributor

matthiashanel commented Jul 28, 2020

@sasbury @ColinSullivan1 , this leak happens because the fetch inside drain fails (

this.flush(timeout); // Flush and wait up to the timeout, if this fails, let the caller know
). When this happens, close will never make it past the draining check and thus won't perform the cleanup.

When flush throws an exception, I see two options:

  1. Set draining to null again (tracker won't be returned anyway), allowing close to pass the draining check OR
  2. Proceed as if flush had passed (possibly bypassing some of the other code and just calling close(checkDrainStatus = false)

I'm relatively new to java's futures api, do you have a preference?
(2 is probably the cleanest but most intrusive in terms of behavior change)

@ColinSullivan1
Copy link
Member

I see the comment that exceptions in flush were intended to be thrown back to the caller... So I suggest wrapping the flush in a try/catch, then in the catch cleanup up resources (this.close(false);) and throwing the original exception back up.

e.g. something like:

try {
    this.flush(timeout); // Flush and wait up to the timeout, if this fails, let the caller know
} catch (Exception e) {
    this.close(false);
    throw e;
}

There may be more cleanup to do but I think this could resolve the problem and keep the original behavior (and the authors intent).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants