javax.net.ssl.SSLException: SSLEngine closed already #1645

Closed
unoexperto opened this issue Jun 17, 2019 · 11 comments

@unoexperto
Contributor

unoexperto commented Jun 17, 2019

Hi @slandelle. I've been haunted by this exception for a couple of years at least, and after a preliminary investigation I have a feeling that AHC uses a single SSLEngine for all instances of the DefaultAsyncHttpClient class. That would mean that as soon as some bad domain breaks it, the HTTP clients of the entire app stop working.

The exception is thrown in io.netty.handler.ssl.SslHandler.wrap(ChannelHandlerContext ctx, boolean inUnwrap), in this block:

if (result.getStatus() == Status.CLOSED) {
    buf.release();
    buf = null;
    promise.tryFailure(SSLENGINE_CLOSED);
    promise = null;
    // SSLEngine has been closed already.
    // Any further write attempts should be denied.
    pendingUnencryptedWrites.releaseAndFailAll(ctx, SSLENGINE_CLOSED);
    return;
}

And the moment it happens, it's pretty much game over for the client: all subsequent attempts to send a request simply hang.

Could you please clarify how it works currently and what to do?

@slandelle
Contributor

I have a feeling that AHC uses a single SSLEngine for all instances of the DefaultAsyncHttpClient class.

No, it creates a new SSLEngine for each connection.
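
The one-engine-per-connection point can be illustrated with plain JDK classes. This is only a minimal sketch, not AHC's or Netty's actual code: the class and method names are hypothetical, and it assumes the JDK default SSLContext is available. Each call to createSSLEngine() returns an independent engine, so closing one does not mark another as closed.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not taken from AHC or Netty sources.
public class EngineIsolationSketch {

    /** Creates a fresh client-mode engine, as would happen once per new connection. */
    static SSLEngine newClientEngine(SSLContext context) {
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(true);
        return engine;
    }

    /** Returns true if closing one engine leaves a second, separately created engine open. */
    static boolean secondEngineSurvivesClose() throws NoSuchAlgorithmException {
        SSLContext context = SSLContext.getDefault();
        SSLEngine first = newClientEngine(context);
        SSLEngine second = newClientEngine(context);

        // Simulate the first connection going away.
        first.closeOutbound();

        // The second engine is a different object with its own open/closed state.
        return first != second && !second.isOutboundDone() && !second.isInboundDone();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second engine still open: " + secondEngineSurvivesClose());
    }
}
```

If engines are indeed per connection, a closed engine by itself should only affect the connection that owns it, which points the investigation toward how that connection is reused.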

Could you please clarify how it works currently and what to do?

  1. Upgrade to the latest JDK in case it's a JDK bug
  2. Upgrade to the latest AHC in case it's a Netty bug
  3. Switch to netty-tcnative, e.g. with BoringSSL static
  4. Provide a way to reproduce

@unoexperto
Contributor Author

unoexperto commented Jun 17, 2019

Switch to netty-tcnative, eg with BoringSSL static

Just tried. Same outcome.

Provide a way to reproduce

I'll try to create an isolated example tomorrow. But I'm pretty sure it will occur quickly if you call https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=30789216&cmd=neighbor_score&retmode=json a couple of thousand times. That's pretty much what my project does. SSL dies within a couple of minutes of pounding the NCBI API.

In the meantime, could you please suggest a workaround? Why does the SSLEngine never recover? Can I forcibly re-initialize it right after I catch the exception?

@Mr00Anderson

Mr00Anderson commented Jun 17, 2019

I was the one who filed the referenced issue #5860.

A little TL;DR: I was connecting with a mixed volume of traffic over SSL. I was creating a multi-connection session using two TCP connections. The first one connected and stayed on SSL. The other one connected with SSL and shared handshake data to link the two connections to one session/user. I would then remove SSL from the pipeline on both sides simultaneously on the second connection, which would cause the error described in #5860.

Have you tried the most recent version of Netty, which should include the commits listed in #5860 that fixed the issue?

On a side note, one temporary workaround is to wrap the error-causing logic in a loop and simply renew the unusable resources (or just the erroring one). Alternatively, add statistics to your engine, find the values that you're exceeding, and build a manual or auto-adjusting rate limiter.
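
The "renew the unusable resource in a loop" idea could look roughly like the sketch below. It is a stopgap illustration under stated assumptions, not anything from AHC: the class RenewingClientSketch, its Call interface, and the retry policy are all hypothetical, and the client type is any AutoCloseable so the example stays JDK-only.

```java
import javax.net.ssl.SSLException;
import java.util.function.Supplier;

// Hypothetical wrapper: recreates the client whenever a call fails with an SSLException.
public class RenewingClientSketch<C extends AutoCloseable> {

    /** An operation against the client that may throw. */
    public interface Call<T, R> {
        R apply(T client) throws Exception;
    }

    private final Supplier<C> factory;
    private C client;
    private int renewals;

    public RenewingClientSketch(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the call, renewing the client and retrying on SSL failures, up to maxAttempts tries. */
    public synchronized <R> R execute(Call<C, R> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.apply(client);
            } catch (Exception e) {
                if (!hasSslCause(e)) {
                    throw e; // unrelated failure: do not mask it
                }
                last = e;
                renew();
            }
        }
        throw last;
    }

    /** True if the throwable or any of its causes is an SSLException. */
    static boolean hasSslCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SSLException) {
                return true;
            }
        }
        return false;
    }

    private void renew() {
        try {
            client.close();
        } catch (Exception ignored) {
            // The old client is being discarded anyway.
        }
        client = factory.get();
        renewals++;
    }

    public synchronized int renewals() {
        return renewals;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in client for demonstration; a real one would be an HTTP client.
        RenewingClientSketch<AutoCloseable> clients =
                new RenewingClientSketch<AutoCloseable>(() -> () -> { });
        int[] calls = {0};
        String body = clients.execute(c -> {
            if (calls[0]++ == 0) {
                throw new SSLException("SSLEngine closed already"); // simulated failure
            }
            return "ok";
        }, 3);
        System.out.println(body + " after " + clients.renewals() + " renewal(s)");
    }
}
```

With AHC, the factory might be something like `Dsl::asyncHttpClient`, since the client is closeable. Recreating a client is expensive and only hides the underlying problem, so this is at best a temporary measure.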

@slandelle
Contributor

I suspect the problem has something to do with request retry.
It looks like your server frequently crashes connections (set max request retry to 0 and loop on the provided url with a 10ms pause and you’ll quickly see it).
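
A probe along those lines might be sketched as follows. The code is JDK-only and hypothetical: ConnectionProbeSketch and its Result type are made up for illustration, and `request` stands in for one blocking call made with retries disabled (in AHC that would presumably mean `setMaxRequestRetry(0)` on the config builder and something like `client.prepareGet(url).execute().get()`).

```java
import java.util.concurrent.Callable;

// Hypothetical probe: repeats a request with a fixed pause and counts failures.
public class ConnectionProbeSketch {

    /** Outcome of a probe run. */
    public static final class Result {
        public final int successes;
        public final int failures;

        Result(int successes, int failures) {
            this.successes = successes;
            this.failures = failures;
        }
    }

    /**
     * Calls the request `iterations` times, sleeping `pauseMillis` after each call.
     * Any exception from the request counts as one failure; nothing is retried.
     */
    public static Result probe(Callable<?> request, int iterations, long pauseMillis)
            throws InterruptedException {
        int ok = 0;
        int failed = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                request.call();
                ok++;
            } catch (InterruptedException e) {
                throw e; // let interruption stop the probe
            } catch (Exception e) {
                failed++;
            }
            Thread.sleep(pauseMillis);
        }
        return new Result(ok, failed);
    }

    public static void main(String[] args) throws Exception {
        int[] n = {0};
        // Simulated request that fails on every third call.
        Result r = probe(() -> {
            if (++n[0] % 3 == 0) {
                throw new java.io.IOException("connection closed");
            }
            return n[0];
        }, 9, 10);
        System.out.println("ok=" + r.successes + " failed=" + r.failures);
    }
}
```

A high failure count with retries off would suggest the server is dropping connections, which is the condition under which the retry path gets exercised.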

@slandelle
Contributor

I'm afraid you'll have to provide a reproducer. I can't hammer your url because of your rate limiter.

@unoexperto
Contributor Author

unoexperto commented Jun 18, 2019

I'm afraid you'll have to provide a reproducer. I can't hammer your url because of your rate limiter.

@slandelle Here we go
https://github.com/unoexperto/bug-ahc-ssl-exception

It includes a private API key (it's not my API, by the way), so the rate limit is 10 calls/second.

I guess how quickly it reproduces is pretty random. The first time, it took me only a couple of minutes to get the exception. The second attempt took 7 minutes to fail.

In my production environment it takes ~30-50 minutes to reach the faulty state.


I'm on Ubuntu MATE 19.04

$ java --version
java 12.0.1 2019-04-16
Java(TM) SE Runtime Environment (build 12.0.1+12)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)

@slandelle
Contributor

@unoexperto I think I've fixed it. Is there any chance you can build from sources and try on your side?

@unoexperto
Contributor Author

unoexperto commented Jun 19, 2019

@slandelle One test fails so I can't build it.

Tests run: 871, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 168.491 sec <<< FAILURE! - in TestSuite
testGetCompleteServicePrincipalName(org.asynchttpclient.spnego.SpnegoEngineTest)  Time elapsed: 0.006 sec  <<< FAILURE!
java.lang.AssertionError: null
	at org.asynchttpclient.spnego.SpnegoEngineTest.testGetCompleteServicePrincipalName(SpnegoEngineTest.java:138)

Is there a Maven command besides package that I can use to build the jars?

@slandelle
Contributor

mvn package -Dmaven.test.skip

@slandelle slandelle self-assigned this Jun 25, 2019
@slandelle slandelle added this to the 2.10.1 milestone Jun 25, 2019
@unoexperto
Contributor Author

7 days of running in production - no issues. Thanks!

@slandelle
Contributor

Great news, thanks for your feedback!
