Hornet SSL Oddity
This issue appears to be solved; here is the key commit.
Take the Hornet ping example. Configure it for SSL. Put a loop around it so that it runs forever. Create and close the HornetQ session around each ping.
Within a few seconds, the Producer will hang, and eventually time out and fail. If you change any of the variables above, the problem disappears.
This problem is easy to reproduce. If Maven has already installed the Internet on your local machine, you should be able to see the problem yourself within a few minutes, following the instructions in the next section. The problem reproduces consistently across a bunch of different releases of Hornet, from 2.2 to the present.
Suspicion: there is some timing issue around shutdown of HornetQ sessions that use SSL.
The Consumer and Producer programs are adapted from the ping example that ships with Hornet, with three key modifications:
- two separate application mains
- infinite loop of messages, instead of 1
- SSL plus key and trust stores used by Datomic
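The shape of the modified producer, per the list above, is roughly this (a sketch against the HornetQ 2.2+ core client API; the address `queue.ping` and the elided key/trust store parameters are illustrative assumptions, not the sample's actual values):

```java
import java.util.HashMap;
import java.util.Map;

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.core.client.ClientMessage;
import org.hornetq.api.core.client.ClientProducer;
import org.hornetq.api.core.client.ClientSession;
import org.hornetq.api.core.client.ClientSessionFactory;
import org.hornetq.api.core.client.HornetQClient;
import org.hornetq.api.core.client.ServerLocator;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

public class PingProducerSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> m = new HashMap<String, Object>();
        m.put("ssl-enabled", true); // flip to false and the failure disappears
        // key store and trust store parameters elided; the real example
        // points these at the stores used by Datomic

        ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                new TransportConfiguration(NettyConnectorFactory.class.getName(), m));
        ClientSessionFactory factory = locator.createSessionFactory();

        // Infinite loop of messages, creating and closing the HornetQ
        // session around each ping -- the combination that triggers the hang.
        while (true) {
            ClientSession session = factory.createSession();
            try {
                ClientProducer producer = session.createProducer("queue.ping"); // illustrative address
                ClientMessage msg = session.createMessage(false);
                producer.send(msg);
            } finally {
                session.close();
            }
        }
    }
}
```

Note that the session is created and closed inside the loop; hoisting it out of the loop is one of the variable changes that makes the problem disappear.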
You can run the consumer and producer as follows:

```shell
git clone git://github.com/Datomic/datomic-java-examples.git
cd datomic-java-examples
mvn compile

# from two different shells
mvn exec:java -Dexec.mainClass=hornet.samples.PingConsumer
mvn exec:java -Dexec.mainClass=hornet.samples.PingProducer
```
If you run first the consumer and then the producer, they will attempt to loop forever, with the consumer consuming messages from the producer. In my tests, this loop fails in under a minute, after fewer than a few hundred pings.
Two errors jump out at me from the logs:
```
2012-12-02 09:33:02.630 DEBUG default o.jboss.netty.handler.ssl.SslHandler - Failed to clean up SSLEngine.
javax.net.ssl.SSLException: Inbound closed before receiving peer's close_notify: possible truncation attack?

2012-12-02 09:33:02.645 DEBUG default o.jboss.netty.handler.ssl.SslHandler - Swallowing an exception raised while writing non-app data
java.nio.channels.ClosedChannelException: null
```
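For reference, the first of those errors is the JDK's standard complaint when an SSL engine's inbound side is shut down before the peer's close_notify alert has arrived. A minimal stdlib sketch (not part of the Hornet example) that provokes the same exception:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLException;

public class SslCloseDemo {

    // Returns true if closing the inbound side mid-handshake raises an
    // SSLException -- the same complaint Netty logs while cleaning up.
    public static boolean triggersTruncationError() throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(true);
        engine.beginHandshake();   // handshake begun, no close_notify received yet
        try {
            engine.closeInbound(); // tear down inbound before the peer says goodbye
            return false;
        } catch (SSLException expected) {
            // e.g. "Inbound closed before receiving peer's close_notify ..."
            // (exact wording varies by JDK version)
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SSLException on early closeInbound: "
                + triggersTruncationError());
    }
}
```

In other words, something is tearing down the channel before the SSL close handshake completes, which fits the lifecycle-bug reading below.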
These errors suggest some kind of lifecycle bug. Interestingly, the errors disappear if you run the same test without SSL:
If you run the same Consumer and Producer above, but change each file's `m.put("ssl-enabled", true);` to `false`, the example no longer fails. The attached log shows a run of more than 10K pings, two orders of magnitude more than the number of pings needed to make the SSL example above fall over.
If I return to the SSL-enabled code and turn on detailed SSL logging with `-Djavax.net.debug=all`, the problem seems to go away. This furthers my impression that there is some kind of timing problem around closing SSL sessions, and that the additional overhead of detailed SSL logging is enough to mask it.
In pom.xml: exclude hornetq-core from the Datomic dependency, and add some other hornetq version as a top-level dependency. The excerpt below shows where the two changes go:
```xml
<dependency>
  <groupId>com.datomic</groupId>
  <artifactId>datomic-free</artifactId>
  <version>0.8.3627</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <!-- add the exclusion below -->
    <exclusion>
      <groupId>org.hornetq</groupId>
      <artifactId>hornetq-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.0.1</version>
</dependency>
<!-- add this dependency, set version to taste -->
<dependency>
  <groupId>org.hornetq</groupId>
  <artifactId>hornetq-core</artifactId>
  <version>2.2.21.Final</version>
</dependency>
```
- mean time to fail varies quite a bit from machine to machine, but seems consistent on any particular machine. While all of my (Stu's) machines hit the failure fairly quickly, a few people seem never to hit it at all.
- if you turn on detailed logging with `-Djavax.net.debug=all`, it takes many more pings to encounter the failure.
- reproduced on single machine when running wired, wireless, and no network
- reproduced on a bunch of different JDK versions
- reproduced on a bunch of different netty versions up to 3.5.7.Final
- reproduced on a bunch of different HornetQ versions, up to 2.3.0.Alpha
This question on StackOverflow might be related. The poster sees the same `ClosedChannelException`, but only under heavy load. No resolution. :-(
On Dec 7 2012, Alex Redington reproduced the problem on ec2 pairhosts using Java version:
```
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
```
The OS was Ubuntu 12.04. Over the course of 5 trials, the pair timed out after the following counts of HornetQ txes:
- 44
- 44
- 283
- 44
- 76