Fix for issue #910: Bad packet received by server when heartbeat is enabled #911
It would be nice if @exceptionfactory, who has dealt with heartbeat issues in the past too, also looked into this PR.
Thanks for summarizing this issue @rasantel. Unfortunately just moving the |
Thanks for checking, @exceptionfactory. Yes, the heartbeater still runs in its own thread, but by starting that thread after the key exchange completes, this change guarantees that it will always send heartbeats with the new keys. The bug happens when a heartbeat is sent with the old keys. Maybe you mean that this could still be a problem if there is a re-keying later? I see now that the same bug could happen during re-keying. I'll look into providing a better solution.
I may have been thinking of a different issue, so on further review, it looks like |
@exceptionfactory I updated the PR to address the possibility of this same bug happening during re-keying. I generalized the solution to add |
Thanks for implementing the generalized solution @rasantel. Hopefully @hierynomus can take a closer look soon.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #911 +/- ##
============================================
- Coverage 68.96% 68.45% -0.52%
+ Complexity 1448 1407 -41
============================================
Files 208 207 -1
Lines 7602 7504 -98
Branches 658 642 -16
============================================
- Hits 5243 5137 -106
- Misses 2009 2023 +14
+ Partials 350 344 -6
Problem
As mentioned in #910, enabling the default keep-alive (the heartbeater) and then connecting causes the SSH server to receive a bad packet immediately after the key exchange and close the connection.
Diagnosis
The sshd server logs show:
and the SSHJ logs show:
Based on the timings of these messages, I conjectured that the heartbeat (SSH_MSG_IGNORE = 2) is sent by SSHJ to the server right after the server has sent and received SSH_MSG_NEWKEYS, and thus switched to the new keys (according to https://www.ietf.org/rfc/rfc4253.txt), but right before SSHJ itself has updated its keys, which happens at KeyExchanger.gotNewKeys when handling the SSH_MSG_NEWKEYS from the server. As a consequence, the heartbeat is encoded with the old keys and the server tries to decode it with the new keys, which results in garbage (e.g. Bad packet length 2412619996 in the server log).
I am able to reproduce the issue consistently by:
HeartBeater.doKeepAlive.
KeyExchanger.gotNewKeys. Once this breakpoint is hit, the server has switched to the new keys but the SSHJ client hasn't yet.
KeyExchanger.gotNewKeys.
After 4, the server logs will show Bad packet length <garbage>, and after 5 the SSHJ client's reader thread or the heartbeater thread will detect a broken transport and close the connection.
Solution
When the server and client are switching from the old to the new keys at the end of the key exchange, no messages should be sent other than the final key exchange messages and new service requests (e.g. user auth).
Therefore, I propose to start the heartbeater thread after, rather than before, the key exchange completes.
Commit 2: Updated the PR to address the possibility of this same bug happening during re-keying. I generalized the solution to add IGNORE to the cases in which TransportImpl.write waits for an ongoing key exchange to complete before writing. This covers both the initial key exchange and any subsequent re-keying.
Note that the other keep-alive provider, KeepAliveRunner, uses the message GLOBAL_REQUEST (80), which is already covered in TransportImpl.write by the check !m.in(1, 49).
Testing
Using the same breakpoint approach I used to reproduce the issue, I verified that the heartbeater thread now blocks in Transport.write at kexer.waitForDone() and unblocks only after the key exchange has completed. The server now always recognizes this message and no longer closes the connection.
Note that, in theory, while the heartbeater is in Transport.write, another thread could start a key exchange after the if (kexer.isKexOngoing()) check but before the actual write of IGNORE happens, and switch to new keys right after that write (so the server could still try to decode with new keys an IGNORE encoded with old keys). However, the writeLock acquired at the beginning of Transport.write should guarantee that a key exchange cannot happen during that time, because the exchange needs to write its own messages, which it can't do until the lock is released.
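For readers who want to see the race window and the guard side by side, here is a minimal toy model of the behavior described above. It is a sketch under stated assumptions and not SSHJ's actual code: ToyTransport, the key "epoch" counters, startKexAndServerSwitchesKeys, and the guardIgnore flag are all hypothetical names introduced for illustration. Real ciphers are replaced by an integer comparison, and the wait is modeled with a Condition on the write lock (so waiting releases the lock), which is an assumption of this sketch rather than something the PR states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the write guard; these are not SSHJ's actual classes. */
public class KexWriteGuardSketch {
    static final int IGNORE = 2, NEWKEYS = 21, GLOBAL_REQUEST = 80;

    /** Hypothetical stand-in for a transport; key epochs replace real ciphers. */
    static class ToyTransport {
        private final ReentrantLock writeLock = new ReentrantLock();
        private final Condition kexDone = writeLock.newCondition();
        private final boolean guardIgnore;
        private boolean kexOngoing;
        private int clientKeyEpoch; // which keys the client encodes with
        private int serverKeyEpoch; // which keys the server decodes with
        final List<String> serverLog = new ArrayList<>();

        ToyTransport(boolean guardIgnore) { this.guardIgnore = guardIgnore; }

        /** Messages outside 1..49, plus IGNORE when guarded, wait for the key exchange. */
        boolean mustWaitForKex(int msg) {
            boolean transportLayer = msg >= 1 && msg <= 49;
            return !transportLayer || (guardIgnore && msg == IGNORE);
        }

        void write(int msg) throws InterruptedException {
            writeLock.lock();
            try {
                while (kexOngoing && mustWaitForKex(msg)) {
                    kexDone.await(); // releases writeLock while waiting
                }
                // "Encode" with the client's current keys; the server "decodes" with its own.
                serverLog.add(clientKeyEpoch == serverKeyEpoch
                        ? "ok msg=" + msg
                        : "bad packet msg=" + msg);
            } finally {
                writeLock.unlock();
            }
        }

        /** Key exchange begins; the server is modeled as switching to the new keys first. */
        void startKexAndServerSwitchesKeys() {
            writeLock.lock();
            try { kexOngoing = true; serverKeyEpoch++; } finally { writeLock.unlock(); }
        }

        /** Client handles the server's NEWKEYS: switch keys, then release waiting writers. */
        void gotNewKeys() {
            writeLock.lock();
            try { clientKeyEpoch++; kexOngoing = false; kexDone.signalAll(); }
            finally { writeLock.unlock(); }
        }
    }

    /** Runs the race window once and returns what the toy server logged. */
    static List<String> run(boolean guardIgnore) throws InterruptedException {
        ToyTransport t = new ToyTransport(guardIgnore);
        t.startKexAndServerSwitchesKeys();
        Thread heartbeater = new Thread(() -> {
            try { t.write(IGNORE); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        heartbeater.start();
        if (!guardIgnore) heartbeater.join(); // unguarded: the heartbeat goes out inside the window
        t.gotNewKeys();
        heartbeater.join();
        return t.serverLog;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded: " + run(false)); // prints: unguarded: [bad packet msg=2]
        System.out.println("guarded:   " + run(true));  // prints: guarded:   [ok msg=2]
    }
}
```

In the guarded run the result is the same whether the heartbeater thread reaches write before or after gotNewKeys: it either waits for the exchange to finish or finds it already finished, and in both cases it encodes with the new keys. In this toy, GLOBAL_REQUEST (80) waits even with guardIgnore off because it falls outside 1..49, mirroring the note about KeepAliveRunner.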