-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856
Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856
Conversation
In some environments, writing a nil c-string to an std::ostream object crashes Squid. In some use cases, Security::IoResult::errorDescription may be nil. These changes handle the combination better while making higher-level reporting code simpler and safer. This change alters affected level-1 ERROR test lines a little, including removing duplicated connection information from clientNegotiateSSL() message (cache_log_message id=62). That duplication happened because Squid reporting the same Connection info automatically via CodeContext.
Debugging is for developers. Things like a TLS operation outcome often end up in async call arguments and other places where admin-friendly, multiline output complicates filtering/searching. Also, if code does not care about specifics like errorDetail (after it has been collected and before it is logged), we do not want to display those details every time we debug the result inside that code. Today, the only known debugging context for Security::IoResult is Security::PeerConnector::suspendNegotiation(), but that is likely to change as we upgrade TLS callers to library-independent wrappers beyond the current Security::Accept() and Security::Connect() pair. New WithExtras() mechanism may be useful for other "debugging and level-0/1 details ought to differ" cases as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments in this review show the difference between old and new debugs() messages based on a few sample errors. Three different error reporting contexts are related to this bug, although only the third one is known to be trigger crashes (in some cases).
debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: " << handshakeResult.errorDescription << | ||
" while accepting a TLS connection on " << conn->clientConnection << ": " << handshakeResult.errorDetail); | ||
debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: Cannot accept a TLS connection" << | ||
Debug::Extra << "problem: " << WithExtras(handshakeResult)); | ||
// TODO: No ConnStateData::tunnelOnError() on this forward-proxy code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ERROR: failure while accepting a TLS connection on conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1
- connection: conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1
+ ERROR: Cannot accept a TLS connection
+ problem: failure
+ error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1
+ connection: conn40 local=127.0.0.1:21443 remote=127.0.0.1:54910 FD 20 flags=1
Client-to-Squid connection is still reported here (via CodeContext) even though it is not visible in new code. That is likely to (automatically) change as we start allocating or preserving master transaction context earlier, but current PR reporting is OK.
debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: " << handshakeResult.errorDescription << | ||
" while SslBump-accepting a TLS connection on " << clientConnection << ": " << handshakeResult.errorDetail); | ||
debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: Cannot SslBump-accept a TLS connection" << | ||
Debug::Extra << "problem: " << WithExtras(handshakeResult)); | ||
updateError(errCategory = ERR_SECURE_ACCEPT_FAIL, handshakeResult.errorDetail); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ERROR: failure while SslBump-accepting a TLS connection on conn35 local=127.0.0.1:3130 remote=127.0.0.1:53250 FD 20 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1
- current master transaction: master53
+ ERROR: Cannot SslBump-accept a TLS connection
+ problem: failure
+ error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1
+ current master transaction: master54
No client-to-Squid connection reporting here because we do have a master transaction in this context. Admins are supposed to correlate that with their access.log entries if they need more information. We are using the same transaction/context-based approach for other level-0/1 messages as well.
RawPointer("detail: ", result.errorDetail).asExtra()); | ||
debugs(83, 2, "ERROR: Cannot establish a TLS connection" << | ||
Debug::Extra << "problem: " << WithExtras(result) << | ||
Debug::Extra << "connection: " << serverConnection()); | ||
recordNegotiationDetails(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above debugs() is where the reported bug was triggered, but all three debugs() have similar dangerous/buggy code.
- 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection to conn38 local=127.0.0.1:35992 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1:
- problem: failure
- detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1
+ 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection
+ problem: failure
+ error detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1
+ connection: conn38 local=127.0.0.1:58424 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1
We do want to report Squid-to-peer connection in such contexts because, unlike the client-to-Squid connection discussed elsewhere in this review, there may be multiple Squid-to-peer connections per master transaction and our access.log records do not record their details.
Please note that this is a level-2 message in current code. There is a corresponding TODO to upgrade "important" messages to level-1, and the message is already formatted as a level-0/1 message, so I used WithExtras() here.
I do not know why the labels were changed (there were no other changes related to this PR around July 29), but this PR has been awaiting review since July 1, 2024, after I responded to the initial change request and requested a review. I have now corrected the labels. The original request for review (dated July 1) is still active/valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the solution is not minimal for this specific bug, it does make sense from a logical point of view that we would print the object and not just its description at different levels detail as decided by the calling context.
I considered what could an alternative look like, it could be for instance two debugs() statements with different levels inspecting the object or new getters, but this is brittle and lots of duplicated code.
Every other change is boilerplate to make it happen with standard c++ semantics; it is well contained and doesn't spill over, I don't see why this shouldn't go it.
@yadij, if you don't have any specific objections I'd be inclined to dismiss your request for changes and land this; please speak up before I move ahead
Or @rousskov could reduce this PR to the necessary minimal (bug 5383) fix appropriate for porting to all the older versions and packages that need it. I see there is discussion mentions of (three?) additional bugs being fixed here, but no clear indication of what they are. I could guess, but others not so much. |
While it is possible to first avoid crashes in thee known problematic callers by duplicating code and then essentially undo that bad fix to implement a proper one (i.e. this PR), I should not be required to do so. This bug can be fixed in older Squid versions by writing duplicated code or by backporting this change. There are non-trivial tradeoffs involved in that choice; it is up to older version maintainers to decide how to handle that (and I am willing to help if needed). This master/v7 PR should be allowed to go in even if maintainers would prefer to just backport a problematic fix instead, especially after this PR has been waiting for 3+ months! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why GH marks me a "requested changes". I explicitly did not do a review earlier, just feedback comments.
Writing a nil c-string to an std::ostream object results in undefined behavior. When Security::IoResult::errorDescription is nil, that UB leads to crashes for some STL implementations. These changes avoid UB while making higher-level reporting code simpler and safer. This change alters affected level-1 ERROR test lines a little, including removing duplicated connection information from clientNegotiateSSL() message (cache_log_message id=62). That duplication existed because Squid reports the same Connection info automatically via CodeContext. New WithExtras() mechanism may be useful for other "low-level debugging and level-0/1 details for admins ought to differ" cases as well. Today, the only known debugging context for Security::IoResult is Security::PeerConnector::suspendNegotiation(), but that is likely to change as we upgrade TLS callers to library-independent wrappers beyond the current Security::Accept() and Security::Connect() pair.
You explicitly requested changes in response to my comment on July 1, 2024. After that, you left comments (without requesting changes), but that original negative vote naturally stays negative until it is explicitly dismissed or you approve PR. Thank you for dismissing it now. FWIW, writing a top-level comment when requesting changes would make that change request to stand out more on the primary PR page (rather than be buried deeper inside an (often poorly correlated) comment thread). |
Writing a nil c-string to an std::ostream object results in undefined
behavior. When Security::IoResult::errorDescription is nil, that UB
leads to crashes for some STL implementations. These changes avoid UB
while making higher-level reporting code simpler and safer.
This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication existed because
Squid reports the same Connection info automatically via CodeContext.
New WithExtras() mechanism may be useful for other "low-level debugging
and level-0/1 details for admins ought to differ" cases as well. Today,
the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.