Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

rousskov · 2024-07-01T02:02:55Z

Writing a nil c-string to an std::ostream object results in undefined
behavior. When Security::IoResult::errorDescription is nil, that UB
leads to crashes for some STL implementations. These changes avoid UB
while making higher-level reporting code simpler and safer.

This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication existed because
Squid reports the same Connection info automatically via CodeContext.

New WithExtras() mechanism may be useful for other "low-level debugging
and level-0/1 details for admins ought to differ" cases as well. Today,
the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.

In some environments, writing a nil c-string to an std::ostream object crashes Squid. In some use cases, Security::IoResult::errorDescription may be nil. These changes handle the combination better while making higher-level reporting code simpler and safer. This change alters affected level-1 ERROR test lines a little, including removing duplicated connection information from clientNegotiateSSL() message (cache_log_message id=62). That duplication happened because Squid reporting the same Connection info automatically via CodeContext.

Debugging is for developers. Things like a TLS operation outcome often end up in async call arguments and other places where admin-friendly, multiline output complicates filtering/searching. Also, if code does not care about specifics like errorDetail (after it has been collected and before it is logged), we do not want to display those details every time we debug the result inside that code. Today, the only known debugging context for Security::IoResult is Security::PeerConnector::suspendNegotiation(), but that is likely to change as we upgrade TLS callers to library-independent wrappers beyond the current Security::Accept() and Security::Connect() pair. New WithExtras() mechanism may be useful for other "debugging and level-0/1 details ought to differ" cases as well.

rousskov

The comments in this review show the difference between old and new debugs() messages based on a few sample errors. Three different error reporting contexts are related to this bug, although only the third one is known to be trigger crashes (in some cases).

rousskov · 2024-07-01T02:06:40Z

src/client_side.cc

-        debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: " << handshakeResult.errorDescription <<
-               " while accepting a TLS connection on " << conn->clientConnection << ": " << handshakeResult.errorDetail);
+        debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: Cannot accept a TLS connection" <<
+               Debug::Extra << "problem: " << WithExtras(handshakeResult));
        // TODO: No ConnStateData::tunnelOnError() on this forward-proxy code


- ERROR: failure while accepting a TLS connection on conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1 - connection: conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1 + ERROR: Cannot accept a TLS connection + problem: failure + error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1 + connection: conn40 local=127.0.0.1:21443 remote=127.0.0.1:54910 FD 20 flags=1

Client-to-Squid connection is still reported here (via CodeContext) even though it is not visible in new code. That is likely to (automatically) change as we start allocating or preserving master transaction context earlier, but current PR reporting is OK.

rousskov · 2024-07-01T02:07:09Z

src/client_side.cc

-        debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: " << handshakeResult.errorDescription <<
-               " while SslBump-accepting a TLS connection on " << clientConnection << ": " << handshakeResult.errorDetail);
+        debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: Cannot SslBump-accept a TLS connection" <<
+               Debug::Extra << "problem: " << WithExtras(handshakeResult));
        updateError(errCategory = ERR_SECURE_ACCEPT_FAIL, handshakeResult.errorDetail);


- ERROR: failure while SslBump-accepting a TLS connection on conn35 local=127.0.0.1:3130 remote=127.0.0.1:53250 FD 20 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1 - current master transaction: master53 + ERROR: Cannot SslBump-accept a TLS connection + problem: failure + error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1 + current master transaction: master54

No client-to-Squid connection reporting here because we do have a master transaction in this context. Admins are supposed to correlate that with their access.log entries if they need more information. We are using the same transaction/context-based approach for other level-0/1 messages as well.

rousskov · 2024-07-01T02:07:53Z

src/security/PeerConnector.cc

-           RawPointer("detail: ", result.errorDetail).asExtra());
+    debugs(83, 2, "ERROR: Cannot establish a TLS connection" <<
+           Debug::Extra << "problem: " << WithExtras(result) <<
+           Debug::Extra << "connection: " << serverConnection());
    recordNegotiationDetails();


The above debugs() is where the reported bug was triggered, but all three debugs() have similar dangerous/buggy code.

- 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection to conn38 local=127.0.0.1:35992 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1: - problem: failure - detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1 + 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection + problem: failure + error detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1 + connection: conn38 local=127.0.0.1:58424 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1

We do want to report Squid-to-peer connection in such contexts because, unlike the client-to-Squid connection discussed elsewhere in this review, there may be multiple Squid-to-peer connections per master transaction and our access.log records do not record their details.

Please note that this is a level-2 message in current code. There is a corresponding TODO to upgrade "important" messages to level-1, and the message is already formatted as a level-0/1 message, so I used WithExtras() here.

src/security/PeerConnector.cc

rousskov · 2024-09-13T13:37:16Z

@yadij added S-waiting-for-author and removed S-waiting-for-reviewer labels on Jul 29, 2024

I do not know why the labels were changed (there were no other changes related to this PR around July 29), but this PR has been awaiting review since July 1, 2024, after I responded to the initial change request and requested a review. I have now corrected the labels. The original request for review (dated July 1) is still active/valid.

kinkie

While the solution is not minimal for this specific bug, it does make sense from a logical point of view that we would print the object and not just its description at different levels detail as decided by the calling context.

I considered what could an alternative look like, it could be for instance two debugs() statements with different levels inspecting the object or new getters, but this is brittle and lots of duplicated code.

Every other change is boilerplate to make it happen with standard c++ semantics; it is well contained and doesn't spill over, I don't see why this shouldn't go it.

@yadij, if you don't have any specific objections I'd be inclined to dismiss your request for changes and land this; please speak up before I move ahead

yadij · 2024-10-10T16:52:03Z

Or @rousskov could reduce this PR to the necessary minimal (bug 5383) fix appropriate for porting to all the older versions and packages that need it.

I see there is discussion mentions of (three?) additional bugs being fixed here, but no clear indication of what they are. I could guess, but others not so much.

rousskov · 2024-10-10T19:11:42Z

Or @rousskov could reduce this PR to the necessary minimal (bug 5383) fix appropriate for porting to all the older versions and packages that need it.

While it is possible to first avoid crashes in thee known problematic callers by duplicating code and then essentially undo that bad fix to implement a proper one (i.e. this PR), I should not be required to do so.

This bug can be fixed in older Squid versions by writing duplicated code or by backporting this change. There are non-trivial tradeoffs involved in that choice; it is up to older version maintainers to decide how to handle that (and I am willing to help if needed). This master/v7 PR should be allowed to go in even if maintainers would prefer to just backport a problematic fix instead, especially after this PR has been waiting for 3+ months!

yadij

Not sure why GH marks me a "requested changes". I explicitly did not do a review earlier, just feedback comments.

NOP

Writing a nil c-string to an std::ostream object results in undefined behavior. When Security::IoResult::errorDescription is nil, that UB leads to crashes for some STL implementations. These changes avoid UB while making higher-level reporting code simpler and safer. This change alters affected level-1 ERROR test lines a little, including removing duplicated connection information from clientNegotiateSSL() message (cache_log_message id=62). That duplication existed because Squid reports the same Connection info automatically via CodeContext. New WithExtras() mechanism may be useful for other "low-level debugging and level-0/1 details for admins ought to differ" cases as well. Today, the only known debugging context for Security::IoResult is Security::PeerConnector::suspendNegotiation(), but that is likely to change as we upgrade TLS callers to library-independent wrappers beyond the current Security::Accept() and Security::Connect() pair.

rousskov · 2024-11-05T14:18:27Z

Not sure why GH marks me a "requested changes".

You explicitly requested changes in response to my comment on July 1, 2024. After that, you left comments (without requesting changes), but that original negative vote naturally stays negative until it is explicitly dismissed or you approve PR. Thank you for dismissing it now.

FWIW, writing a top-level comment when requesting changes would make that change request to stand out more on the primary PR page (rather than be buried deeper inside an (often poorly correlated) comment thread).

rousskov added 3 commits June 30, 2024 17:07

fixup: Regenerated doc/debug-messages.dox

9900757

rousskov commented Jul 1, 2024

View reviewed changes

yadij previously requested changes Jul 1, 2024

View reviewed changes

src/security/PeerConnector.cc Show resolved Hide resolved

rousskov requested a review from yadij July 1, 2024 13:35

rousskov added the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Jul 1, 2024

yadij added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Jul 30, 2024

squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Aug 21, 2024

rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Sep 13, 2024

rousskov mentioned this pull request Oct 2, 2024

Upgrade HTTP header field-line parser #1908

Draft

kinkie approved these changes Oct 8, 2024

View reviewed changes

rousskov mentioned this pull request Oct 28, 2024

Replace MEM_DREAD_CTRL pool with MEMPROXY class pool #1924

Closed

yadij removed their request for review November 5, 2024 10:49

yadij reviewed Nov 5, 2024

View reviewed changes

rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Nov 5, 2024

squid-anubis closed this Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

rousskov commented Jul 1, 2024

rousskov left a comment

rousskov Jul 1, 2024

rousskov Jul 1, 2024

rousskov Jul 1, 2024

rousskov commented Sep 13, 2024

kinkie left a comment

yadij commented Oct 10, 2024

rousskov commented Oct 10, 2024

yadij left a comment

rousskov commented Nov 5, 2024

Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

Conversation

rousskov commented Jul 1, 2024

rousskov left a comment

Choose a reason for hiding this comment

rousskov Jul 1, 2024

Choose a reason for hiding this comment

rousskov Jul 1, 2024

Choose a reason for hiding this comment

rousskov Jul 1, 2024

Choose a reason for hiding this comment

rousskov commented Sep 13, 2024

kinkie left a comment

Choose a reason for hiding this comment

yadij commented Oct 10, 2024

rousskov commented Oct 10, 2024

yadij left a comment

Choose a reason for hiding this comment

rousskov commented Nov 5, 2024