Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 5383: handleNegotiationResult() level-2 debugs() crash #1856

Conversation

rousskov
Copy link
Contributor

@rousskov rousskov commented Jul 1, 2024

Writing a nil c-string to an std::ostream object results in undefined
behavior. When Security::IoResult::errorDescription is nil, that UB
leads to crashes for some STL implementations. These changes avoid UB
while making higher-level reporting code simpler and safer.

This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication existed because
Squid reports the same Connection info automatically via CodeContext.

New WithExtras() mechanism may be useful for other "low-level debugging
and level-0/1 details for admins ought to differ" cases as well. Today,
the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.

rousskov added 3 commits June 30, 2024 17:07
In some environments, writing a nil c-string to an std::ostream object
crashes Squid. In some use cases, Security::IoResult::errorDescription
may be nil. These changes handle the combination better while making
higher-level reporting code simpler and safer.

This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication happened because
Squid reporting the same Connection info automatically via CodeContext.
Debugging is for developers. Things like a TLS operation outcome often
end up in async call arguments and other places where admin-friendly,
multiline output complicates filtering/searching. Also, if code does not
care about specifics like errorDetail (after it has been collected and
before it is logged), we do not want to display those details every time
we debug the result inside that code.

Today, the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.

New WithExtras() mechanism may be useful for other "debugging and
level-0/1 details ought to differ" cases as well.
Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments in this review show the difference between old and new debugs() messages based on a few sample errors. Three different error reporting contexts are related to this bug, although only the third one is known to be trigger crashes (in some cases).

debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: " << handshakeResult.errorDescription <<
" while accepting a TLS connection on " << conn->clientConnection << ": " << handshakeResult.errorDetail);
debugs(83, (handshakeResult.important ? Important(62) : 2), "ERROR: Cannot accept a TLS connection" <<
Debug::Extra << "problem: " << WithExtras(handshakeResult));
// TODO: No ConnStateData::tunnelOnError() on this forward-proxy code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- ERROR: failure while accepting a TLS connection on conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1
-     connection: conn41 local=127.0.0.1:21443 remote=127.0.0.1:55000 FD 25 flags=1
+ ERROR: Cannot accept a TLS connection
+     problem: failure
+     error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00009C+TLS_IO_ERR=1
+     connection: conn40 local=127.0.0.1:21443 remote=127.0.0.1:54910 FD 20 flags=1

Client-to-Squid connection is still reported here (via CodeContext) even though it is not visible in new code. That is likely to (automatically) change as we start allocating or preserving master transaction context earlier, but current PR reporting is OK.

debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: " << handshakeResult.errorDescription <<
" while SslBump-accepting a TLS connection on " << clientConnection << ": " << handshakeResult.errorDetail);
debugs(83, (handshakeResult.important ? DBG_IMPORTANT : 2), "ERROR: Cannot SslBump-accept a TLS connection" <<
Debug::Extra << "problem: " << WithExtras(handshakeResult));
updateError(errCategory = ERR_SECURE_ACCEPT_FAIL, handshakeResult.errorDetail);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- ERROR: failure while SslBump-accepting a TLS connection on conn35 local=127.0.0.1:3130 remote=127.0.0.1:53250 FD 20 flags=1: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1
-     current master transaction: master53
+ ERROR: Cannot SslBump-accept a TLS connection
+     problem: failure
+     error detail: SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A00018A+TLS_IO_ERR=1
+     current master transaction: master54

No client-to-Squid connection reporting here because we do have a master transaction in this context. Admins are supposed to correlate that with their access.log entries if they need more information. We are using the same transaction/context-based approach for other level-0/1 messages as well.

RawPointer("detail: ", result.errorDetail).asExtra());
debugs(83, 2, "ERROR: Cannot establish a TLS connection" <<
Debug::Extra << "problem: " << WithExtras(result) <<
Debug::Extra << "connection: " << serverConnection());
recordNegotiationDetails();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above debugs() is where the reported bug was triggered, but all three debugs() have similar dangerous/buggy code.

- 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection to conn38 local=127.0.0.1:35992 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1:
-     problem: failure
-     detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1
+ 83,2| PeerConnector.cc(281) handleNegotiationResult: ERROR: Cannot establish a TLS connection
+     problem: failure
+     error detail: SQUID_TLS_ERR_CONNECT+TLS_LIB_ERR=A000410+TLS_IO_ERR=1
+     connection: conn38 local=127.0.0.1:58424 remote=127.0.0.1:443 HIER_DIRECT FD 22 flags=1

We do want to report Squid-to-peer connection in such contexts because, unlike the client-to-Squid connection discussed elsewhere in this review, there may be multiple Squid-to-peer connections per master transaction and our access.log records do not record their details.

Please note that this is a level-2 message in current code. There is a corresponding TODO to upgrade "important" messages to level-1, and the message is already formatted as a level-0/1 message, so I used WithExtras() here.

yadij
yadij previously requested changes Jul 1, 2024
src/security/PeerConnector.cc Show resolved Hide resolved
@rousskov rousskov requested a review from yadij July 1, 2024 13:35
@rousskov rousskov added the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Jul 1, 2024
@yadij yadij added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Jul 30, 2024
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Aug 21, 2024
@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Sep 13, 2024
@rousskov
Copy link
Contributor Author

@yadij added S-waiting-for-author and removed S-waiting-for-reviewer labels on Jul 29, 2024

I do not know why the labels were changed (there were no other changes related to this PR around July 29), but this PR has been awaiting review since July 1, 2024, after I responded to the initial change request and requested a review. I have now corrected the labels. The original request for review (dated July 1) is still active/valid.

Copy link
Contributor

@kinkie kinkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the solution is not minimal for this specific bug, it does make sense from a logical point of view that we would print the object and not just its description at different levels detail as decided by the calling context.

I considered what could an alternative look like, it could be for instance two debugs() statements with different levels inspecting the object or new getters, but this is brittle and lots of duplicated code.

Every other change is boilerplate to make it happen with standard c++ semantics; it is well contained and doesn't spill over, I don't see why this shouldn't go it.

@yadij, if you don't have any specific objections I'd be inclined to dismiss your request for changes and land this; please speak up before I move ahead

@yadij
Copy link
Contributor

yadij commented Oct 10, 2024

Or @rousskov could reduce this PR to the necessary minimal (bug 5383) fix appropriate for porting to all the older versions and packages that need it.

I see there is discussion mentions of (three?) additional bugs being fixed here, but no clear indication of what they are. I could guess, but others not so much.

@rousskov
Copy link
Contributor Author

Or @rousskov could reduce this PR to the necessary minimal (bug 5383) fix appropriate for porting to all the older versions and packages that need it.

While it is possible to first avoid crashes in thee known problematic callers by duplicating code and then essentially undo that bad fix to implement a proper one (i.e. this PR), I should not be required to do so.

This bug can be fixed in older Squid versions by writing duplicated code or by backporting this change. There are non-trivial tradeoffs involved in that choice; it is up to older version maintainers to decide how to handle that (and I am willing to help if needed). This master/v7 PR should be allowed to go in even if maintainers would prefer to just backport a problematic fix instead, especially after this PR has been waiting for 3+ months!

Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why GH marks me a "requested changes". I explicitly did not do a review earlier, just feedback comments.

squid-anubis pushed a commit that referenced this pull request Nov 5, 2024
Writing a nil c-string to an std::ostream object results in undefined
behavior. When Security::IoResult::errorDescription is nil, that UB
leads to crashes for some STL implementations. These changes avoid UB
while making higher-level reporting code simpler and safer.

This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication existed because
Squid reports the same Connection info automatically via CodeContext.

New WithExtras() mechanism may be useful for other "low-level debugging
and level-0/1 details for admins ought to differ" cases as well. Today,
the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.
@squid-anubis squid-anubis added M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-passed-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels Nov 5, 2024
@rousskov
Copy link
Contributor Author

rousskov commented Nov 5, 2024

Not sure why GH marks me a "requested changes".

You explicitly requested changes in response to my comment on July 1, 2024. After that, you left comments (without requesting changes), but that original negative vote naturally stays negative until it is explicitly dismissed or you approve PR. Thank you for dismissing it now.

FWIW, writing a top-level comment when requesting changes would make that change request to stand out more on the primary PR page (rather than be buried deeper inside an (often poorly correlated) comment thread).

@rousskov rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Nov 5, 2024
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-passed-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants