[release/8.0] Removed unused sessions from SSL_CTX internal cache #102095
Backport of #101684 to release/8.0-staging
/cc @wfurt @rzikm
Customer Impact
Reported by customer via official support. Small repro available.
Customers on Linux see increased memory usage when establishing parallel connections to the same host (note that parallel requests on HTTP/1.1 will always use parallel connections). The measured overhead can easily exceed 100 MB, which is a problem for containers in k8s clusters limited to 300 MB of memory.
It also helped one customer with a general "memory problems with .NET 6 -> 8 upgrade" issue (see comment).
Workaround is lowering the TLS cache size via:
System.Net.Security.TlsCacheSize AppContext switch, or
DOTNET_SYSTEM_NET_SECURITY_TLSCACHESIZE environment variable
Technical details:
The mechanism of the (bounded) memory leak is as follows:
The fix is to keep the two caches in sync and remove the dropped TLS session tickets from the internal cache as well.
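To illustrate the idea of keeping two caches in sync, here is a toy model in C. It is only a sketch and not the runtime's code: `InternalCache`, `HostSlot`, `store_session` and `demo_parallel_connections` are hypothetical names, sessions are modeled as plain integers, and the assumption that the outer cache holds one session per host is made for illustration. OpenSSL does expose `SSL_CTX_remove_session` for removing a session from the `SSL_CTX` internal cache; whether the fix uses exactly that call is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SESSIONS 8

/* Hypothetical stand-in for the SSL_CTX internal cache: a flat set of session ids. */
typedef struct {
    int ids[MAX_SESSIONS];
    size_t count;
} InternalCache;

/* Hypothetical stand-in for the outer cache: one session per host (0 = none). */
typedef struct {
    const char *host;
    int session_id;
} HostSlot;

static void internal_add(InternalCache *c, int id) {
    if (c->count < MAX_SESSIONS) {
        c->ids[c->count++] = id;
    }
}

static void internal_remove(InternalCache *c, int id) {
    for (size_t i = 0; i < c->count; i++) {
        if (c->ids[i] == id) {
            c->ids[i] = c->ids[--c->count]; /* swap with last entry, shrink by one */
            return;
        }
    }
}

/* Store a new session for the host. The session it replaces is dropped from
 * the outer cache, so it is removed from the internal cache as well. */
static void store_session(HostSlot *slot, InternalCache *internal, int new_id) {
    int dropped = slot->session_id;
    slot->session_id = new_id;
    internal_add(internal, new_id);
    if (dropped != 0) {
        internal_remove(internal, dropped); /* the "keep in sync" step */
    }
}

/* Simulate n connections to the same host, each receiving a fresh session.
 * Returns how many sessions the internal cache still holds afterwards. */
static int demo_parallel_connections(int n) {
    HostSlot slot = { "example.test", 0 };
    InternalCache internal = { {0}, 0 };
    for (int id = 1; id <= n; id++) {
        store_session(&slot, &internal, id);
    }
    return (int)internal.count;
}
```

In this model, `demo_parallel_connections(5)` leaves a single entry in the internal cache. If the `internal_remove` call were omitted, the internal cache would instead fill up to its bound, which is roughly the shape of a bounded leak.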
Regression
Yes, the bug is part of the TLS session resumption feature on Linux, introduced in .NET 7. For customers migrating from .NET 6, it manifests as an end-to-end scenario regression.
Testing
Tested on the customer-provided minimal repro.
The customer was not willing to verify private builds in production.
Note: The customer confirmed that the workaround helps them in production, which gives us high confidence that this fix addresses the real root cause of their production problems and will help them.
Risk
Low, the issue is well understood and the change is localized to the feature. Functional tests verified TLS resumption works.