Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage when using Mailkit in containers #57213

Closed
Tracked by #64488
mwalczyk81 opened this issue Aug 11, 2021 · 18 comments
Closed
Tracked by #64488

Memory usage when using Mailkit in containers #57213

mwalczyk81 opened this issue Aug 11, 2021 · 18 comments
Labels
area-System.Security needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@mwalczyk81
Copy link

Description

When using the Mailkit library for sending emails on containers we are seeing a drastic increase in memory consumed per email. This persists until the container runs out of memory and crashes. This issue was opened on the MailKit repo, but nothing was found in the library causing the issue.

I would expect the GC to free up the memory after the email is sent instead of allowing the consumption to build up until the container crashes.

Example code of sending emails -

        public async Task SendEmailAsync(string email, string subject, string htmlMessage) {
            var emailMetadata = _tenantMetadata.Value.Email;

            var message = new MimeMessage();
            //Need a string here and not null otherwise the call is ambiguous 
            message.To.Add(new MailboxAddress(string.Empty, email));
            message.From.Add(new MailboxAddress(emailMetadata.FromName, emailMetadata.FromAddress));
            message.Subject = subject;
            message.Body = new TextPart(TextFormat.Html) {
                Text = htmlMessage
            };

            var additionalData = new LoggerState() {
                { "From", emailMetadata.FromAddress },
                { "To", email },
                { "Subject", subject },
                { "Smtp location", emailMetadata.Server },
                { "Smtp port", emailMetadata.Port.ToString() }
            };

            using var emailClient = new SmtpClient();
            try {
                _logger.LogInformation("Sending email message.", additionalData);

                await emailClient.ConnectAsync(emailMetadata.Server, emailMetadata.Port);

                emailClient.AuthenticationMechanisms.Remove("XOAUTH2");

                await emailClient.SendAsync(message);

                _logger.LogInformation("Email successfully sent.", additionalData);
            } catch (Exception ex) {
                ex.Data.Add("AdditonalData", additionalData);
                _logger.LogError(ex, "Email failed to send", additionalData);
                throw;
            } finally {
                await emailClient.DisconnectAsync(true);
            }
        }

Configuration

.NET Core 3.1 3.1.13 runtime x64 on Ubuntu 18.04.5 LTS. Not sure if it's specific to this configuration.

Regression?

Using the legacy SmtpClient functions as expected.

Other information

Not validating the SSL certificate seems to fix this, but is not an option for me.

@adamsitnik adamsitnik transferred this issue from dotnet/core Aug 11, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 11, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Aug 11, 2021

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq, @GrabYourPitchforks
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

When using the Mailkit library for sending emails on containers we are seeing a drastic increase in memory consumed per email. This persists until the container runs out of memory and crashes. This issue was opened on the MailKit repo, but nothing was found in the library causing the issue.

I would expect the GC to free up the memory after the email is sent instead of allowing the consumption to build up until the container crashes.

Example code of sending emails -

        public async Task SendEmailAsync(string email, string subject, string htmlMessage) {
            var emailMetadata = _tenantMetadata.Value.Email;

            var message = new MimeMessage();
            //Need a string here and not null otherwise the call is ambiguous 
            message.To.Add(new MailboxAddress(string.Empty, email));
            message.From.Add(new MailboxAddress(emailMetadata.FromName, emailMetadata.FromAddress));
            message.Subject = subject;
            message.Body = new TextPart(TextFormat.Html) {
                Text = htmlMessage
            };

            var additionalData = new LoggerState() {
                { "From", emailMetadata.FromAddress },
                { "To", email },
                { "Subject", subject },
                { "Smtp location", emailMetadata.Server },
                { "Smtp port", emailMetadata.Port.ToString() }
            };

            using var emailClient = new SmtpClient();
            try {
                _logger.LogInformation("Sending email message.", additionalData);

                await emailClient.ConnectAsync(emailMetadata.Server, emailMetadata.Port);

                emailClient.AuthenticationMechanisms.Remove("XOAUTH2");

                await emailClient.SendAsync(message);

                _logger.LogInformation("Email successfully sent.", additionalData);
            } catch (Exception ex) {
                ex.Data.Add("AdditonalData", additionalData);
                _logger.LogError(ex, "Email failed to send", additionalData);
                throw;
            } finally {
                await emailClient.DisconnectAsync(true);
            }
        }

Configuration

.NET Core 3.1 3.1.13 runtime x64 on Ubuntu 18.04.5 LTS. Not sure if it's specific to this configuration.

Regression?

Using the legacy SmtpClient functions as expected.

Other information

Not validating the SSL certificate seems to fix this, but is not an option for me.

Author: mwalczyk81
Assignees: -
Labels:

area-System.Security, untriaged

Milestone: -

@adamsitnik
Copy link
Member

@mwalczyk81 is there any chance that you could capture a GC dump and share it with us? https://github.com/dotnet/diagnostics/blob/main/documentation/dotnet-gcdump-instructions.md

@adamsitnik
Copy link
Member

I've set the area to Security as from the referenced issue it seems that the leak is specific to SSL certs on Linux.

@jstedfast
Copy link
Member

@adamsitnik based on the discussion in the MailKit issue, it sounds like it may have something to do with checking for certificate revocation.

In other words, if the checkCertificateRevocation argument to SslStream.AuthenticateAsClientAsync() is true, then this issue can be reproduced (at least with some servers?) but if it is false, then there is no gain in memory consumption.

Apparently different servers result in different amount of leakage. I tested with gmail and saw only a small leak but afaict, it was not coming from MailKit.

@mwalczyk81
Copy link
Author

@mwalczyk81 is there any chance that you could capture a GC dump and share it with us? https://github.com/dotnet/diagnostics/blob/main/documentation/dotnet-gcdump-instructions.md

Unfortunately I wasn't able to replicate this on my local machine and don't have access to the containers where the issue happens. I'll reach out again to other team members to see if I can get any help on that.

@bartonjs bartonjs added needs more info and removed untriaged New issue has not been triaged by the area owner labels Aug 18, 2021
@bartonjs bartonjs added this to the 7.0.0 milestone Aug 18, 2021
@ghost ghost added the no-recent-activity label Oct 9, 2021
@ghost
Copy link

ghost commented Oct 9, 2021

This issue has been automatically marked no recent activity because it has been marked as needs more info but has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no recent activity.

Please refer to our contribution guidelines for tips on what information might be required.

@mwalczyk81
Copy link
Author

Still working with our internal teams to recreate this and get memory dumps

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info no-recent-activity labels Oct 11, 2021
@mwalczyk81
Copy link
Author

@adamsitnik is there a place I can transfer a large file? The dump I finally got is 222MB.

@mwalczyk81
Copy link
Author

@jstedfast do you happen to have any ideas on where these files could go?

@jstedfast
Copy link
Member

Dropbox or OneDrive? Not sure what the maintainers would prefer

@Vandersteen
Copy link

Any updates on this issue ?

@rzikm
Copy link
Member

rzikm commented Jun 23, 2022

Any updates on this issue ?

@Vandersteen, would you be able to provide either a self-contained repro (i.e. sources + dockerfile) or GC dump or full memory dump? We have tried to reproduce the leak in cert validation in the past without success.

@Vandersteen
Copy link

Vandersteen commented Jun 23, 2022

I believe these are related:

#52577
#55672
jstedfast/MailKit#1105

Someone wrote a repro gist here: https://gist.github.com/MihaZupan/2e207c4c703272961b475e482e96afa3#file-certrevocationmemory-cs-L18-L30

Note: it might be dependent on the domain.

We used mailkit to send emails to mailtrap on local / dev and mailgun on stg / prd

We only had issues on stg / prd

We now used the

client.CheckCertificateRevocation = false;

And our memory leaks are gone

@rzikm rzikm self-assigned this Jun 23, 2022
@rzikm
Copy link
Member

rzikm commented Jun 24, 2022

I took some time to check this using repro from #55672, I checked using valgrind and there were no leaks reported that would originate from crypto or certs. I also dumped the malloc info using mallinfo2 in the middle and at the end of the run (I did 10 minutes which seemed sufficient, the Working set was around 720 MB at the end):

after 5 minutes:

Non-mmapped space allocated (bytes):        328,916,992
Number of free chunks:                           25,308
Number of free fastbin blocks:                  148,796
Number of mmapped regions:                           36
Space allocated in mmapped regions (bytes):  58,925,056
usmblks:                                              0
Space in freed fastbin blocks (bytes):        6,050,240
Total allocated space (bytes):               96,732,368
Total free space (bytes):                   232,184,624
Top-most, releasable space (bytes):              72,896

after 10 minutes:

Non-mmapped space allocated (bytes):        360,849,408
Number of free chunks:                           13,327
Number of free fastbin blocks:                   24,805
Number of mmapped regions:                           34
Space allocated in mmapped regions (bytes):  48,640,000
usmblks:                                              0
Space in freed fastbin blocks (bytes):          990,832
Total allocated space (bytes):               93,729,712
Total free space (bytes):                   267,119,696
Top-most, releasable space (bytes):              72,896

The main item of importance are the second and third items from the bottom, between the two measurements, the total free space (i.e. memory kept around by the allocator, but not allocated by the application) is growing, while the amount of memory actively being used by the app is more or less steady. This is in line with @bartonjs's suspicion from #52577 (comment).

The problem is that we likely have no way to affect this behavior, even calling malloc_trim does not help, as it would free 72kB at most (see the last item in the report).

One thing that still baffles me is how can the total free space grow, since we seem to pretty much just repeatedly allocate and later correctly free the certificate, so to my understanding this should not cause any sort of memory fragmentation which would prevent malloc from releasing more memory.

cc: @wfurt

@wfurt
Copy link
Member

wfurt commented Jun 25, 2022

Can you check counts on various safe handles related CRLs? I've seen similar patterns in the past where small chunks of memory were preventing consolidation and that pushed overall memory use up. You can try to add GC.Collect to aggressively collect managed objects to see if it makes any difference.

@rzikm
Copy link
Member

rzikm commented Jun 27, 2022

Looks like between iterations, there is not really much truly alive concerning certificates. No SafeX509Handles lying around:

DumpHeap listing
> dumpheap -live -type X509

[...]

Statistics:
              MT    Count    TotalSize Class Name
00007f29704f8728        1           24 System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+<>c[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]]
00007f29704f1db0        1           24 System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+<>c
00007f29704abc00        1           24 System.Collections.Generic.ObjectEqualityComparer`1[[System.Security.Cryptography.X509Certificates.X509Certificate2, System.Security.Cryptography]]
00007f2970356258        1           24 System.Security.Cryptography.X509Certificates.OpenSslX509Encoder
00007f2970354ac0        1           24 System.Security.Cryptography.X509Certificates.OpenSslX509CertificateReader+<>c
00007f297032f760        1           24 Interop+Crypto+X509VerifyStatusCode
00007f29704d4df0        1           32 System.Tuple`2[[Microsoft.Win32.SafeHandles.SafeX509StackHandle, System.Security.Cryptography],[Microsoft.Win32.SafeHandles.SafeX509StackHandle, System.Security.Cryptography]]
00007f29704db8d8        1           40 System.Runtime.CompilerServices.ConditionalWeakTable`2[[System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+ThreadLocalArray[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][], System.Private.CoreLib],[System.Object, System.Private.CoreLib]]
00007f29704daf48        1           40 System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]]
00007f29705abd10        1           48 System.Runtime.CompilerServices.ConditionalWeakTable`2+Enumerator[[System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+ThreadLocalArray[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][], System.Private.CoreLib],[System.Object, System.Private.CoreLib]]
00007f29704dfe20        2           48 System.Security.Cryptography.X509Certificates.X509ChainStatus[]
00007f2970351b30        2           48 System.Security.Cryptography.X509Certificates.OpenSslCachedSystemStoreProvider
00007f29704dc3a8        1           56 System.Runtime.CompilerServices.ConditionalWeakTable`2+Container[[System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+ThreadLocalArray[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][], System.Private.CoreLib],[System.Object, System.Private.CoreLib]]
00007f29704d9dd8        1           64 System.Func`3[[Microsoft.Win32.SafeHandles.SafeX509Handle, System.Security.Cryptography],[System.Int32, System.Private.CoreLib],[Microsoft.Win32.SafeHandles.SafeSharedAsn1OctetStringHandle, System.Security.Cryptography]]
00007f29704d7650        1           64 System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+MapVersionSpecificCode
00007f29704d2fa8        1           64 System.Func`2[[Microsoft.Win32.SafeHandles.SafeX509Handle, System.Security.Cryptography],[Microsoft.Win32.SafeHandles.SafeSharedAsn1IntegerHandle, System.Security.Cryptography]]
00007f29704ab828        1           64 Interop+Crypto+NegativeSizeReadMethod`1[[Microsoft.Win32.SafeHandles.SafeX509Handle, System.Security.Cryptography]]
00007f29703547a8        2          128 System.Func`2[[Microsoft.Win32.SafeHandles.SafeX509Handle, System.Security.Cryptography],[System.String, System.Private.CoreLib]]
00007f29703530b0        5          160 Microsoft.Win32.SafeHandles.SafeX509StackHandle
00007f29704d5038        3          168 System.Security.Cryptography.X509Certificates.OpenSslCachedDirectoryStoreProvider
00007f2970301af8        1          184 System.Security.Cryptography.X509Certificates.OpenSslX509ChainEventSource
00007f29704dc1a0        1          240 System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+PerCoreLockedStacks[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][]
00007f29704dc590        1          536 System.Runtime.CompilerServices.ConditionalWeakTable`2+Entry[[System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+ThreadLocalArray[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][], System.Private.CoreLib],[System.Object, System.Private.CoreLib]][]
00007f29704dc768       22         4752 System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection[]
00007f29704db758       22        10032 System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1+ThreadLocalArray[[System.Security.Cryptography.X509Certificates.OpenSslX509ChainProcessor+ErrorCollection, System.Security.Cryptography]][]
Total 76 objects

So the memory is not being held from managed side (and even explicit GC.Collect has little impact on the observed mallinfo stats). The few SafeX509StackHandle are the static instances for user/machine cert stores (and in this case are initialized once and not modified for the rest of the program)

@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 10, 2022
@rzikm rzikm removed their assignment Jul 15, 2022
@bartonjs
Copy link
Member

Using essentially the same test program as linked above, I see essentially no variation in the total virtual memory size of the process on .NET 6.0 on Ubuntu 18.04 during 3000 serial calls to revoked.badssl.com in either Online or Offline revocation modes.

Two caveats to "essentially no variation":

  • Every now and then there's a small up and down, +16k here, -16k after the next run.
  • The total virtual memory size does increase about 150MB after starting the test.
    • Sometimes this happens all together early on, sometimes it happens in two 75MB chunks.
    • It's hard to say if it's OpenSSL, or the GC deciding to stretch its legs, but I've never seen any growth (other than caveat 1) happen beyond the 24th request.

Since this is consistent with @rzikm's findings, I don't see any other action for us to take.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Security needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

8 participants