Memory leak when serializing large string properties with System.Text.Json #51951
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis

Issue Details

Description: I've encountered a significant memory issue when serializing large string properties using the built-in System.Text.Json serializer in ASP.NET Core. This suggests a potential memory leak.

In the larger context of my application, out of over 50,000 messages, only 3 messages contained large strings (exceeding 20MB in size). This resulted in the memory consumption jumping from around 300 MB to several gigabytes. When I switched to using JSON.NET as the serializer for ASP.NET, this problem did not manifest. Also, when I limited the message size to a maximum of 100,000 characters, the issue disappeared.

Here's a simplified code snippet that reproduces the issue:

[HttpGet]
public IEnumerable<LogMessage> Get()
{
    var message = new StringBuilder();
    for (int i = 0; i < 10_000; i++)
    {
        message.AppendLine(Guid.NewGuid().ToString());
    }

    var logMessage = new LogMessage
    {
        Message = message.ToString()
    };

    var logMessages = new List<LogMessage>();
    for (int i = 0; i < 100; i++)
    {
        logMessages.Add(logMessage);
    }

    return logMessages;
}

public class LogMessage
{
    public string Message { get; set; }
}

Observations:
|
Does the issue persist if you pre-cache your |
The standard time frame for a log query is cached, so no database query is being executed in this scenario. |
Could you share a minimal repro? Ideally it should be a console app, but an aspnetcore app without DB access should work. |
I created this Repo |
I notice that the image concerns IAsyncEnumerable serialization, which is very different in behavior compared to regular IEnumerable serialization that the repro is doing. |
I tried many things before I found the suspected fault; this was one of them. |
I've been able to narrow down the reproduction to this:

using System.Runtime;
using System.Text.Json;

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

var largeString = new string('x', 400_000);
var enumerable = Enumerable.Repeat(largeString, 300);

while (true)
{
    var stream = new MemoryStream();
    await JsonSerializer.SerializeAsync(stream, enumerable);
    // GC.Collect();
}

Like your example, this will show an almost unbounded increase in process memory. It seems to happen only with server GC turned on (which it is in the case of aspnetcore apps); once I revert to workstation mode, or if I force a collection on each iteration, the problem goes away. Once I modified the code to reuse the same underlying

using System.Runtime;
using System.Text.Json;

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

var largeString = new string('x', 400_000);
var enumerable = Enumerable.Repeat(largeString, 300);
var stream = new MemoryStream();

while (true)
{
    await JsonSerializer.SerializeAsync(stream, enumerable);
    stream.Position = 0;
}

the issue immediately went away, indicating that this is not caused by some leak in System.Text.Json; it instead appears to relate to how the GC handles collection of LOH buffers created by the

If I were to simplify this further:

using System.Runtime;

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

while (true)
{
    var bytes = new byte[120_000_000];
    bytes.AsSpan().Fill(0x42);
}

I get identical behavior. I'd be inclined to write this off as by-design behavior of Server GC mode, but I defer to @Maoni0 on that. It isn't clear to me why this could be caused in the context of aspnetcore endpoints, but if I were to make an educated guess, it might be because the required buffers exceed the maximum size of the buffer pool being used, resulting in them being allocated on each request. @davidfowl might know. |
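To make the LOH point above concrete, here is a minimal sketch (not taken from the thread) that asks the runtime which generation a freshly allocated array is reported in. It relies on two documented facts: objects of 85,000 bytes or more are allocated on the large object heap, and `GC.GetGeneration` reports LOH objects as generation 2. The helper name `GenerationOfNewArray` and the array sizes are illustrative assumptions.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: allocates a byte array of the given length and
// returns the generation the GC reports for it right after allocation.
static int GenerationOfNewArray(int length) => GC.GetGeneration(new byte[length]);

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

// A small array goes to the small object heap (typically reported as gen 0).
Console.WriteLine($"1 KB array generation: {GenerationOfNewArray(1_000)}");

// An array above the 85,000-byte threshold goes to the LOH, which is
// reported as the maximum generation and is only reclaimed by gen 2 GCs.
Console.WriteLine($"1 MB array generation: {GenerationOfNewArray(1_000_000)}");
Console.WriteLine($"GC.MaxGeneration: {GC.MaxGeneration}");
```

Because the large array is only reclaimed by a gen 2 collection, allocating one per loop iteration (or per request) can let process memory grow considerably before the GC decides such a collection is worthwhile.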
This is mostly by design, as your char[] is landing on the LOH, which means it'll be collected when Gen2s happen. Read the ASP.NET Core performance recommendations doc:
Outside of this, these are the related problems on the ASP.NET Core side: |
Thanks, looks like we can close this over the linked issues. |
FWIW there's a leak happening in

using System.Text;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var messages = GetMessages();
app.MapGet("/messages", () => messages);
app.Run();

static List<LogMessage> GetMessages()
{
    var message = new StringBuilder();
    for (int i = 0; i < 10_000; i++)
    {
        message.AppendLine(Guid.NewGuid().ToString());
    }

    var logMessage = new LogMessage
    {
        Message = message.ToString()
    };

    var logMessages = new List<LogMessage>();
    for (int i = 0; i < 100; i++)
    {
        logMessages.Add(logMessage);
    }

    return logMessages;
}

public class LogMessage
{
    public string Message { get; set; }
} |
@eiriktsarpalis can you show the trace/dump/profile? The pathological case is when a single JSON token is big, because it needs to be fully buffered by the JSON serializer. The JSON serializer has its own buffer and Kestrel has its own buffer (this is one of the reasons I still want this API), and data is copied into 4K Kestrel buffers that stick around forever but are reused for other requests. |
I am very interested in seeing this solved or contributing. I have a massive problem with this in combination with Azure Functions with limited memory. I'm practically always OOM and it is completely unworkable. I have spent the past three weeks debugging, tracing and trying workarounds. I'm at a bit of a dead end now. kind regards a |
I don't think that's the root cause in this case. From the previous examples, compare the console app:

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

var enumerable = Enumerable.Repeat(new string('x', 400_000), 300);
var stream = new MemoryStream();

while (true)
{
    await JsonSerializer.SerializeAsync(stream, enumerable);
    stream.Position = 0;
}

with the equivalent aspnetcore application:

Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var enumerable = Enumerable.Repeat(new string('x', 400_000), 300);
app.MapGet("/messages", () => enumerable);
app.Run();

Using server GC, the first case shows constant (albeit increased) memory usage, whereas the latter will demonstrate an unbounded increase the more that endpoint is being hit. |
Thanks @eiriktsarpalis |
var enumerable = Enumerable.Repeat(new string('x', 400_000), 300);
var stream = new MemoryStream();

while (true)
{
    await JsonSerializer.SerializeAsync(stream, enumerable);
    stream.Position = 0;
}

This example isn't a fair apples-to-apples comparison. Json uses
There is not an unbounded increase in memory. The memory on my 16 core machine peaks around 700MB with Server GC.

If you look at what memory is being used, you can see that Json uses the classic doubling array sizes when it needs more memory and it seems to get to and stay at a steady state of 16MB

Another semi-big chunk of memory (~40MB) when using Server GC is from

I did a quick spike of dotnet/runtime#68586, which David mentioned earlier in this thread, and it shows significant improvements in working set when writing large payloads. In short, with Server GC, Json writing directly to the Pipe results in a peak of roughly 140MB. This is largely due to Json only getting a 2MB buffer from the array pool when writing instead of doubling all the way up to 16MB.

I didn't look too deeply into the Json code to figure out why it grabs such big arrays from the array pool, but this might be a potential improvement to be looked into?

I think we can close this issue, and potentially open a new issue against Json to not allocate such large arrays? I'll follow up with some more detailed analysis of Json + Pipes in dotnet/runtime#68586. |
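The "classic doubling" growth mentioned above can be illustrated with a small arithmetic sketch. This is not the actual System.Text.Json buffer code; the helper `DoublingSteps` is hypothetical, and the 16 KB starting size is an assumption chosen for illustration. It only shows how many intermediate buffers a doubling strategy touches on the way to a 16 MB steady state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lists every buffer size a doubling strategy would
// request, starting at initialSize, until requiredSize bytes fit.
static List<int> DoublingSteps(int initialSize, int requiredSize)
{
    var sizes = new List<int> { initialSize };
    int size = initialSize;
    while (size < requiredSize)
    {
        size = checked(size * 2); // throws instead of silently overflowing
        sizes.Add(size);
    }
    return sizes;
}

// Assumed starting size of 16 KB, growing until a 16 MB buffer is reached.
var steps = DoublingSteps(16 * 1024, 16 * 1024 * 1024);

Console.WriteLine($"Buffers requested: {steps.Count}");
Console.WriteLine($"Final buffer size: {steps[^1]:N0} bytes");
Console.WriteLine($"Total bytes requested across all steps: {steps.Sum(s => (long)s):N0}");
```

Under these assumptions, every buffer from the seventh step onward is already over the 85,000-byte LOH threshold, which is why capping growth at a smaller buffer (such as the 2MB figure quoted above) reduces the peak working set.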
For whatever reason, I also can't reproduce an unbounded increase right now. I must have made a mistake back then.
Ultimately I think it all boils down to addressing this issue: dotnet/runtime#67337 |
Closing per discussion above. There isn't a leak, and there are other issues filed to improve these scenarios. |
Seems to be related: #55490 |
Description:
I've encountered a significant memory issue when serializing large string properties using the built-in System.Text.Json serializer in ASP.NET Core. This suggests a potential memory leak.
In the larger context of my application, out of over 50,000 messages, only 3 messages contained large strings (exceeding 20MB in size). This resulted in the memory consumption jumping from around 300 MB to several gigabytes. When I switched to using JSON.NET as the serializer for ASP.NET, this problem did not manifest. Also, when I limited the message size to a maximum of 100,000 characters, the issue disappeared.
Here's a simplified code snippet that reproduces the issue:
Observations:
Get method, there's a noticeable increase in memory consumption.