Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider clearing JsonSerializerOptions.Caching cache using a timer, not just based on incoming calls #76548

Closed
SteveSandersonMS opened this issue Oct 3, 2022 · 20 comments · Fixed by #76607
Assignees
Labels
Milestone

Comments

@SteveSandersonMS
Copy link
Member

First, it's totally possible I'm misinterpreting what STJ does here. We got this report saying that a Blazor Server app was holding hundreds of MB in memory for longer than expected, and using the VS memory inspection tools, it looks like at least one (possibly the only) thing pinning this data is rooted in JsonSerializerOptions's s_cache. The cache uses WeakReference for values, but the problem is with the keys.

It looks like the only thing that clears dangling entries is subsequent calls into the JSON serializer. This is likely sufficient in server scenarios where the server will never stop processing requests for any long duration, and even if it does, people probably don't care about freeing any memory until the server experiences memory pressure. However, for desktop scenarios, an app may stop doing things for a long time, and during that period, people really want it to release memory that's no longer required.

Suggestion

Instead of triggering TryEvictDanglingEntries only when there's a call to GetCachingContext, consider also having some guarantee like "it will always clear dangling entries within 10s even if there are no such incoming calls" or similar.

This isn't blocking anything in Blazor directly, but would be a good solution for people using STJ directly in desktop apps, or Blazor Server as an implementation detail of a desktop app (like this person).

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 3, 2022
@ghost
Copy link

ghost commented Oct 3, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

First, it's totally possible I'm misinterpreting what STJ does here. We got this report saying that a Blazor Server app was holding hundreds of MB in memory for longer than expected, and using the VS memory inspection tools, it looks like at least one (possibly the only) thing pinning this data is rooted in JsonSerializerOptions's s_cache. The cache uses WeakReference for values, but the problem is with the keys.

It looks like the only thing that clears dangling entries is subsequent calls into the JSON serializer. This is likely sufficient in server scenarios where the server will never stop processing requests for any long duration, and even if it does, people probably don't care about freeing any memory until the server experiences memory pressure. However, for desktop scenarios, an app may stop doing things for a long time, and during that period, people really want it to release memory that's no longer required.

Suggestion

Instead of triggering TryEvictDanglingEntries only when there's a call to GetCachingContext, consider also having some guarantee like "it will always clear dangling entries within 10s even if there are no such incoming calls" or similar.

This isn't blocking anything in Blazor directly, but would be a good solution for people using STJ directly in desktop apps, or Blazor Server as an implementation detail of a desktop app (like this person).

Author: SteveSandersonMS
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

@SteveSandersonMS
Copy link
Member Author

Also, if this caching behavior was introduced in 7.0, then it could be argued that it's a regression. (Subjective, I know.)

@jkotas
Copy link
Member

jkotas commented Oct 3, 2022

Is the actual problem the shortcut described in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializerOptions.Caching.cs#L165-L167? Should this code rather use ConditionalWeakTable to avoid unnecessarily keeping the keys around?

@eiriktsarpalis
Copy link
Member

Thanks @SteveSandersonMS, your analysis is accurate. At the time when the cache was implemented, we debated as to whether there should be a timer that ran evictions in the background but ultimately found that approach to be unsavory. In hindsight, I had not anticipated that user configuration in JsonSerializerOptions would realistically capture a large amount of memory but I guess I should have known better :-)

That being said, it is almost certain that the application in question is creating single-use JsonSerializerOptions instances, a known anti-pattern that would have caused performance problems even in .NET 6. The author of the original issue could almost certainly address the problem by creating singleton options instances. We're planning on adding an analyzer in the future which should help users avoid that particular anti-pattern.

Also, if this caching behavior was introduced in 7.0, then it could be argued that it's a regression. (Subjective, I know.)

I think we could definitely improve the cache based on this feedback but I don't necessarily think this requires a .NET 7 fix. I think we should wait on more user feedback in case it arrives.

Is the actual problem the shortcut described in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializerOptions.Caching.cs#L165-L167? Should this code rather use ConditionalWeakTable to avoid unnecessarily keeping the keys around?

Original prototypes did use ConditionalWeakTable, effectively performing a linear search over all JsonSerializerOptions instances in the process:

internal static class TrackedOptionsInstances
{
/// <summary>Tracks all live JsonSerializerOptions instances.</summary>
/// <remarks>Instances are added to the table in their constructor.</remarks>
public static ConditionalWeakTable<JsonSerializerOptions, object?> All { get; } =
// TODO https://github.com/dotnet/runtime/issues/51159:
// Look into linking this away / disabling it when hot reload isn't in use.
new ConditionalWeakTable<JsonSerializerOptions, object?>();
}

But IIRC enumeration performance for the particular type was bad enough that it almost invalidated any performance benefits gained from caching.

@eiriktsarpalis eiriktsarpalis added tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Oct 3, 2022
@eiriktsarpalis eiriktsarpalis self-assigned this Oct 3, 2022
@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Oct 3, 2022
@willdean
Copy link
Contributor

willdean commented Oct 3, 2022

That being said, it is almost certain that the application in question is creating single-use JsonSerializerOptions instances, a known anti-pattern that would have caused #40072. The author of the original issue could almost certainly address the problem by creating singleton options instances.

I think the original author was using an unmodified (except for the ability to manually trigger GC) Blazor template to demonstrate the problem.

@chrdlx
Copy link

chrdlx commented Oct 3, 2022

That being said, it is almost certain that the application in question is creating single-use JsonSerializerOptions instances, a known anti-pattern that would have caused performance problems even in .NET 6. The author of the original issue could almost certainly address the problem by creating singleton options instances. We're planning on adding an analyzer in the future which should help users avoid that particular anti-pattern.

Hey @eiriktsarpalis Just in case you couldn't read the complete original issue (I'm the author) and just for clarification, the application in question which you are referring to is the Blazor framework itself using the default Blazor Server template.
So, if there're problems (like the anti-patterns you mentioned), they are in .NET 7 RC1 itself and its default templates, not user code.

Regards!

@chrdlx
Copy link

chrdlx commented Oct 3, 2022

Regarding what @SteveSandersonMS mentioned about a timer, you can configure the retention period of circuits for Blazor, for example:

builder.Services.AddServerSideBlazor(options => {
                options.DisconnectedCircuitMaxRetained = 5;
                options.DisconnectedCircuitRetentionPeriod = TimeSpan.FromSeconds(10);
            });

So maybe this should be tied somehow with the serializer cache clearing call, I guess it wouldn't make much sense to configure different amounts of time for each since one depends on the other to be useful.
I mean, the Blazor App could disconnect forcibly all the users and release their circuits, but if the instances are being held by the serializer cache, memory won't be released, which is the intention.

Regards!

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 3, 2022

Hi @chrdlx, that's interesting. If there are indeed keys in the cache that require eviction that would suggest to me that Blazor is using temporary options instances, which should be avoided under most circumstances. I'm not too familiar with Blazor internals however, is there anything in its configuration that could prompt single-use instances to be created? Perhaps some options instance only used for initialized purposes that then gets garbage collected? cc @SteveSandersonMS

@chrdlx
Copy link

chrdlx commented Oct 3, 2022

Hi @eiriktsarpalis ! I guess Steve is the right one for this kind of stuff. I use Blazor to develop Apps but sadly I don't have that deep knowledge about its inner workings to give you a proper answer on how to solve this. Regards!

@eiriktsarpalis
Copy link
Member

But IIRC enumeration performance for the ConditionalWeakTable was bad enough that it almost invalidated any performance benefits gained from caching.

For context, here are some benchmarks comparing the current caching strategy with one performing linear traversal on a ConditionalWeakTable:

public static CachingContext GetOrCreate(JsonSerializerOptions options)
{
    Debug.Assert(options.IsReadOnly, "Cannot create caching contexts for mutable JsonSerializerOptions instances");
    Debug.Assert(options._typeInfoResolver != null);

    OptionsEqualityComparer comparer = s_equalityComparer;
    foreach (KeyValuePair<JsonSerializerOptions, object?> entry in TrackedOptionsInstances.All)
    {
        JsonSerializerOptions otherOptions = entry.Key;
        if (otherOptions._cachingContext != null && comparer.Equals(options, otherOptions))
        {
            return otherOptions._cachingContext;
        }
    }
    return new CachingContext(options);
}
Method Job Method Mean Error StdDev Median Min Max Ratio MannWhitney(3%) RatioSD Gen 0 Gen 1 Gen 2 Allocated Alloc Ratio
NewCustomizedOptions Job-IEKUUC main 1.128 us 0.0618 us 0.0712 us 1.117 us 1.025 us 1.315 us 1.00 Base 0.00 0.0449 0.0045 0.0045 486 B 1.00
NewCustomizedOptions Job-ODPMMK ConditionalWeakTable 39.336 us 0.7793 us 0.8974 us 39.420 us 37.909 us 40.930 us 35.01 Slower 2.35 - - - 544 B 1.12
NewCustomizedOptions Job-TIQXHW No Caching 19.421 us 0.7321 us 0.8431 us 19.189 us 18.3138 us 21.025 us 18.06 Slower 1.22 0.7392 0.0821 - 7966 B 16.39

@SteveSandersonMS
Copy link
Member Author

SteveSandersonMS commented Oct 4, 2022

that would suggest to me that Blazor is using temporary options instances, which should be avoided under most circumstances

@eiriktsarpalis Blazor has a JsonSerializerOptions instance per circuit (user connection), and has done for over 3 years, at least since this commit. It does this because the serialized representation of certain types (DotNetObjectReference, in this case) is a per-circuit ID, and the act of serializing it for a JS interop call is what causes the instance to become tracked within the circuit. So within the per-circuit options, there is a JsonConverterFactory that holds a reference to the circuit.

I don't think we were aware that, at a certain point, it became expensive to create JsonSerializerOptions instances (or was this always the case?).

If you have a recommendation for a different way that a JsonConverter can access contextual objects specific to a particular seralization process, please let us know.

@willdean
Copy link
Contributor

willdean commented Oct 4, 2022

I don't think we were aware that, at a certain point, it became expensive to create JsonSerializerOptions instances (or was this always the case?).

Stephen Toub's 7.0 performance behemoth had some stuff about this:

https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/#json

It wouldn't have been intuitive to me that an "Options" class would be massively expensive.

Obviously that article contains the somewhat optimistic promise "... with appropriate removal when no longer used in order to avoid unbounded leaks" which overlooks the old saw about cache eviction being the hardest problem in computer science.

@eiriktsarpalis
Copy link
Member

I don't think we were aware that, at a certain point, it became expensive to create JsonSerializerOptions instances (or was this always the case?).

It wouldn't have been intuitive to me that an "Options" class would be massively expensive.

It's been like this since the introduction of the type -- it's essentially the same issue as HttpClient but more insidious since the name "options" doesn't suggest that the type defines its own reflection metadata caching context.

@SteveSandersonMS
Copy link
Member Author

It's been like this since the introduction of the type

What do you recommend as a solution? It does seem like the 7.0 caching technique change has led to an issue for non-server scenarios. Is there (or should there be) some way to signal that a particular JsonSerializerOptions instance can be cleared up? It's not disposable. Should it become disposable?

To be honest, I don't think we'd see this as being urgent enough to fix for 7.0 within Blazor since we only have evidence that it affects one person, and they already have a workaround. What we could do in Blazor in 8.0 though is add some logic to our circuit disposal code that makes it null out the reference from the JsonConverterFactory to the circuit, so the circuits at least can be collected even if there are some JsonSerializerOptions/JsonConverterFactory instances left stuck in the cache until more calls happen.

Since this could affect other (non-Blazor) desktop applications that use STJ, it might be worth having a more general solution such as making JsonSerializerOptions disposable, or clearing the cache on a timer, or going back to ConditionalWeakTable. If you think any of those are a possibility please let us know so we could make use of that in 8.0.

@eiriktsarpalis
Copy link
Member

I'm curious, what prompted requiring separate JsonSerializerOptions instances per connection? Presumably it's related to some form of dependency injection on custom converters? Would it be possible to have your converter instances remain fixed per-type application-wide and inject connection-specific state differently?

I know that System.Text.Json doesn't provide a good mechanism for scoping DI on a per-operation basis, here's a user story that attempts to cover this and related requirements: #63795.

@SteveSandersonMS
Copy link
Member Author

SteveSandersonMS commented Oct 4, 2022

I'm curious, what prompted requiring separate JsonSerializerOptions instances per connection?

This is what I mentioned above:

It does this because the serialized representation of certain types (DotNetObjectReference, in this case) is a per-circuit ID, and the act of serializing it for a JS interop call is what causes the instance to become tracked within the circuit. So within the per-circuit options, there is a JsonConverterFactory that holds a reference to the circuit.

Apologies if it's still unclear, but if you can clarify which part of this is unclear I'll try to clarify more!


Would it be possible to have your converter instances remain fixed per-type application-wide and inject connection-specific state differently?

That would be great, but don't know of a way 😄 It's what I was asking with "If you have a recommendation for a different way that a JsonConverter can access contextual objects specific to a particular seralization process, please let us know." #63795 sounds interesting but async conversion isn't an aspect of it for us; we just have to be able to perform a JsonSerializer.Serialize call within some context that allows the converters to access objects within that context (which we achieve today by having a separate JsonConverterFactory for each context).

@eiriktsarpalis
Copy link
Member

If you have a recommendation for a different way that a JsonConverter can access contextual objects specific to a particular seralization process, please let us know.

For the case of serialization, state could be encapsulated by the serialized value. I'm not aware of a good way to achieve the same effect in deserialization, unfortunately.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 4, 2022
@stephentoub
Copy link
Member

the somewhat optimistic promise "... with appropriate removal when no longer used in order to avoid unbounded leaks"

It does avoid unbounded leaks... unbounded being the key word.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 6, 2022
@cerkoid
Copy link

cerkoid commented Nov 4, 2022

Hi.
This behaviour in blazor server is really sth one would not expect. You can easily ramp up memory usage to several gigs, god knows when evicted :)
For one it's hard to reason about, like is there a memory leak in the app or what will happen with smaller apps, where you could have many running on one server each easily consuming some gigs of ram...

I guess this behaviour cannot be avoided in 7.0 release?

@eiriktsarpalis
Copy link
Member

I guess this behaviour cannot be avoided in 7.0 release?

We might consider backporting #76607 in servicing, assuming there is substantial impact in blazor apps.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants