System.Text.Json should support unloadable assemblies correctly #65323
Comments
Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
Issue Details
I think this should make the cache work reasonably well with collectible assemblies, but it's hard to tell for sure. Could you please add a test which:
Good guide what to do is here: https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability In short what we want to avoid: Originally posted by @vitek-karas in #64646 (comment)
|
From #64646 (comment)
System.Text.Json already suffers from that issue since we're rooting the default JsonSerializerOptions instance (and any caches it may create): runtime/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializerOptions.cs Line 35 in cff924f
We need to work out an eviction policy that takes into account the rooted instance as well, however that's not in scope for the current PR, which only addresses the performance issue. |
I'm not sure what the best solution would be here. @stephentoub added JsonSerializerOptionsUpdateHandler to support hot reload. Perhaps we should expose something like this for users to call into when unloading their ALC? |
Perhaps using |
CWT could easily result in types getting dropped from the cache way earlier than they should, just because the GC happened to run, in turn resulting in huge increases in costs for JsonSerializer. It's possible it could play a role, but any such changes will require very careful measurement. |
Can a Line 111 in a25aa66
Another thing I thought is to add an |
It cannot. Type instances get collected only once the types get unloaded. |
But a reference to a Type will also prevent unloading, even if a weak reference, yes? If that's the case, which I thought it was, then the key to the CWT couldn't be a Type and still have that help with unloadability, at which point something else needs to be the key, and we're back to things getting dropped from the cache more aggressively/randomly than desired. If I'm wrong about a weak reference preventing unloadability, then ignore my comment. |
Type used as the key in CWT won't prevent unloading. You can think about CWT as adding an expando field to the item that is used as the key. CWT does not extend the lifetime of the key, but it keeps the value reachable for as long as the key is reachable. XmlSerializer has similar global singleton caches and it uses CWT to make them work with unloadable types: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.Xml/src/System/Xml/Serialization/ContextAwareTables.cs#L15. |
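To make the "expando field" description concrete, here is a minimal sketch of a type-keyed cache built on ConditionalWeakTable. The class name `TypeNameCache` and the cached value (the type's display name) are hypothetical and chosen only for illustration; this is not the System.Text.Json or XmlSerializer implementation.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical illustration: a global cache keyed by Type that does not
// extend the lifetime of its keys.
public static class TypeNameCache
{
    // The table holds each key through a dependent handle: the value stays
    // reachable only for as long as the Type key is reachable elsewhere.
    private static readonly ConditionalWeakTable<Type, string> s_names = new();

    public static string GetDisplayName(Type type) =>
        // GetValue returns the existing entry or invokes the callback once
        // to create and store a new one.
        s_names.GetValue(type, static t => t.FullName ?? t.Name);
}
```

Calling `TypeNameCache.GetDisplayName(typeof(string))` twice returns the same cached string instance; the table itself is a static root, but it is not a root for the types used as keys.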
BTW: The original System.Text.Json design was trying to avoid these problematic global caches: #28325 . It is unfortunate that we end up reintroducing them. |
Is this specific to CWT or does it apply to weak references in general? |
Yes and no. It was keeping the cache in the options instance and saying the user was responsible for caching, but in the same breath also saying that passing options was optional, which then means the system needs to cache the default for good perf. |
CWT uses dependent handles which are different from weak references. A weak reference on a type/object will not prevent its unloading/collection. |
No. Unloading is driven by the GC: if the Type object (treated as a managed object) is collectible, it will be able to unload the assembly/ALC it belongs to. Basically, not creating managed memory leaks is the same thing as allowing unloadability. (With the caveat that Type objects normally never go away, since they're kept alive by the runtime, unless the parent ALC is unloadable.) |
Yes, I know that. But my mental model had long been that the key of a dependent handle is a weak reference, and the dependent nature was from the value to the key. That's not the case? |
It is a weak reference, but the types are strongly held elsewhere unless and until they are unloaded. When a type needs to be unloaded, it won't be hindered by being a key in a CWT. |
Hence my question about whether the described behavior was specific to CWT or for weak references in general, which Vitek answered. I'm not sure why "CWT uses dependent handles which are different from weak references" is relevant then. |
Looks like I got confused, but your questions are answered either way. 😅 If you decide that a CWT is the best way to solve this, I can prepare a PR. |
It's unlikely we will have time to address this in 7.0, moving to 8.0 |
Hey, I am a beginner and this conversation goes a little bit over my head, but I think I'm running into this exact issue. I want to unload an assembly, but it fails because I used System.Text.Json to serialize a type from the assembly. So my question is, is there currently a way to clear the cache (or disable it), or do I have to switch to a different JSON serializer? Update: Got it working now by clearing the caches through reflection.
|
A note about this, as part of working on CoreCLR support at Unity we encountered this issue. For now we are going to call |
Contrary to what I stated in the original post of the issue, the problem with unloading assemblies doesn't lie with the reusable caches implementation (it points to them using weak references) but the fact that we keep default singleton |
That effectively promotes what's intended to be an implementation detail to instead be something in the public API. I'd rather explore alternative options, like using a CWT, or if that has measurably negative performance implications, looking at using a CWT only for types in unloadable assemblies. |
It's pretty unambiguous in what it does and we have important customers taking a de facto dependency on the current private implementation. We probably couldn't change it much without introducing substantial disruption.
We could try to measure this, but my concern is that this would still complicate lookup logic (checking if the assembly of the type is unloadable, looking up two separate caches). From my perspective the existing (private) approach is the simplest approach that shouldn't compromise lookup performance. |
We can't be in a situation where we're prevented from changing private APIs because someone is using them via private reflection. Someone doing so is on their own. |
Aside from this being a bit weird from a caller's point of view ("Why do I care it uses caches inside... why should I?"), I think this would hurt our unloadability story. It's already a bit challenging because it's cooperative and it's easy to break things by holding onto references too long. The debugging story for this is also not the best (try to find GC roots for things in a given ALC, which currently requires SOS debugging). And even if I did go through all of that and found out that the GC root is inside Unloadability should work out of the box - if my code doesn't hold onto anything in the ALC, I should be able to unload it. Note that this is not the only case in framework where we have global caches which hold onto types, if we used the same solution in the other places as well, I might need to call several such "Clear" methods every time I want to unload something. I just find that a really weird design choice. |
Here's a benchmark comparing lookup performance between CD and CWT for

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System.Collections.Concurrent;
    using System.Runtime.CompilerServices;

    BenchmarkRunner.Run<Benchmark>();

    public class Benchmark
    {
        // Number of entries in each table.
        [Params(1, 10, 100, 1000, 2000)]
        public int Count;

        public class TypeToHit;   // present in both tables
        public class TypeToMiss;  // never added to either table

        private ConcurrentDictionary<Type, string> _concurrentDict = new();
        private ConditionalWeakTable<Type, string> _conditionalWeakTable = new();

        [GlobalSetup]
        public void Init()
        {
            // Up to Count - 1 types from the core library, plus TypeToHit.
            var types = typeof(int).Assembly.GetTypes()
                .Take(Count - 1)
                .Append(typeof(TypeToHit));

            foreach (Type t in types)
            {
                _concurrentDict[t] = t.Name;
                _conditionalWeakTable.Add(t, t.Name);
            }
        }

        [Benchmark]
        public string? ConcurrentDictionary_Hit()
            => _concurrentDict.TryGetValue(typeof(TypeToHit), out string? value) ? value : null;

        [Benchmark]
        public string? ConcurrentDictionary_Miss()
            => _concurrentDict.TryGetValue(typeof(TypeToMiss), out string? value) ? value : null;

        [Benchmark]
        public string? ConditionalWeakTable_Hit()
            => _conditionalWeakTable.TryGetValue(typeof(TypeToHit), out string? value) ? value : null;

        [Benchmark]
        public string? ConditionalWeakTable_Miss()
            => _conditionalWeakTable.TryGetValue(typeof(TypeToMiss), out string? value) ? value : null;
    }

Results
Roughly speaking this is showing a 2x slowdown, but I'm not sure how substantially that would register in the context of a full-blown serialization operation. It probably would regress performance in some of our microbenchmarks measuring serialization for small POCOs. |
@vitek-karas are there any circumstances beyond assembly unload events that could result in |
I actually don't know if types built dynamically via |
Unloadability is at assembly granularity. The types are collectible if their assembly builder was created using AssemblyBuilderAccess.RunAndCollect.
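As a small illustration of the point about assembly granularity, the following sketch emits an empty type into a dynamic assembly created with AssemblyBuilderAccess.RunAndCollect and lets the caller inspect `Type.IsCollectible`. The names `CollectibleTypes`, `CollectibleDemo`, and `Main` are placeholders invented for this example.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CollectibleTypes
{
    // Emits an empty public class into a fresh collectible dynamic assembly.
    public static Type CreateCollectibleType(string typeName)
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("CollectibleDemo"),      // placeholder assembly name
            AssemblyBuilderAccess.RunAndCollect);     // makes the assembly collectible
        ModuleBuilder module = assembly.DefineDynamicModule("Main");
        TypeBuilder builder = module.DefineType(typeName, TypeAttributes.Public | TypeAttributes.Class);
        return builder.CreateType()
            ?? throw new InvalidOperationException("Type creation failed.");
    }
}
```

For a type created this way, both `type.IsCollectible` and `type.Assembly.IsCollectible` report true, whereas `typeof(string).IsCollectible` reports false.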
I do not think we have a public assembly unload event API. |
Is there any chance we can get this in .NET 9? |
Very unlikely at this point. |
I want to include that the Now that Does adding the unloadability to Alternatively, can we reload the CLR to ignore the not unloadable ALCs (like reloading the entire BCL)? I apologize for my lack of knowledge of the low-level runtime design. |
I would be inclined to just promote the existing method to a public API. The solution of using a CWT instead of CD is bound to introduce performance regressions -- we already know that CD lookups account for a nontrivial part of the serialization cost so regressing that by 2x per the benchmarks above is not a good look. |
Another approach would be to subscribe to ALC unload requests: when an entry in the cache is added, look if its owning ALC is already known. If it is not and it is unloadable, subscribe to the unloaded event to automatically clear cache. |
That will not work. There is no |
Unloading is what we want in that case: the ALC is not unloaded because System.Text.Json caches hold onto it, so the cache needs to be cleared when Unloading is raised so that the ALC can ultimately get unloaded once the GC has collected all objects created from it. |
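For readers trying to picture this proposal, here is a rough sketch of a cache that subscribes to `AssemblyLoadContext.Unloading` the first time it sees a type from a collectible ALC. The class `AlcAwareCache` and its members are hypothetical and are not part of System.Text.Json. Other participants in this thread dispute that the approach is sufficient, and the sketch only reacts to an explicit `Unload()` call, so treat it as an illustration of the idea rather than a recommended fix.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.Loader;

public sealed class AlcAwareCache
{
    private readonly ConcurrentDictionary<Type, string> _entries = new();
    // Tracks which collectible ALCs already have an Unloading handler attached.
    private readonly ConcurrentDictionary<AssemblyLoadContext, bool> _subscribed = new();

    public int Count => _entries.Count;

    public string GetOrAdd(Type type)
    {
        var alc = AssemblyLoadContext.GetLoadContext(type.Assembly);
        if (alc != null && alc.IsCollectible && _subscribed.TryAdd(alc, true))
        {
            alc.Unloading += OnUnloading;
        }
        return _entries.GetOrAdd(type, static t => t.Name);
    }

    // Drops every entry whose type belongs to the ALC being unloaded.
    private void OnUnloading(AssemblyLoadContext alc)
    {
        foreach (Type type in _entries.Keys)
        {
            if (AssemblyLoadContext.GetLoadContext(type.Assembly) == alc)
            {
                _entries.TryRemove(type, out _);
            }
        }
        _subscribed.TryRemove(alc, out _);
    }
}
```

Note that until the event fires, both dictionaries keep the ALC and its types strongly reachable, so nothing is released unless the owner of the ALC calls `Unload()`.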
That is a very ugly workaround to the problem at best. You still have to actively be aware of System.Text.Json caching for unloading to work properly. What about the solution suggested in #65323 (comment)? In this solution, you would maintain two caches: The current default one, and one for collectible types that uses a CWT. Things would "just work" and you would only pay the performance penalty if you actually use collectible assemblies. |
Can this be pay-for-play? Use CWT for unloadable types only? |
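One possible shape for such a pay-for-play cache is sketched below: non-collectible types go to a ConcurrentDictionary, and collectible ones go to a ConditionalWeakTable, selected via `Type.IsCollectible`. The class name `TwoTierTypeCache` and the factory delegate are assumptions made for this illustration; nothing here reflects how System.Text.Json is actually implemented, and the extra branch on the lookup path would still need to be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

public sealed class TwoTierTypeCache<TValue> where TValue : class
{
    // Fast path: types that can never be unloaded may be rooted safely.
    private readonly ConcurrentDictionary<Type, TValue> _rooted = new();
    // Slow path: entries for collectible types do not keep their keys alive.
    private readonly ConditionalWeakTable<Type, TValue> _collectible = new();
    private readonly Func<Type, TValue> _factory;

    public TwoTierTypeCache(Func<Type, TValue> factory) => _factory = factory;

    public TValue GetOrAdd(Type type) =>
        type.IsCollectible
            ? _collectible.GetValue(type, t => _factory(t))
            : _rooted.GetOrAdd(type, _factory);
}
```

A caller would construct it once, for example `new TwoTierTypeCache<string>(t => t.FullName!)`, and only lookups for collectible types would pay the CWT cost.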
@tbdty please reconsider the option of subscribing to the unloading event; this is actually the event needed (we are trying to prevent the ALC from being kept alive, so we need to clear when Unloading is requested, not when it actually completes, which won't happen unless we clear the cache) |
That is not how ALC works, there is no trigger to unload. The documentation even explicitly states that Even if you don't do it like this, your suggestion immediately breaks if someone calls The entire point of collectible ALC is that the GC is capable of detecting when unloading can happen. By keeping these caches, we are preventing that mechanism from working. |
It would be great to have something that can operate at the ALC level, like |
It would have similar overhead as conditional weak table. Also, I do not see how this can work here. The field that's keeping the ALC alive is in concurrent dictionary. |
Ah, you are correct; the calling assembly is different so it won't work anyway. |
I think this should make the cache work reasonably well with collectible assemblies, but it's hard to tell for sure. Could you please add a test which:
Good guide what to do is here: https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability
In short what we want to avoid:
Global GC root which holds onto anything from unloadable ALCs as that will prevent the ALC from unloading. We already have quite a few caches in the FX which do this, so let's not add another one.
Originally posted by @vitek-karas in #64646 (comment)
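A rough sketch of what such an unloadability check could look like follows. It uses a RunAndCollect dynamic assembly in place of a real plugin ALC, and two stand-in caches (a rooted ConcurrentDictionary and a ConditionalWeakTable) in place of the serializer's caches; the class `UnloadabilityCheck` and all names in it are hypothetical. The helper is marked NoInlining so that no local in the calling frame keeps the type alive.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;

public static class UnloadabilityCheck
{
    // Stand-ins for a global metadata cache: one strong root, one CWT.
    private static readonly ConcurrentDictionary<Type, string> s_rootedCache = new();
    private static readonly ConditionalWeakTable<Type, string> s_weakCache = new();

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateAndCache(bool useRootedCache)
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("Collectible_" + Guid.NewGuid().ToString("N")),
            AssemblyBuilderAccess.RunAndCollect);
        Type type = assembly.DefineDynamicModule("Main")
            .DefineType("Poco", TypeAttributes.Public)
            .CreateType() ?? throw new InvalidOperationException("Type creation failed.");

        if (useRootedCache) s_rootedCache[type] = type.Name;
        else s_weakCache.Add(type, type.Name);

        // Only a weak reference escapes this method.
        return new WeakReference(type);
    }

    // Returns true if the collectible type went away after a few GC cycles.
    public static bool IsCollectedAfterGc(bool useRootedCache)
    {
        WeakReference weakRef = CreateAndCache(useRootedCache);
        for (int i = 0; weakRef.IsAlive && i < 10; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        return !weakRef.IsAlive;
    }
}
```

With the rooted dictionary the type can never be collected, so `IsCollectedAfterGc(useRootedCache: true)` returns false; with the ConditionalWeakTable one would expect the type to become collectible, although the exact number of GC cycles needed is not guaranteed.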