Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[System.Text.Json] Allow ReferenceResolver to feed external references to serializer #42163

Closed
BreyerW opened this issue Sep 12, 2020 · 16 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@BreyerW
Copy link

BreyerW commented Sep 12, 2020

Background and Motivation

Recently I was trying to fix something in ReferenceResolver.AddReference that at first seemed to be a bug but in reality its just its behaviour. I wanted to add external reference to ReferenceResolver and not rely on passed value (unless external reference got deleted or otherwise expired), however internals of serializer ignores external reference if theres no $ref property after $id and rely on newly created value. To fix this i propose:

Proposed API

namespace System.Text.Json.Serialization
{
    public abstract class ReferenceResolver {
+     public virtual object? TryGetExternalReference(string reference)=>null;
     }

Usage Examples

This new API internally should be called before creating new value for AddReference and if return null should proceed like usual. ResolveReference should take care of $ref and as such further calls to TryGetExternalReference should be unecessary

Alternative Designs

Write custom converter that will catch ALL reference objects and change behaviour around ReferenceResolver on read (CanWrite can be false). Currently this is awful though since theres no way to extend existing converters or call default behaviour in custom converters when references got deleted or expired as far as i know (at which point custom converter isnt needed). If either was available such alternative would be viable. However theres one more thing that makes me leery with this approach: ensuring this special converter remains last. This might be problematic in more dynamic setup which might be the case for me

Theoretically AddReference could be slighty modified (return object? instead of void or add out/ref parameter) to accomodate such feature but due to back compat i assume its off table

Another alternative would be calling ResolveReference before creating and checking if returns null but this might make breaking change although i feel like in almost all cases it wont since it will be first call before AddReference it will return null anyway. If this turns out to be breaking then this could be hidden behind an option in JsonSerializerOptions with a name like ReuseExternalObjects

Last alternative would be to add ref modifier to value in AddReference not sure if this count as breaking change or not

Risks

Only slight performance hit of calling TryGetExternalReference (by default returns only null and does nothing else) and processing if per $id. Actually, those who would rely on such feature might see slight performance win since it will avoid creating new object. There should be no breaking change thanks to virtual modifier and returning null by default that causes serializer to behave like in the past

@BreyerW BreyerW added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 12, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Sep 13, 2020
@layomia
Copy link
Contributor

layomia commented Sep 13, 2020

Can you share some real-world scenario where this behavior is useful? If you already have existing instances, is the idea that you are using the deserializer to "hydrate"/set additional properties on the instances? Besides performance, what is the implication of getting new instances over using existing ones?

cc @jozkee

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Sep 13, 2020
@layomia layomia added this to the Future milestone Sep 13, 2020
@BreyerW
Copy link
Author

BreyerW commented Sep 13, 2020

@layomia Undo/Redo system based on deltas rather than whole serialized objects to reduce memory pressure and reduce wasted disk space when project gets saved. Also it can be potentially used to allow future PopulateObject to work without replacing arrays when preserved reference is desired which is regular pain point of Json.Net if you ever come to implement this feature (although thats merely a prediction on my part)

PopulateObject also could serve my purpose but ideally it should be able to deal with arrays otherwise i dont think it will beat my proposal which also will be much smaller change. Also it wont be as clean since some objects may get deleted and resurrected during program lifetime (its Undo/Redo after all) and here PopulateObject gets a little messy. With my proposal i simply return null in TryGetExternalReference and get fresh object when redoing.

So yes you are very close with hydrate but in my case its updating existing objects with deltas. With current behaviour i get completely new object which is not what i want (and simply swapping references would be huge PITA since references to single object are already all over the place, unless theres a way to get to the 'root' reference of 'local' reference but swapping references themselves with ref work only locally)

Besides performance, what is the implication of getting new instances over using existing ones?

i think you accidentally wrote it backward? Otherwise i trust previous explanation is sufficient. If not ping me.

@BreyerW
Copy link
Author

BreyerW commented Sep 17, 2020

@layomia @jozkee

Extra scenarios that i realised could be supported with this:

  • any cross reference eg file x file, file x database, memory x file, memory x file x database and any other combination that you can think of.

  • or in other words support for sort of "foreign key" if ur familiar with database terminology

  • poor man's PopulateObject (poor man's because it will work only with ref preserve enabled and complete json unless custom converter that can deal with incomplete json is provided, on the other hand implementation is super easy since everything is in place already)

  • recently i realized my workaround with writing custom converter with read reimplemented that catches all objects scales really weakly the more you have special types like dictionaries, lists etc. Since i have to implement custom read for each of these special types from scratch making it untenable in the long run. With this proposal i could avoid all that hassle and reimplement only what i really need

  • also current workaround prevents me from taking advantage of future code-gen version because i have to call reflection in custom read to populate external object defeating the purpose of gode-gen unless i copy your code-gen infrastructure and tweak it to my needs which i think is really bad thing

Closing if i wrote this proposal again i would advocate for option in JsonSerializerSettings that enables ResolveReference to be called before AddReference. This way theres no new API surface on ReferenceResolver and option that hides extra call to ResolveReference by default will prevent breaking change. This is how i deal currently in Json.Net and i can provide snippets if you want but currently im writing on phone

@layomia
Copy link
Contributor

layomia commented Sep 18, 2020

Thanks for the response @BreyerW.

Yes, there is some overlap with this design and "populate object"-related features. #29538 / #30258. We'll be sure to account for this in the design of each feature.

Closing if i wrote this proposal again i would advocate for option in JsonSerializerSettings that enables ResolveReference to be called before AddReference.

Yes I was also thinking along these lines. Just thinking about how relevant relevant state can be passed to the custom resolver to help it fetch the external references.

This is how i deal currently in Json.Net and i can provide snippets if you want but currently im writing on phone

Code snippets would be great here. Curious how this custom configuration is done.

@BreyerW
Copy link
Author

BreyerW commented Sep 18, 2020

@layomia Here you go:

Before we start small caveat: in my project JsonSerializer settings are invariant except custom converters collection so i opted to NOT support any setting in custom converters to simplify maintenance. As such they are barebone with added support for reference preserving and thats it. Lets go.

With this i can deal with System.Array and any well behaving IList (like List<T>)

at first glance placement of ReferenceResolve and AddReference might seems peculiar but notice that its basically ReferenceResolve -> create object if null (cant use ContractResolver.Create because it wont create properly sized System.Array) otherwise reuse external object -> AddReference -> populate resulting object

I had to override write because Json.Net serializes byte array as base64 encoding and didnt bother writing separate path for that in custom Read since write was straightforward. Maybe later.

public class ListReferenceConverter : JsonConverter<IList>
	{
		internal const string refProperty = "$ref";
		internal const string idProperty = "$id";
		internal const string valuesProperty = "$values";


		public override IList ReadJson(JsonReader reader, Type objectType, IList existingValue, bool hasExistingValue, JsonSerializer serializer)
		{
			if (reader.TokenType == JsonToken.Null)
				return null;
			reader.Read();
			var elementType = objectType.IsArray ? objectType.GetElementType() : objectType.GetGenericArguments()[0];
			var refId = reader.ReadAsString();

			while (reader.TokenType is not JsonToken.StartArray) reader.Read();
			reader.Read();
			if (refId is not null)
			{
				IList tmp = new List<object>();
				while (true)
				{
					tmp.Add(serializer.Deserialize(reader, elementType));
					reader.Read();
					if (reader.TokenType is JsonToken.EndArray) { reader.Read(); break; }
					while (!elementType.IsPrimitive && reader.TokenType is not JsonToken.StartObject) reader.Read();
				}
				var reference = serializer.ReferenceResolver.ResolveReference(serializer, refId) as IList;
				if (reference is not null)
				{
					if (objectType.IsArray)
					{
						foreach (var i in ..reference.Count)
							reference[i] = tmp[i];
						return reference;
					}
					else
					{
						reference.Clear();
						existingValue = reference;
					}
				}
				else
					existingValue = Activator.CreateInstance(objectType, tmp.Count) as IList;

				foreach (var i in ..tmp.Count)
					if (reference is not null && !objectType.IsArray)
						existingValue.Add(tmp[i]);
					else
						existingValue[i] = tmp[i];
				serializer.ReferenceResolver.AddReference(serializer, refId, existingValue);
				return existingValue;
			}

			throw new NotSupportedException();
		}

public override void WriteJson(JsonWriter writer, IList value, JsonSerializer serializer)
		{
			writer.WriteStartObject();
			if (!serializer.ReferenceResolver.IsReferenced(serializer, value))
			{
				writer.WritePropertyName(idProperty);
				writer.WriteValue(serializer.ReferenceResolver.GetReference(serializer, value));
				writer.WritePropertyName(valuesProperty);
				writer.WriteStartArray();
				foreach (var item in value)
				{
					serializer.Serialize(writer, item);
				}
				writer.WriteEndArray();
			}
			else
			{
				writer.WritePropertyName(refProperty);
				writer.WriteValue(serializer.ReferenceResolver.GetReference(serializer, value));
			}
			writer.WriteEndObject();
		}
	}

Next would be Dictionaries:

here placement of ReferenceResolver and AddReference is more obvious and just like i imagine it to be implemented in System.Text.Json. Write is overridden only because Json.Net cant deal with dicts more complex than Dictionary<string,> out of box.

public class DictionaryConverter : JsonConverter<IDictionary>
	{

		public override IDictionary ReadJson(JsonReader reader, Type objectType, IDictionary existingValue, bool hasExistingValue, JsonSerializer serializer)
		{
			if (reader.TokenType == JsonToken.Null)
				return null;
			reader.Read();
			var id = reader.ReadAsString();
			var obj = serializer.ReferenceResolver.ResolveReference(serializer, id) as IDictionary;

			if (obj is null)
			{
				obj = serializer.ContractResolver.ResolveContract(objectType).DefaultCreator() as IDictionary;
				serializer.ReferenceResolver.AddReference(serializer, id, obj);
			}
			obj.Clear();
			ReadOnlySpan<char> name = "";
			int dotPos = -1;
			if (reader.TokenType == JsonToken.String)
			{
				dotPos = reader.Path.LastIndexOf('.');
				if (dotPos > -1)
					name = reader.Path.AsSpan()[0..dotPos];
			}
			while (reader.Read())
			{
				if (reader.Path.AsSpan().SequenceEqual(name)) break;
				if (reader.TokenType == JsonToken.EndArray /*&& reader.Value as string is "pairs"*/)
					break;
				while (reader.Value as string is not "key") reader.Read();
				reader.Read();
				var generics = objectType.GetGenericArguments();
				var key = serializer.Deserialize(reader, generics[0]);
				while (reader.Value as string is not "value") reader.Read();
				reader.Read();
				var value = serializer.Deserialize(reader, generics[1]);
				obj.Add(key, value);
				reader.Read();
			}
			return obj;
		}

		public override void WriteJson(JsonWriter writer, IDictionary value, JsonSerializer serializer)
		{
			writer.WriteStartObject();
			writer.WritePropertyName(ListReferenceConverter.idProperty);
			writer.WriteValue(serializer.ReferenceResolver.GetReference(serializer, value));
			writer.WritePropertyName("pairs");
			writer.WriteStartArray();
			foreach (DictionaryEntry item in value)
			{
				writer.WriteStartObject();
				writer.WritePropertyName("key");
				serializer.Serialize(writer, item.Key);
				writer.WritePropertyName("value");
				serializer.Serialize(writer, item.Value);
				writer.WriteEndObject();
			}
			writer.WriteEndArray();
			writer.WriteEndObject();
		}
	}

And the heart of it all:

public class ReferenceConverter : JsonConverter
	{
		public override bool CanWrite => false;
		public override bool CanConvert(Type objectType)
		{
			return objectType != typeof(string) && !objectType.IsValueType && !typeof(IList).IsAssignableFrom(objectType) && !typeof(Delegate).IsAssignableFrom(objectType) && !typeof(MulticastDelegate).IsAssignableFrom(objectType);
		}
		public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
		{
			if (reader.TokenType == JsonToken.Null)
				return null;
			reader.Read();
			var id = reader.ReadAsString();
			var obj = serializer.ReferenceResolver.ResolveReference(serializer, id);

			if (obj is null)
			{
				obj = serializer.ContractResolver.ResolveContract(objectType).DefaultCreator();
				serializer.ReferenceResolver.AddReference(serializer, id, obj);
			}
			ReadOnlySpan<char> name = "";
			int dotPos = -1;
			if (reader.TokenType == JsonToken.String)
			{
				dotPos = reader.Path.LastIndexOf('.');
				if (dotPos > -1)
					name = reader.Path.AsSpan()[0..dotPos];
			}
			while (reader.Read())
			{
				if (reader.Path.AsSpan().SequenceEqual(name)) break;
				if (reader.TokenType == JsonToken.PropertyName)
				{
					var member = objectType.GetMember(reader.Value as string, BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Public);
					if (member.Length is 0) continue;
					reader.Read();

					if (member[0] is PropertyInfo p)
					{
						p.SetValue(obj, serializer.Deserialize(reader, p.PropertyType));
					}
					else if (member[0] is FieldInfo f)
					{
						f.SetValue(obj, serializer.Deserialize(reader, f.FieldType));
					}
				}
			}
			return obj;
		}
		public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
		{
			throw new NotImplementedException();
		}
	}

Here we catch all reference types and modify reference resolving behaviour. Process is identical to dictionaries and almost identical to arrays but thats only because ContractResolver cant deal with System.Array properly

Last notes:

Yes I was also thinking along these lines. Just thinking about how relevant relevant state can be passed to the custom resolver to help it fetch the external references.

since ReferenceResolver spits and accept string you can pass anything and format it however you want. Eg prepend prefix like "sql:unique-indentifier-of sql-query", "file:path-to-file" or use GUIDs for absolutely everything (my current approach).

here is my custom resolver: (a bit dirty but works)

public class IdReferenceResolver : IReferenceResolver
	{
		internal readonly IDictionary<Guid, object> _idToObjects = new Dictionary<Guid, object>();
		internal readonly IDictionary<object, Guid> _objectsToId = new Dictionary<object, Guid>();
		//Resolves $ref during deserialization
		public object ResolveReference(object context, string reference)
		{
			var id = new Guid(reference);
			var o = Extension.objectToIdMapping.FirstOrDefault((obj) => obj.Value == id).Key;
			return o;
		}
		//Resolves $id or $ref value during serialization
		public string GetReference(object context, object value)
		{
			if (value.GetType().IsValueType) return null;
			if (!_objectsToId.TryGetValue(value, out var id))
			{
				id = value.GetInstanceID()
				AddReference(context, id.ToString(), value);
			}
			return id.ToString();
		}
		//Resolves if $id or $ref should be used during serialization
		public bool IsReferenced(object context, object value)
		{
			return _objectsToId.ContainsKey(value);
		}
		//Resolves $id during deserialization
		public void AddReference(object context, string reference, object value)
		{
			if (value.GetType().IsValueType) return;
			Guid anotherId = new Guid(reference);
			Extension.objectToIdMapping.TryAdd(value, anotherId);
			_idToObjects[anotherId] = value;
			_objectsToId[value] = anotherId;
		}
	}

Extension.objectToIdMapping contains all alive references with associated GUIDs. Currently its simple Dictionary but thinking about going back to ConditionalWeakTable as Dictionary require careful disposing of everything when deleted (basically reference counting problem).

Hope it helps understanding. If u got questions ping me as usual ;)

@BreyerW
Copy link
Author

BreyerW commented Oct 6, 2020

@steveharter @layomia @jozkee

I have noticed this comment that you plan on tweaking ReferenceResolver. #40099 (comment)

Can i hope that you take my proposal under consideration when you actually get to it? I believe my proposal is relatively simple to implement

@jozkee
Copy link
Member

jozkee commented Jan 19, 2021

@BreyerW Can you provide a sample class that implements ReferenceResolver and also implements TryGetExternalReference?

The reason I am asking for it is that I don't really comprehend this statement:

internals of serializer ignores external reference if theres no $ref property after $id and rely on newly created value

I was sketching an scenario where I had an external reference already added to my ReferenceResolver and then call Deserialize, and it does deserialize it correctly when it steps into the corresponding $ref JSON property.

class Program
{
    static void Main(string[] args)
    {
        var refHandler = new MyRefHandler();
        var opts = new JsonSerializerOptions
        {
            ReferenceHandler = refHandler
        };

        var refResolver = new MyRefResolver();
        refHandler.Resolver = refResolver;

        var employee = new Employee() { Name = "Bob" };

        refResolver.AddReference("1", employee);

        string json = @"{""$ref"":""1""}";
        Employee deserializedObject = JsonSerializer.Deserialize<Employee>(json, opts);

        json = JsonSerializer.Serialize(deserializedObject);
        Console.WriteLine(json);  // Prints: {"Name":"Bob"}
    }
}

How do you pretend to use your resolver? Are you still using the resolver as a persistent bag of references across multiple serialization calls? If yes, above scenario should also work for your use case.

Here's the full code as well https://dotnetfiddle.net/tTBVdO.

@BreyerW
Copy link
Author

BreyerW commented Jan 20, 2021

@jozkee i fiddled with your example and it will NOT work under scenario that i must support: when there is reference in json but external reference does not exist (just comment out AddReference and you will see what i mean). Remember we are talking about Undo/Redo system meaning object/s may get created/destroyed and even "resurrected".

I suppose i could inject extra info into $ref metadata and use that info to create factory at ResolveReference but this prevents me from using System.Text.Json factory infrastructure (and future AOT feature most likely) meaning extra burden on me to keep parity with factory behaviour inside json library (eg respecting constructor attributes) or making clean break and documenting it neither being perfect.

Also it violates responsibility separation principle (because extra data on $ref would be similar or even identical to $type metadata)

Regarding the confusing part:

The reason I am asking for it is that I don't really comprehend this statement:

It is possible that single file might contain only $id metadata and no $ref. In such scenario serializer creates internally an object and moves on without ever asking for external reference. Workaround i can imagine is pretending that reference was already added even if it wasnt, forcing serializer to always spit $ref instead of $id. But i need to spit $id at least once per object in order to preserve data and populate that data onto existing object (unless it does not exist in which case create new object and populate that new object). Spitting $ref preserves only reference but no data. And more importantly i always hit brick wall that i mentioned at the begining.

And yes i still use it as sort of persistent bag of references across multi serializations - with caveat that i remove references when object gets destroyed externally to prevent infinite grow (teoretically can still happen but in reality chances are next to 0). I did post almost all code snippets already in ealier comments

Last but not least I would abandon idea of TryGetExternalReference and instead reuse ResolveReference (by calling before first AddReference) guarded by new option maybe like JsonSerializerOptions.AllowExternalReferences or somesuch for backward-compatibility. This idea came to me after i created this proposal and thats why it isnt in opening post. I believe @layomia also agrees on this particular point. That being said i dont think about it too strongly. If you think separation with added benefit of no new option is better then so be it.

@BreyerW
Copy link
Author

BreyerW commented Jan 21, 2021

@jozkee sorry for calling out again but i made another dive into my source code to make sure i didnt forgot anything important and i must say after that, that i messed up my explanation but dont want to delete previous post stuff since there might still be something worth reading. Sorry. Here is real reason behind this proposal:

this dive reminded me that $id scenario is even more important than $ref despite being considerably less abundant because thats where all the data is. And $id during deserialization does not fire off ResolveReference, making it impossible for ReferenceResolver to use external reference when that reference is still alive. I made example off your fiddle and pasted it below to show that reference equality is false after deserialization when using $id in json. Also ResolveReference does not fire off (i added Console.WriteLine here)

using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace ConsoleApp6
{
    class Program
    {
        static void Main(string[] args)
        {
            var refHandler = new MyRefHandler();
            var opts = new JsonSerializerOptions
            {
                ReferenceHandler = refHandler
            };

            var refResolver = new MyRefResolver();
            refHandler.Resolver = refResolver;

            var employee = new Employee() { Name = "Bob" };

            refResolver.AddReference("1", employee);

            string json = @"{""$id"":""1"",""Name"":""B""}";
            Employee deserializedObject = JsonSerializer.Deserialize<Employee>(json, opts);
			Console.WriteLine("is reference equal?: " +object.ReferenceEquals(deserializedObject,employee));
            json = JsonSerializer.Serialize(deserializedObject);
            Console.WriteLine(json);
        }
    }

    class MyRefHandler : ReferenceHandler
    {
        public ReferenceResolver Resolver { get; set; }

        public override ReferenceResolver CreateResolver() => Resolver;
    }

    class MyRefResolver : ReferenceResolver
    {
        private uint _referenceCount;
        private readonly Dictionary<string, object> _referenceIdToObjectMap = new();
        private readonly Dictionary<object, string> _objectToReferenceIdMap = new();

        public override void AddReference(string referenceId, object value)
        {
            if (!_referenceIdToObjectMap.TryAdd(referenceId, value))
            {
               //throw new JsonException();
            }
        }

        public override string GetReference(object value, out bool alreadyExists)
        {
            if (_objectToReferenceIdMap.TryGetValue(value, out string referenceId))
            {
                alreadyExists = true;
            }
            else
            {
                _referenceCount++;
                referenceId = _referenceCount.ToString();
                _objectToReferenceIdMap.Add(value, referenceId);
                alreadyExists = false;
            }

            return referenceId;
        }

        public override object ResolveReference(string referenceId)
        {
            if (!_referenceIdToObjectMap.TryGetValue(referenceId, out object value))
            {
                throw new JsonException();
            }
            Console.WriteLine("Resolve fired off");
            return value;
        }
    }

    class Employee
    {
        public string Name { get; set; }
    }
}

@BreyerW
Copy link
Author

BreyerW commented Jan 23, 2021

@jozkee I decided to make certain that this request will solve my problem by forking Json.Net (not STJ because it still lacks some features like polymorphims & full support for private members) and sprinkling ResolveReference in strategic places. And it indeed allowed me to remove custom ReferenceConverter which was biggest offender in my eyes since it was unamaintanable in long run and broke AOT scenario. I checked inside my code if references are equal before and after deserialize with $id inside and they are.

I would have liked to show neat diff of Json.Net but unfortunetly before forking i used nuget version and didnt thought to commit Json.Net without changes. What i did basically is placing this snippet existingValue =(id is not null? Serializer?.ReferenceResolver?.ResolveReference(new object(), id) : null) ?? existingValue; in 3 places with slight variance to conform to correct types eg object, IList, IDictionary (should have placed in 5-6 but didnt care for dynamic or ISerializable support so i omitted them). All of them happened in JsonSerializerInternalReader.cs file

Since its just 3 places i will just tell you where i made these changes instead of linking "broken" commit (there is more code than necessary to provide better context as to where aforementioned snippet landed):

line: 476

existingValue =(id is not null? Serializer?.ReferenceResolver?.ResolveReference(new object(), id) : null) ?? existingValue;

// check that if type name handling is being used that the existing value is compatible with the specified type
						if (existingValue != null && (resolvedObjectType == objectType || resolvedObjectType.IsAssignableFrom(existingValue.GetType())))
						{
							targetObject = existingValue;
						}
						else
						{
							targetObject = CreateNewObject(reader, objectContract, member, containerMember, id, out createdFromNonDefaultCreator);
						}

line: ~523-526

existingValue = (id is not null ? Serializer?.ReferenceResolver?.ResolveReference(new object(), id) : null) as IDictionary ?? existingValue;

						if (existingValue == null)
						{
							IDictionary dictionary =  CreateNewDictionary(reader, dictionaryContract, out bool createdFromNonDefaultCreator);

							if (createdFromNonDefaultCreator)
							{
								if (id != null)
								{
									throw JsonSerializationException.Create(reader, "Cannot preserve reference to readonly dictionary, or dictionary created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
								}

								if (contract.OnSerializingCallbacks.Count > 0)
								{
									throw JsonSerializationException.Create(reader, "Cannot call OnSerializing on readonly dictionary, or dictionary created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
								}

								if (contract.OnErrorCallbacks.Count > 0)
								{
									throw JsonSerializationException.Create(reader, "Cannot call OnError on readonly list, or dictionary created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
								}

								if (!dictionaryContract.HasParameterizedCreatorInternal)
								{
									throw JsonSerializationException.Create(reader, "Cannot deserialize readonly or fixed size dictionary: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
								}
							}

							PopulateDictionary(dictionary, reader, dictionaryContract, member, id);

							if (createdFromNonDefaultCreator)
							{
								ObjectConstructor<object> creator = (dictionaryContract.OverrideCreator ?? dictionaryContract.ParameterizedCreator)!;

								return creator(dictionary);
							}
							else if (dictionary is IWrappedDictionary wrappedDictionary)
							{
								return wrappedDictionary.UnderlyingDictionary;
							}

							targetDictionary = dictionary;
						}
						else
						{
							targetDictionary = PopulateDictionary(dictionaryContract.ShouldCreateWrapper || !(existingValue is IDictionary) ? dictionaryContract.CreateWrapper(existingValue) : (IDictionary)existingValue, reader, dictionaryContract, member, id);
						}

line: ~846-849

existingValue = (id is not null ? Serializer?.ReferenceResolver?.ResolveReference(new object(), id) : null) as IList?? existingValue;

			if (existingValue == null)
			{
				IList list = CreateNewList(reader, arrayContract, out bool createdFromNonDefaultCreator);

				if (createdFromNonDefaultCreator)
				{
					if (id != null)
					{
						throw JsonSerializationException.Create(reader, "Cannot preserve reference to array or readonly list, or list created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
					}

					if (contract.OnSerializingCallbacks.Count > 0)
					{
						throw JsonSerializationException.Create(reader, "Cannot call OnSerializing on an array or readonly list, or list created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
					}

					if (contract.OnErrorCallbacks.Count > 0)
					{
						throw JsonSerializationException.Create(reader, "Cannot call OnError on an array or readonly list, or list created from a non-default constructor: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
					}

					if (!arrayContract.HasParameterizedCreatorInternal && !arrayContract.IsArray)
					{
						throw JsonSerializationException.Create(reader, "Cannot deserialize readonly or fixed size list: {0}.".FormatWith(CultureInfo.InvariantCulture, contract.UnderlyingType));
					}
				}

				if (!arrayContract.IsMultidimensionalArray)
				{
					PopulateList(list, reader, arrayContract, member, id);
				}
				else
				{
					PopulateMultidimensionalArray(list, reader, arrayContract, member, id);
				}

				if (createdFromNonDefaultCreator)
				{
					if (arrayContract.IsMultidimensionalArray)
					{
						list = CollectionUtils.ToMultidimensionalArray(list, arrayContract.CollectionItemType!, contract.CreatedType.GetArrayRank());
					}
					else if (arrayContract.IsArray)
					{
						Array a = Array.CreateInstance(arrayContract.CollectionItemType, list.Count);
						list.CopyTo(a, 0);
						list = a;
					}
					else
					{
						ObjectConstructor<object> creator = (arrayContract.OverrideCreator ?? arrayContract.ParameterizedCreator)!;

						return creator(list);
					}
				}
				else if (list is IWrappedCollection wrappedCollection)
				{
					return wrappedCollection.UnderlyingCollection;
				}

				value = list;
			}
			else
			{
				if (!arrayContract.CanDeserialize)
				{
					throw JsonSerializationException.Create(reader, "Cannot populate list type {0}.".FormatWith(CultureInfo.InvariantCulture, contract.CreatedType));
				}

				value = PopulateList((arrayContract.ShouldCreateWrapper || !(existingValue is IList list)) ? arrayContract.CreateWrapper(existingValue) : list, reader, arrayContract, member, id);
			}

If you still want link to commit i can provide it, my project is public so its not a problem.

As for $ref when reference is dead i can workaround it outside serializer itself but $id with alive reference ("dead" $id does not exist in my codebase unlike $ref) right now is impossible (or at least i couldnt think of one since serializer seems to always create new object internally when encountering $id and theres no way to override it)

Alternative would be to implement PopulateObject but so far cant use it with Json.Net since it refuses to work with arrays and problem with $id not using external references still persist. And i believe this alternative is potentially much larger change than this proposal.

Sorry for triple post but what i wrote rn seems to be worth a post on its own

@BreyerW
Copy link
Author

BreyerW commented Apr 21, 2021

@layomia in light of your recent comment in other issue i would like to mention use cases in a bit more broad stroke: its games and editors like Unity3d. There is even sort of "prior art" namely Ceras serializer which support what i want via interface IExternalObject (which is not viable for me but i digress). You can even find example based on game related issue here https://github.com/rikimaru0345/Ceras/blob/master/src/Tutorial/Tutorial.cs - do ctrl + f write 'IExternalObject' and it should show up.

Unity3d-like editors often serialize current object and compare result to discover automatically changes in objects compared to previous frame. Of course what i request isnt strictly necessary for that but can have considerable memory AND perf wins simply because $id with data for object appears only once no matter how much i fracture the serialization step (which could be used for multithreading state change discovery).

It has snowballing effect for my particular use case: single set of data means smaller serialization results (because references in other places are turned into guids which has fixed length even when reference is serialized in completely different call) that in turn means less work for state discovery and invariant amount of data sets means i can fracture serialization step into smaller serialization steps paying price of extra calls instead of single one but i get considerably smaller memory peak usage since previous work can be discarded or reused immediately and can get back lost perf by multithreading state discovery.

Well actually all that can still be achieved without my request but such data is unusable for deserialization. And thats where shit hits the fan because i reuse data from state discovery for Undo/Redo where deserialization support is absolutely necessary

As for PopulateObject im worried it wont work well because only root object will get reused and all $ref but any subsequent $id deep in JSON structure will probably break my use case unless you will provide an overload that will override default behaviour around $id (that is always create new object when i need it to first try find external reference then create new obj if not exist). If im wrong or such overload will be possible then remaining concern is if it will work with plain arrays which is recurring annoyance with json.net that was never fixed despite all these years of it existing in json.net

Other than that it is admittedly niche but can be used for optimization trade-offs (eg. memory peak vs perf) in any application that does heavy serialization though that requires a bit of creativity (like mentioned splitting of large payload into smaller chunks that can be discarded or reused quickly and optionally multithreaded while still preserving references - though that apply only to deserialization part since serialization can already achieve that)

@layomia
Copy link
Contributor

layomia commented Apr 26, 2021

@BreyerW - thanks for all the insight you've shared. Are you able to prototype the desired semantics in a fork of this repo (dotnet/runtime) and share it with me? I believe the proposed behavior of a new option (e.g. JsonSerializerOptions.ResolveNewReferences) making the serializer call ResolveReference first before creating a new instance is valid. Having a PoC showing that the feature will actually work for your scenarios will help move it along. I believe this is where the major changes to fetch an existing reference before creating a new one need to be made:

else if (state.Current.ObjectState == StackFrameObjectState.ReadIdValue)
.

@BreyerW
Copy link
Author

BreyerW commented Apr 27, 2021

@layomia sure thing and already done + sample test ;)

changes are really simple

 string referenceId = reader.GetString()!;
                if (options.ResolveNewReferences)
                    state.Current.ReturnValue = state.ReferenceResolver.ResolveReference(referenceId);
                //ResolveReference can fail. We guard against such scenario
                if (state.Current.ReturnValue is null)
                    converter.CreateInstanceForReferenceResolver(ref reader, ref state, options);

                //Alternatively, order preserving impl but with worse perf
                //in case polymorphic de/serialization will mess with converter.CreateInstanceForReferenceResolver:
                //converter.CreateInstanceForReferenceResolver(ref reader, ref state, options);
                //string referenceId = reader.GetString()!;
                //if (options.ResolveNewReferences)
                //state.Current.ReturnValue = state.ReferenceResolver.ResolveReference(referenceId) ?? state.Current.ReturnValue;

this snippet i placed above AddReference in 2 places exactly where you predicted and that was enough to enable what i wanted. Current version changes order of things a little bit in order to get best perf
but places restriction on converter.CreateInstanceForReferenceResolver and reader.GetString that it wont forward reader (or at least the moves wont clash). If that is not acceptable i provided alternative implementation that preserves old order in commented part of snippet.

sample test:

[Fact]
        public static void CustomReferenceResolverPersistentWithExternalReferences()
        {
            var options = new JsonSerializerOptions
            {
                ResolveNewReferences = true,
                WriteIndented = true,
                ReferenceHandler = new PresistentGuidReferenceHandler
                {
                    // Re-use the same resolver instance across all (de)serialiations based on this options instance.
                    PersistentResolver = new GuidReferenceResolver()
                }
            };
            var resolver = options.ReferenceHandler.CreateResolver();


            var person1 = new PersonReference();
            var person2 = new PersonReference();

            //currently necessary for $ref metadata when doing disjoint deserialization with preserve semantic
            //but objects can be blank slate
            //this limitation could be potentially eliminated by guarding ResolveReference called during deserialization
            //against nulls (create blank slate when encountering null just like AddReference is guarded when ResolveNewReferences is set)
            //e.g. created via RuntimeHelpers.GetUninitializedObject or new'ed
            //for my use case not necessary since i have list of ids and types before doing serialization/deserialization
            //but something to consider as an option
            resolver.AddReference("0b64ffdf-d155-44ad-9689-58d9adb137f3", person1);
            resolver.AddReference("ae3c399c-058d-431d-91b0-a36c266441b9", person2);

            string jsonPerson1 =
@"
  {
    ""$id"": ""0b64ffdf-d155-44ad-9689-58d9adb137f3"",
    ""Name"": ""John Smith"",
    ""Spouse"": {
       ""$ref"": ""ae3c399c-058d-431d-91b0-a36c266441b9""
    }
  }
";
            var jsonPerson2 = @"
{
    ""$id"": ""ae3c399c-058d-431d-91b0-a36c266441b9"",
    ""Name"": ""Jane Smith"",
    ""Spouse"": {
        ""$ref"": ""0b64ffdf-d155-44ad-9689-58d9adb137f3""
      }
}";
            //notice disjoint deserialization but it still preserves references and $id happens only once per object

            //disjoint serialization can be done entirely with current feature set
            //eg. by guaranteeing order of serializations
            //or by customizing ReferenceResolver to recognize which object is the root object
            PersonReference firstPeople = JsonSerializer.Deserialize<PersonReference>(jsonPerson1, options);
            PersonReference secondPeople = JsonSerializer.Deserialize<PersonReference>(jsonPerson2, options);


            Assert.Same(firstPeople, secondPeople.Spouse);
            Assert.Same(firstPeople.Spouse, secondPeople);
            //despite $id, their references are intact but data changed
            //with old behaviour these asserts would fail
//this behaviour can be (ab)used for undo/redo
//Or streaming scenarios eg. Collaboration over internet
//This can also be used as workaround for lack of PopulateObject that works with plain arrays unlike newtonsoft
            Assert.Same(person1, firstPeople);
            Assert.Same(person2, secondPeople);
        }

i added some comments in sample test that may or may not be of interest to you

link to fork with updated code: https://github.com/BreyerW/runtime made from main from yesterday (which is net 6 preview 4 or preview 5 i believe?) however my build script blew up after doing 99% work and had to manually fix last 1% resulting in this fork not being exact copy when it comes to dependency tree so i suggest extra caution when getting my copy

Let me know if this sample test is too simple i can add some extra scenarios though im a little busy today and tomorrow.

EDIT also a bit of bikeshedding ;) i think ResolveNewReferences doesnt tell much about its purpose so i propse something closer to TryResolveExternalReferences. Try part suggest that it can gracefully fail (just like all Try patterns) and it is about external references. Alternatively CallResolveBeforeAddReference. Self documenting but a bit mouthy at the same time. However since its just bikeshedding i left ResolveNewReferences for now

@eiriktsarpalis
Copy link
Member

The semantics of $id are meant to indicate "populate with instance produced by the deserializer and associate with specified identifier for future use". I'll say that adding a method specifically intended to override that behavior does strike me as a little bit odd.

It leaves at least one open question regarding the implementation: would the deserializer be skipping the nested JSON content of such an $id object or would it deserialize onto the externally fed instance? I suspect it would need to be the former, given that your ReferenceResolver is now effectively a singleton: any deserialization would be mutating state attached to your JsonSerializerOptions instance. It is also not clear to me how this feature would interact with polymorphic deserialization (currently planned in #30083).

I think @jozkee already alluded to this in his previous remarks, but wouldn't it make sense to employ a schema where externally fed nodes are always serialized and deserialized as $ref nodes? This would involve making a call as to whether a reference needs to be fed externally on serialization, via a custom ReferenceResolver.GetReference implementation.

@BreyerW
Copy link
Author

BreyerW commented Oct 25, 2021

@eiriktsarpalis actually it is the latter. Override of behaviour cant be piecemeal so any nested json contents will try find external reference first and deserialize onto that (almost like PopulateObject except it works with any depth unlike newtonsoft which works only on rootobject and creates new instances for anything below that if memory serves right ) if not found then what happens is up to you can throw or return null to indicate that blank slate should be created and deserialized onto that then insert that object to parent object

Also note above you is link to my working fork with extra comments in code

EDIT and no cant use $ref everywhere bc data must be stored somewhere and deserialized unless i employ 2 schemas: one with data AND skipped (or written as null to ensure graph is invalidated before relinking step) any reference objects that arent root objects since that would be wasteful (i want to do disjoint serialization to reduce max memory used and possibly parallelize) and serialize whole graph again with disabled data and only $ref written so that graph can be correctly relinked again.

That being said i will definitely need PopulateObject for alternative approach and not sure if System.Text.Json converters support rerouting to other converters since i will need root converter for all reference types in first step that will detect if given object should be skipped or written and if written reroute to actual converter

@eiriktsarpalis
Copy link
Member

Override of behaviour cant be piecemeal so any nested json contents will try find external reference first and deserialize onto that

That's what I don't like. For your use case you'd need a reference handler storing mutable deserialization state that spans multiple operations, effectively tying it to the JsonSerializerOptions instance. JsonSerializerOptions is designed to be used as a singleton instance (apart from being a configuration object it also encapsulates all converter caches, which are expensive to recalculate) and as such it is meant to stay immutable, referentially transparent and thread safe after it has been configured. I would say that the use case is clearly outside of the design parameters of both ReferenceResolver and JsonSerializerOptions and therefore I don't think it should be supported.

That being said i will definitely need PopulateObject for alternative approach

That's certainly something we're considering for a future release, I would recommend contributing to the conversation in #29538.

@eiriktsarpalis eiriktsarpalis closed this as not planned Won't fix, can't repro, duplicate, stale Sep 2, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

6 participants