[API Proposal]: Support streaming deserialization of JSON objects #64182
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json
Issue Details
Background and motivation
My team has a large JSON blob that has the following format:
```
{
  "package-id-1": ["owner-1", "owner-2"],
  "package-id-2": ["owner-1"],
  ... megabytes and megabytes later ...
  "package-id-9001": ["owner-42"],
}
```
I thought that perhaps this file could be read in a streaming way via some implementation of IAsyncEnumerable<KeyValuePair<string, List<string>>> provided by System.Text.Json.
Currently, it appears that JsonSerializer.DeserializeAsyncEnumerable<T> only supports documents that are rooted as arrays. From the blog post I read, it appears that this limitation is expected for now.
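Currently, if a KVP is provided for T, a strange error is thrown mentioning a Queue (appears to be an implementation detail). I would have expected an error saying "unexpected JsonTokenType.StartObject, expected JsonTokenType.StartArray" or something.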
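I attempted to write my own code to produce an IAsyncEnumerable from a Utf8JsonReader but found it quite challenging. The analogous code with Newtonsoft.Json (using JsonTextReader) is straightforward.
API Proposal
I propose that an overload of JsonSerializer.DeserializeAsyncEnumerable be added to support the parsing of objects:
```csharp
IAsyncEnumerable<KeyValuePair<TKey, TValue>> DeserializeAsyncEnumerable<TKey, TValue>(
    Stream utf8Json,
    JsonSerializerOptions? options = null,
    CancellationToken cancellationToken = default(CancellationToken));
```
By default, the method would work best when the property values are homogenous in type (e.g. List<string> in my example above), but this could be enhanced using a JsonConverter that handles all of the different property types and returns them as TValue. TValue could be left as object, indicating that the value should be returned as a JSON DOM object.
I believe this is superior to passing a KeyValuePair as T for the existing DeserializeAsyncEnumerable<T>, since it provides a hint at the call site that the expected document is an object, not an array.
API Usage
JSON:
```json
{
  "a": [ 1, 2 ],
  "b": [ 2, 3 ]
}
```
Code:
```csharp
using System.Text.Json;
using var json = File.OpenRead("example.json");
// Proposed overload; returns an IAsyncEnumerable<KeyValuePair<string, List<int>>>
var pairs = JsonSerializer.DeserializeAsyncEnumerable<string, List<int>>(json);
await foreach (var pair in pairs)
{
    Console.WriteLine($"{pair.Key}: {string.Join(" + ", pair.Value)} = {pair.Value.Sum()}");
}
```
Output: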
Alternative Designs
A new type could be introduced to contain both the property name and value. However, I see the symmetry between IAsyncEnumerable<KeyValuePair<TKey, TValue>> and Dictionary<TKey, TValue> implementing IEnumerable<KeyValuePair<TKey, TValue>>.
Alternatively, the existing method with a single type parameter T could be enhanced to have a special case to allow objects when T is a KVP. I think this alternative is a bit more confusing and not discoverable.
An alternative design for the end-user would be to format the JSON in a different (more sane, yet more verbose) way, e.g.
```
[
  { "id": "package-id-1", "owners": ["owner-1", "owner-2"] },
  { "id": "package-id-2", "owners": ["owner-1"] },
  ... megabytes and megabytes later ...
  { "id": "package-id-9001", "owners": ["owner-42"] }
]
```
This may not be possible given constraints on the producer of the JSON document.
Risks
It might be entirely unclear that this is how you do streaming object deserialization. The nuance between one and two type parameters is perhaps too subtle.
This suggested feature may be a bit frustrating in that, I wager, most JSON objects do not have homogenous property values. So perhaps a lot of folks will just use object as the TValue, which (from what I can tell) falls through to the DOM API for the returned object values.
It is quite likely that this method would need to allow duplicate property names. Otherwise, the streaming state would need to track property names that have already been seen in order to error out. It would need to be abundantly clear to callers that they need to do duplicate property name checks themselves (if necessary).
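The caller-side duplicate check described under Risks could look roughly like the sketch below. It is only an illustration: `DuplicateKeyGuard` and `ThrowOnDuplicateKeys` are hypothetical names, not part of System.Text.Json, and the helper assumes that remembering every key seen so far is affordable.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class DuplicateKeyGuard
{
    // Hypothetical helper: passes pairs through unchanged and throws on the
    // first repeated key. Memory use grows with the number of distinct keys.
    public static async IAsyncEnumerable<KeyValuePair<TKey, TValue>> ThrowOnDuplicateKeys<TKey, TValue>(
        this IAsyncEnumerable<KeyValuePair<TKey, TValue>> source,
        IEqualityComparer<TKey>? comparer = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var seen = new HashSet<TKey>(comparer);
        await foreach (var pair in source.WithCancellation(cancellationToken))
        {
            // HashSet<T>.Add returns false when the key was already present.
            if (!seen.Add(pair.Key))
            {
                throw new InvalidOperationException($"Duplicate property name '{pair.Key}'.");
            }
            yield return pair;
        }
    }
}
```

The trade-off is the one the Risks section points at: the check gives up the constant-memory property that motivates streaming in the first place, so it is best left opt-in.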
That's an excellent point, which in my view illustrates that this is a niche application. Rather than exposing such functionality as a dedicated method, we should instead offer extensibility points that let users write extensions that support their bespoke scenarios. I believe it could be addressed by #63795.
Awesome! I'll follow #63795 then and give it a try whenever it lands. Thanks for your time, @eiriktsarpalis!
Supporting async JSON object deserialization with strongly-typed
Assuming property types are not homogenous, then yes, a DOM type would be easiest, especially if type-specific POCO logic on getters/setters is not present. Deserializing into With
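A minimal sketch of the DOM-based route mentioned above, assuming the whole document fits in memory (so it is not streaming). Each top-level property value is kept as a `JsonElement` and inspected by kind. `DomFallback` and `DescribeProperties` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class DomFallback
{
    // Describes each top-level property by the JSON kind of its value.
    public static List<string> DescribeProperties(string json)
    {
        var properties = JsonSerializer.Deserialize<Dictionary<string, JsonElement>>(json)
            ?? throw new JsonException("Expected a JSON object at the root.");

        var lines = new List<string>();
        foreach (var (name, value) in properties)
        {
            string description = value.ValueKind switch
            {
                JsonValueKind.Array => $"array of {value.GetArrayLength()}",
                JsonValueKind.String => $"string \"{value.GetString()}\"",
                JsonValueKind.Number => $"number {value.GetRawText()}",
                _ => value.ValueKind.ToString()
            };
            lines.Add($"{name}: {description}");
        }
        return lines;
    }
}
```

Keeping values as `JsonElement` defers the choice of target type to the caller, which suits heterogeneous property values, at the cost of buffering the entire object.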
Background and motivation
My team has a large JSON blob that has the following format:
I thought that perhaps this file could be read in a streaming way via some implementation of IAsyncEnumerable<KeyValuePair<string, List<string>>> provided by System.Text.Json.
Currently, it appears that JsonSerializer.DeserializeAsyncEnumerable<T> only supports documents that are rooted as arrays. This definitely makes sense as the main use case. However, it seems to me that this general concept could also work for streaming across very large objects where the keys are more like data than schema, and which may therefore have an unbounded number of properties. In the JSON example above, both the keys and the values are "data", so to speak, whereas a more typical JSON document uses object property names as "schema".
From the blog post I read, it appears that this limitation is expected for now.
Currently, if a KVP is provided for T, a strange error is thrown mentioning a Queue (appears to be an implementation detail). I would have expected an error saying "unexpected JsonTokenType.StartObject, expected JsonTokenType.StartArray" or something.
I attempted to write my own code to produce an IAsyncEnumerable from a Utf8JsonReader but found it quite challenging. The analogous code with Newtonsoft.Json (using JsonTextReader) is straightforward.
API Proposal
I propose that an overload of JsonSerializer.DeserializeAsyncEnumerable be added to support the parsing of objects:
By default, the method would work best when the property values are homogenous in type (e.g. List<string> in my example above), but this could be enhanced using a JsonConverter that handles all of the different property types and returns them as TValue. TValue could be left as object, indicating that the value should be returned as a JSON DOM object.
I believe this is superior to passing a KeyValuePair as T for the existing DeserializeAsyncEnumerable<T>, since it provides a hint at the call site that the expected document is an object, not an array.
API Usage
JSON:
Code:
Output:
Alternative Designs
A new type could be introduced to contain both the property name and value. However, I see the symmetry between IAsyncEnumerable<KeyValuePair<TKey, TValue>> and Dictionary<TKey, TValue> implementing IEnumerable<KeyValuePair<TKey, TValue>>.
Alternatively, the existing method with a single type parameter T could be enhanced to have a special case to allow objects when T is a KVP. I think this alternative is a bit more confusing and not discoverable.
An alternative design for the end-user would be to format the JSON in a different (more sane, yet more verbose) way, e.g.
This may not be possible given constraints on the producer of the JSON document.
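If the producer can emit the array-rooted format, the existing JsonSerializer.DeserializeAsyncEnumerable<T> already streams it element by element. The sketch below assumes .NET 6 or later and elements with "id" and "owners" properties; `PackageOwners` and `ArrayRootedReader` are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading;

// Hypothetical type mirroring one element of the array-rooted document.
public sealed class PackageOwners
{
    [JsonPropertyName("id")]
    public string Id { get; set; } = "";

    [JsonPropertyName("owners")]
    public List<string> Owners { get; set; } = new();
}

public static class ArrayRootedReader
{
    // Elements are deserialized lazily as the stream is consumed, so the
    // whole document never has to be materialized at once.
    public static IAsyncEnumerable<PackageOwners?> ReadAsync(
        Stream utf8Json, CancellationToken cancellationToken = default)
        => JsonSerializer.DeserializeAsyncEnumerable<PackageOwners>(
            utf8Json, cancellationToken: cancellationToken);
}
```

A caller would consume the result with `await foreach`, exactly as in the API Usage section, with one element per package instead of one property per package.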
Risks
It might be entirely unclear that this is how you do streaming object deserialization. The nuance between one and two type parameters is perhaps too subtle.
This suggested feature may be a bit frustrating in that, I wager, most JSON objects do not have homogenous property values. So perhaps a lot of folks will just use object as the TValue, which (from what I can tell) falls through to the DOM API for the returned object values.
It is quite likely that this method would need to allow duplicate property names. Otherwise, the streaming state would need to track property names that have already been seen in order to error out. It would need to be abundantly clear to callers that they need to do duplicate property name checks themselves (if necessary).
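For context on why the hand-rolled Utf8JsonReader route mentioned under Background and motivation is awkward, here is a rough sketch of what such code might look like. It is not the proposed API and not how System.Text.Json implements anything. `ObjectStreamReader`, `ReadPropertiesAsync`, and `ParseBatch` are hypothetical names, and the sketch assumes string keys, a single root object, and that each individual property value fits in memory. Duplicate property names are yielded as-is, which matches the risk described above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;

public static class ObjectStreamReader
{
    // Hypothetical helper: yields the root object's properties as they arrive.
    public static async IAsyncEnumerable<KeyValuePair<string, TValue>> ReadPropertiesAsync<TValue>(
        Stream utf8Json,
        JsonSerializerOptions? options = null,
        int initialBufferSize = 16 * 1024,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        byte[] buffer = new byte[initialBufferSize];
        int filled = 0;
        JsonReaderState state = default;
        var batch = new List<KeyValuePair<string, TValue>>();
        bool done = false;

        while (!done)
        {
            // A full buffer means one property did not fit: double the buffer.
            if (filled == buffer.Length)
            {
                var bigger = new byte[buffer.Length * 2];
                Buffer.BlockCopy(buffer, 0, bigger, 0, filled);
                buffer = bigger;
            }

            int read = await utf8Json.ReadAsync(buffer.AsMemory(filled), cancellationToken);
            filled += read;
            bool isFinalBlock = read == 0;

            int consumed;
            (consumed, state, done) = ParseBatch(buffer, filled, isFinalBlock, state, options, batch);

            // Move the unparsed tail to the front for the next pass.
            Buffer.BlockCopy(buffer, consumed, buffer, 0, filled - consumed);
            filled -= consumed;

            foreach (var pair in batch) yield return pair;
            batch.Clear();

            if (isFinalBlock && !done) throw new JsonException("Unexpected end of JSON input.");
        }
    }

    // Utf8JsonReader is a ref struct, so parsing lives in a synchronous method.
    private static (int Consumed, JsonReaderState State, bool Done) ParseBatch<TValue>(
        byte[] buffer, int length, bool isFinalBlock, JsonReaderState state,
        JsonSerializerOptions? options, List<KeyValuePair<string, TValue>> output)
    {
        var reader = new Utf8JsonReader(buffer.AsSpan(0, length), isFinalBlock, state);
        bool done = false;
        while (!done)
        {
            Utf8JsonReader checkpoint = reader; // struct copy used to roll back
            if (!reader.Read()) break;          // next token is not complete yet

            switch (reader.TokenType)
            {
                case JsonTokenType.StartObject: // the root object opens
                    break;
                case JsonTokenType.EndObject:   // the root object closes
                    done = true;
                    break;
                case JsonTokenType.PropertyName:
                    string name = reader.GetString()!;
                    // Probe on a copy: the whole value must already be buffered.
                    Utf8JsonReader probe = reader;
                    if (!probe.Read() || !probe.TrySkip())
                    {
                        reader = checkpoint;    // retry this property with more data
                        return ((int)reader.BytesConsumed, reader.CurrentState, false);
                    }
                    TValue value = JsonSerializer.Deserialize<TValue>(ref reader, options)!;
                    output.Add(new KeyValuePair<string, TValue>(name, value));
                    break;
                default:
                    throw new JsonException("Expected a JSON object at the root.");
            }
        }
        return ((int)reader.BytesConsumed, reader.CurrentState, done);
    }
}
```

Most of the code is buffer bookkeeping rather than JSON handling: the reader cannot live across an `await`, so its state has to be saved and restored around every read from the stream.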