-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Proposal: System.Text.Json: IAsyncEnumerable support #38055
Comments
Is there a way of getting some insight into how items get prioritized for triage and when this proposal might get assigned an owner? |
@marek-safar Sorry for the tag out of the blue, but after some searching I saw you removed the |
An update: I have become aware of the following scenario:
Thus, a deadlock arises. This means that in order to support serializing |
Hmm, I can't think of a way of doing this without breaking existing It seems, then, that the only options are either that or a major version bump that forces all converters to be |
Thank you for your investigation and proposal. Due to time constraints on 5.0, we are unlikely to fit this in. I'll triage this as "Future" for now, and it can be considered for the next release. |
Thanks for your response :-) I have updated the proposal to include a description of the async-over-sync problem and what I believe it would take to resolve it. Ultimately it would be a breaking change, and I have no idea how or if things in the core framework can support that, even over major version number changes. There's some argument that perhaps making the converter architecture |
The serialization design makes extension points including converters sync-only. This avoids the cost of using async all the way down when not needed and\or avoids the cost of duplicated code (in custom converters and in internal code) by having both async and sync method have "almost duplicated" code. It does this by managing "state" classes that allow the (de)serializer to unwind the physical call stack, do the async call (read or flush), and then resume the call stack manually. Also the So IAsyncEnumerable can't be safely implemented as a custom converter (avoiding sync->async calls) at least with the converter base classes exposed publicly today. The async handling and state management must be built-in, or the converter model expanded as you mention to support async. |
In the original issue I added information and a link to a prototype that extends the serializer internally to handle this (and does not extend the custom converter architecture to handle async). It only handles the Deserialize+Stream case at this time (it is a proof-of-concept, not necessarily production quality or fully tested). If an IAsyncEnumerable is obtained in a non-async deserialize or serialize method then the serializer should throw |
Addressed by #50778. |
Background and Motivation
System.Text.Json provides a generalized mechanism for converting data between object graphs and a textual JSON representation. In general, the mapping between object representation and JSON format is automated, with the JSON structure inferred from the object structure by means of reflection. When System.Text.Json encounters a value of type
IEnumerable
orIEnumerable<T>
, it translates it into a JSON array, and the way in which it does so is unbuffered, meaning that the output stream will receive objects as they are returned by theIEnumerable
. In some cases, this is used to stream results in realtime, producing the JSON for objects only as they become available; theIEnumerable
's sequence is not predetermined prior to the start of serialization.This particular function is tied into the way ASP.NET Core implements endpoints. If an endpoint route method returns a value of type
IEnumerable
orIEnumerable<T>
, then ASP.NET Core will (subject to some degree of caching) stream the results of the enumeration process to the client as they become available.A natural progression of this is to define a route endpoint that returns
IAsyncEnumerable<T>
, so that any blocking operations used to retrieve the objects as they are returned don't tie up a thread (this being the entire rationale of the asyncTask
model in the first place). However, ifIAsyncEnumerable<T>
is returned from a route endpoint, ASP.NET Core will try to read to the end of theIAsyncEnumerable<T>
and buffer it entirely in memory. This is a work-around put in place because if it passes it directly to the serialization infrastructure, then what is actually sent to the client is the string{}
. Because System.Text.Json lacks any processing ofIAsyncEnumerable<T>
, it treats it as a generic object type, on which it finds no meaningful properties to serialize.Ultimately, ASP.NET Core should not be doing in-memory buffering of any
IAsyncEnumerable<T>
result. This completely defeats the entire purpose of both an enumerable type and an async pattern. It also means that if someone writes a web endpoint designed to simply continue sending data as long as the client remains connected, with anIAsyncEnumerable<T>
whose enumerator never ends, then ASP.NET Core will attempt to buffer this infinite sequence of objects and eventually run out of memory. With proper support in the underlying serialization infrastructure, ASP.NET Core can remove this buffering and dynamically streamIAsyncEnumerable<T>
sequences in the same manner as it already doesIEnumerable<T>
.I believe that systematic support for
IAsyncEnumerable<T>
within System.Text.Json would be useful, not just to support this ASP.NET Core situation, but also to fill out the set of standard system types that can be serialized & deserialized. If System.Text.Json is expected to automatically "do the right thing" withIEnumerable
andIEnumerable<T>
, it is not unreasonable for a consumer to assume it will also "do the right thing" withIAsyncEnumerable<T>
, and this is not presently the case.Proposed API
I propose updating System.Text.Json so that it automatically serializes values of type
IAsyncEnumerable<T>
in much the same way that it currently automatically serializesIEnumerable<T>
. Similarly, when deserializing, if it is asked to supply a value of typeIAsyncEnumerable<T>
, then, given as input a JSON array where each element can be deserialized asT
, it should be able to supply some value implementingIAsyncEnumerable<T>
that contains the deserialized data.I do not believe there is any merit in requiring consumers to explicitly opt into or register this function; it should be a standard part of how
JsonSerializer.Serialize
andJsonSerializer.Deserialize
work, exactly asIEnumerable<T>
already is.A complication of this is that in a high-volume setting, it is imperative that the engine not be invoking
async
operations from a synchronous context. As such, implementing this will require thatJsonConverter
be made fundamentallyasync
. Most of its implementations will not actually operate in an asynchronous fashion, but the call stack must implement it this way end-to-end in order forIAsyncEnumerable
calls that block to a continuation to release the thread back to the task scheduler. This is a breaking change that could only be introduced under a new major version number.An alternative would be to make
JsonConverter
support both synchronous andasync
modes of operation, though in this situation, the engine would need to throw an exception if it encounters anIAsyncEnumerable
object further down the call stack after having transitioned to a synchronous context. I believe this would be an antipattern that would encourage people to write code that works around the issue in ways susceptible to deadlock under some circumstances.On deserialization, I believe the only rational approach is to buffer the entire sequence, which is in fact what the
IEnumerable
/IEnumerable<T>
support already does. The reason for this is that the object graph could contain multipleIAsyncEnumerable<T>
values, and if they were actually being asynchronously deserialized, each would require its own stream pointer in the underlying data, which is obviously not in a general sense possible. Thus, I believe System.Text.Json should provide a simple object based onList<T>
that provides a minimal synchronousIAsyncEnumerable<T>
implementation, so that deserialized values of typeIAsyncEnumerable<T>
provide the core behaviour ofIAsyncEnumerable<T>
with regard to enumeration.Usage Examples
Example 1: Convert database query results into JSON
Assume the existence of a database adapter class that can retrieve records using the
async
model, returning the elements one at a time so that the entire result set is not held in memory:A consumer of this might convert all orders to JSON format with a statement such as:
(This currently returns the string
{}
.)Example 2: ASP.NET Core endpoint using
async
programming modelPseudocode:
Alternative Designs
I think all alternatives currently available are antipatterns in that they require the consumer to be responsible for the way in which
IAsyncEnumerable<T>
is processed and serialized, repeating common code across many projects, sacrificing maintainability and performance.Risks
I cannot think of any ways in which changing
IAsyncEnumerable<T>
serialization so that it returns a meaningful JSON array instead of an empty JSON object could cause problems. The only possibility I can think of is if someone were to make an object that had properties and also implementedIAsyncEnumerable
directly, they might currently have code that expects the properties to be serialized. This seems like an exceedingly odd pattern to me that I suspect nobody has actually depended upon to date.One possibility is an object that implements both
IEnumerable<T>
andIAsyncEnumerable<T>
. Presently, this object serializes as an array because System.Text.Json picks up theIEnumerable<T>
implementation. With this proposal, it would instead pick upIAsyncEnumerable<T>
. It is my belief that if an object implements both, it would probably prefer to be enumerated asynchronously. If such an object did exist and provided different sequences, though, then this change would cause the serialized data to change.Partial Implementation
I created an implementation of this prior to understanding the API proposal review process, and prior to fully comprehending the async-over-sync problem. It lives in PR #37981. The implementation reuses as much existing code as possible by leveraging existing serialization & deserialization functionality designed for the
IEnumerable
interface.IAsyncEnumerable<T>
is not directly related toIEnumerable
, but shares obvious genetic material with it.This implementation would need to be updated to provide a natively-asynchronous model for
JsonConverter
in order to fully satisfy the needs of this proposal.My proposed implementation also provides a simple
List<T>
implementingIAsyncEnumerable<T>
as a place to put deserialized data so that code enumerating thoseIAsyncEnumerable<T>
values can read it (even though it isn't actually async -- something which I don't believe to be possible anyway).The text was updated successfully, but these errors were encountered: