Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: System.Text.Json: IAsyncEnumerable support #38055

Closed
logiclrd opened this issue Jun 17, 2020 · 9 comments
Closed

API Proposal: System.Text.Json: IAsyncEnumerable support #38055

logiclrd opened this issue Jun 17, 2020 · 9 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@logiclrd
Copy link
Contributor

logiclrd commented Jun 17, 2020

Background and Motivation

System.Text.Json provides a generalized mechanism for converting data between object graphs and a textual JSON representation. In general, the mapping between object representation and JSON format is automated, with the JSON structure inferred from the object structure by means of reflection. When System.Text.Json encounters a value of type IEnumerable or IEnumerable<T>, it translates it into a JSON array, and the way in which it does so is unbuffered, meaning that the output stream will receive objects as they are returned by the IEnumerable. In some cases, this is used to stream results in realtime, producing the JSON for objects only as they become available; the IEnumerable's sequence is not predetermined prior to the start of serialization.

This particular function is tied into the way ASP.NET Core implements endpoints. If an endpoint route method returns a value of type IEnumerable or IEnumerable<T>, then ASP.NET Core will (subject to some degree of caching) stream the results of the enumeration process to the client as they become available.

A natural progression of this is to define a route endpoint that returns IAsyncEnumerable<T>, so that any blocking operations used to retrieve the objects as they are returned don't tie up a thread (this being the entire rationale of the async Task model in the first place). However, if IAsyncEnumerable<T> is returned from a route endpoint, ASP.NET Core will try to read to the end of the IAsyncEnumerable<T> and buffer it entirely in memory. This is a work-around put in place because if it passes it directly to the serialization infrastructure, then what is actually sent to the client is the string {}. Because System.Text.Json lacks any processing of IAsyncEnumerable<T>, it treats it as a generic object type, on which it finds no meaningful properties to serialize.

Ultimately, ASP.NET Core should not be doing in-memory buffering of any IAsyncEnumerable<T> result. This completely defeats the entire purpose of both an enumerable type and an async pattern. It also means that if someone writes a web endpoint designed to simply continue sending data as long as the client remains connected, with an IAsyncEnumerable<T> whose enumerator never ends, then ASP.NET Core will attempt to buffer this infinite sequence of objects and eventually run out of memory. With proper support in the underlying serialization infrastructure, ASP.NET Core can remove this buffering and dynamically stream IAsyncEnumerable<T> sequences in the same manner as it already does IEnumerable<T>.

I believe that systematic support for IAsyncEnumerable<T> within System.Text.Json would be useful, not just to support this ASP.NET Core situation, but also to fill out the set of standard system types that can be serialized & deserialized. If System.Text.Json is expected to automatically "do the right thing" with IEnumerable and IEnumerable<T>, it is not unreasonable for a consumer to assume it will also "do the right thing" with IAsyncEnumerable<T>, and this is not presently the case.

Proposed API

I propose updating System.Text.Json so that it automatically serializes values of type IAsyncEnumerable<T> in much the same way that it currently automatically serializes IEnumerable<T>. Similarly, when deserializing, if it is asked to supply a value of type IAsyncEnumerable<T>, then, given as input a JSON array where each element can be deserialized as T, it should be able to supply some value implementing IAsyncEnumerable<T> that contains the deserialized data.

I do not believe there is any merit in requiring consumers to explicitly opt into or register this function; it should be a standard part of how JsonSerializer.Serialize and JsonSerializer.Deserialize work, exactly as IEnumerable<T> already is.

A complication of this is that in a high-volume setting, it is imperative that the engine not be invoking async operations from a synchronous context. As such, implementing this will require that JsonConverter be made fundamentally async. Most of its implementations will not actually operate in an asynchronous fashion, but the call stack must implement it this way end-to-end in order for IAsyncEnumerable calls that block to a continuation to release the thread back to the task scheduler. This is a breaking change that could only be introduced under a new major version number.

An alternative would be to make JsonConverter support both synchronous and async modes of operation, though in this situation, the engine would need to throw an exception if it encounters an IAsyncEnumerable object further down the call stack after having transitioned to a synchronous context. I believe this would be an antipattern that would encourage people to write code that works around the issue in ways susceptible to deadlock under some circumstances.

On deserialization, I believe the only rational approach is to buffer the entire sequence, which is in fact what the IEnumerable/IEnumerable<T> support already does. The reason for this is that the object graph could contain multiple IAsyncEnumerable<T> values, and if they were actually being asynchronously deserialized, each would require its own stream pointer in the underlying data, which is obviously not in a general sense possible. Thus, I believe System.Text.Json should provide a simple object based on List<T> that provides a minimal synchronous IAsyncEnumerable<T> implementation, so that deserialized values of type IAsyncEnumerable<T> provide the core behaviour of IAsyncEnumerable<T> with regard to enumeration.

Usage Examples

Example 1: Convert database query results into JSON

Assume the existence of a database adapter class that can retrieve records using the async model, returning the elements one at a time so that the entire result set is not held in memory:

public class Database
{
  public async Task<IAsyncEnumerable<Order>> GetAllOrders() { ... }
}

A consumer of this might convert all orders to JSON format with a statement such as:

Database database = ...;

var ordersJson = JsonSerializer.Serialize(await database.GetAllOrders());

(This currently returns the string {}.)

Example 2: ASP.NET Core endpoint using async programming model

Pseudocode:

interface IProcessor
{
  Task<IAsyncEnumerable<IOperation>> GetOperationsAsync();
}

interface IOperation
{
  bool IsCompleted { get; }
  Task<IAsyncEnumerable<IResult>> GetResultsAsync();
}

[Route("/v1")]
public class APIController : Controller
{
  IProcessor _processor;

  public async Task<IAsyncEnumerable<Result>> GetResults()
  {
    await foreach (var operation in await _processor.GetOperationsAsync())
    {
      if (operation.IsCompleted)
      {
        await foreach (var result in await operation.GetResultsAsync())
          yield return result;
      }
    }
  }
}

Alternative Designs

I think all alternatives currently available are antipatterns in that they require the consumer to be responsible for the way in which IAsyncEnumerable<T> is processed and serialized, repeating common code across many projects, sacrificing maintainability and performance.

Risks

I cannot think of any ways in which changing IAsyncEnumerable<T> serialization so that it returns a meaningful JSON array instead of an empty JSON object could cause problems. The only possibility I can think of is if someone were to make an object that had properties and also implemented IAsyncEnumerable directly, they might currently have code that expects the properties to be serialized. This seems like an exceedingly odd pattern to me that I suspect nobody has actually depended upon to date.

One possibility is an object that implements both IEnumerable<T> and IAsyncEnumerable<T>. Presently, this object serializes as an array because System.Text.Json picks up the IEnumerable<T> implementation. With this proposal, it would instead pick up IAsyncEnumerable<T>. It is my belief that if an object implements both, it would probably prefer to be enumerated asynchronously. If such an object did exist and provided different sequences, though, then this change would cause the serialized data to change.

Partial Implementation

I created an implementation of this prior to understanding the API proposal review process, and prior to fully comprehending the async-over-sync problem. It lives in PR #37981. The implementation reuses as much existing code as possible by leveraging existing serialization & deserialization functionality designed for the IEnumerable interface. IAsyncEnumerable<T> is not directly related to IEnumerable, but shares obvious genetic material with it.

This implementation would need to be updated to provide a natively-asynchronous model for JsonConverter in order to fully satisfy the needs of this proposal.

My proposed implementation also provides a simple List<T> implementing IAsyncEnumerable<T> as a place to put deserialized data so that code enumerating those IAsyncEnumerable<T> values can read it (even though it isn't actually async -- something which I don't believe to be possible anyway).

@logiclrd logiclrd added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 17, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jun 17, 2020
@logiclrd logiclrd changed the title System.Text.Json: IAsyncEnumerable support API Proposal: System.Text.Json: IAsyncEnumerable support Jun 17, 2020
@logiclrd
Copy link
Contributor Author

logiclrd commented Jul 5, 2020

Is there a way of getting some insight into how items get prioritized for triage and when this proposal might get assigned an owner?

@logiclrd
Copy link
Contributor Author

logiclrd commented Jul 5, 2020

@marek-safar Sorry for the tag out of the blue, but after some searching I saw you removed the untriaged label from another System.Text.Json issue recently. I'm not assuming you're the person to take ownership of this issue, but maybe you know who might be? :-)

@logiclrd
Copy link
Contributor Author

logiclrd commented Jul 5, 2020

An update: I have become aware of the following scenario:

  • The thread pool has a limited number of threads. On my system this limit is not tiny (4K for my memory/architecture) but it is nevertheless quite finite.
  • With sync over async, the thread initiating the operation might itself be a thread pool thread. Each new operation that comes in pulls a thread from the pool.
  • It invokes the async operation, which sets up an underlying I/O and then returns. When that I/O is completed, it queues up a thread pool action to process it.
  • While the async operation is running, the sync part at the top has to wait for it, which means blocking that thread pool thread it is occupying.
  • If you manage to get as many overlapping I/O operations as there are thread pool threads, then the entire thread pool will be occupied by those sync top ends blocking on the completion of the underlying I/O operations. When those I/O operations complete, their continuations won't ever get scheduled because there are no free threads.

Thus, a deadlock arises.

This means that in order to support serializing IAsyncEnumerable, the converter architecture in System.Text.Json must support async operation end-to-end. This means that my proposed change is insufficient; in addition to adding a converter for it, the WriteCore methods in JsonSerializer and JsonConverter need to turn into WriteCoreAsync, otherwise the thread can't be released if the IAsyncEnumerable encounters an operation that doesn't complete immediately.

@logiclrd
Copy link
Contributor Author

logiclrd commented Jul 5, 2020

Hmm, I can't think of a way of doing this without breaking existing JsonConverters that doesn't involve implementing the entire write stack twice, sync and async, with a fallback to sync when calling into a custom converter that doesn't support async and then throwing an exception if further down the object graph it hits a call that needs to be async.

It seems, then, that the only options are either that or a major version bump that forces all converters to be async. Are either of those possibilities at this point, or is System.Text.Json painted into a corner w.r.t. async enumeration of object graphs?

@layomia
Copy link
Contributor

layomia commented Jul 7, 2020

Thank you for your investigation and proposal. Due to time constraints on 5.0, we are unlikely to fit this in. I'll triage this as "Future" for now, and it can be considered for the next release.

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@layomia layomia added this to the Future milestone Jul 7, 2020
@logiclrd
Copy link
Contributor Author

logiclrd commented Jul 7, 2020

Thanks for your response :-) I have updated the proposal to include a description of the async-over-sync problem and what I believe it would take to resolve it. Ultimately it would be a breaking change, and I have no idea how or if things in the core framework can support that, even over major version number changes.

There's some argument that perhaps making the converter architecture async as its own change might be something to try to get in for .NET 5.0, to lay the groundwork for changes like this one, because doing it later might be harder. If there's ever a time when people would "accept" a breaking change associated with a major version bump it would be in the transition from .NET Framework 4.x / .NET Core 3.x to .NET 5.x, in my opinion. :-)

@steveharter
Copy link
Member

steveharter commented Dec 7, 2020

This means that in order to support serializing IAsyncEnumerable, the converter architecture in System.Text.Json must support async operation end-to-end. This means that my proposed change is insufficient; in addition to adding a converter for it, the WriteCore methods in JsonSerializer and JsonConverter need to turn into WriteCoreAsync, otherwise the thread can't be released if the IAsyncEnumerable encounters an operation that doesn't complete immediately.

The serialization design makes extension points including converters sync-only. This avoids the cost of using async all the way down when not needed and\or avoids the cost of duplicated code (in custom converters and in internal code) by having both async and sync method have "almost duplicated" code. It does this by managing "state" classes that allow the (de)serializer to unwind the physical call stack, do the async call (read or flush), and then resume the call stack manually. Also the Utf8JsonReader is a ref struct which can't be used in an async method (in a custom converter's extension point today), so there is also a technical limitation that is avoided with the current design -- to avoid that issue with Streams today, the serializer uses the JsonReaderState to hold the state, and then creates a Utf8JsonReader from that (but only for Streams).

So IAsyncEnumerable can't be safely implemented as a custom converter (avoiding sync->async calls) at least with the converter base classes exposed publicly today. The async handling and state management must be built-in, or the converter model expanded as you mention to support async.

@steveharter
Copy link
Member

steveharter commented Dec 8, 2020

This means that in order to support serializing IAsyncEnumerable, the converter architecture in System.Text.Json must support async operation end-to-end

In the original issue I added information and a link to a prototype that extends the serializer internally to handle this (and does not extend the custom converter architecture to handle async). It only handles the Deserialize+Stream case at this time (it is a proof-of-concept, not necessarily production quality or fully tested).

If an IAsyncEnumerable is obtained in a non-async deserialize or serialize method then the serializer should throw NotSupportedException since it doesn't want to support any async over sync patterns. I didn't do this in the prototype above.

@eiriktsarpalis
Copy link
Member

Addressed by #50778.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

5 participants