-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async Summarization (DDS) #7615
Comments
I do not like the last approach, as it's very easy to make a mistake (insert an await somewhere, maybe indirectly by refactoring code) and break timing of operations. So I think it should be explicit. I believe second approach (snapshotCore / summarizeCore) is preferred one, but it it can't be implemented while dds/data store attachment process (end-to-end) has to be synchronous (as it is today, as it's part of something synchronous, like setting a key on a map) - making it support asynchrony is possible (via reserving a spot in pending manager and pausing all future ops from being sent), but I'm not sure about cost / if this is right approach (though I'll point out that blob upload has some of the same problems - see #7833). My hope is that with the work @andre4i is doing in the space of named component creation, we could eventually get rid of this requirement (of attachment process having same shape as it has today), but the solution is likely 2 months out. We could try to meet half way there, i.e. separate the process of object creation and naming even in today's system (without waiting for more complex work that Andrei is doing), but given this is not back-compat change, we will need new behavior to ship dark and take couple versions to saturate. If Whiteboard is Ok waiting, that would be my preference, as it helps move this item in right directly while helping Andrei make progress on his side as well. If this is not acceptable, then we need to go (in the meantime) with first approach (two independent APIs - one async & one sync). And to capture requirements for 2nd solution (snapshotCore / summarizeCore), for future - we should scope it to one of the following cases:
|
We've decided on the approach where we separate capturing the current DDS state from the process of serializing and uploading it. BackgroundA Fluid container holds the root of a graph of IChannel instances (DataStores have an IChannel, DDSes are an IChannel). During summarization, the container runtime visits the graph of IChannel instances, invoking 'IChannel.summarize()' on each, and aggregating the returned ISummaryTrees. The final composite ISummaryTree is then serialized and uploaded to the service. Attachment works similarly, except that it begins with the 'bindFluidDataStore()' entry point, which invokes 'generateAttachMessage()' on each DataStore, eventually bottoming out in the same 'summarize()' method. Work RequiredThe required work for this item is to break the graph visitation into two passes. The first pass visits the IChannels, instructing the channel to capture whatever state it will need in order to summarize itself. Because attachment occurs on the user's client (vs. the designated summarization client), this pass must remain synchronous to avoid local mutations while the snapshot is being captured. The second pass revisits the same graph of IChannels, requesting that they produce an ISummaryTree from their previously captured state. This second pass is asynchronous, which allows allows the DDS implementer to upload or download blobs, and/or periodically yield. What the DDS captures during the first pass is up to the DDS. DDSes using functional/persistent/immutable data structures may perform a cheap clone during this phase, while other DDSes may continue to synchronously produce an ITree in the first pass and simply return it in the second pass. Note: Earlier in the thread, I suggested a couple of potential APIs for exposing the two phases to the DDS, but leave the final API shape to the implementer. :-) |
I think we should have an item to apply this design to ContainerRuntime, i.e. how it summarizes data stores themselves. That said, this can be done as the very last item, as it will not block the remaining of the work and ability of clients like Whiteboard to leverage async DDS summarization flow. |
@pleath @anthony-murphy @vladsud, please review if interested. |
What to review? Daniel's latest post? I've replied after it. |
If you're done, that's good. Thanks. |
Looks good to me. Might want @agarwal-navin to evaluate GC impact, which i think already does something similar to the two pass summary. wonder if there is the potential for consolidation here |
This same problem of serializing the data during summaries applies when getting the GC data ( |
From discussion with Navin and Daniel, there are several ways to deal with GC:
|
Background
Summarization involves capture the container's state at a specific instant in time. Normally, this process runs on a special 'read-only' summarization client, which ensures that the container's state is not modified while the snapshot is in processes.
However, when a container is initially attached to the service, summarization happens on the user's client. To avoid the possibility of the application mutating the container's state while this capture is in progress, we synchronously invoke 'snapshotCore' on the hierarchy of DataStores and DDSes without yielding the current JavaScript turn.
Motivations
Yielding the UI thread
Synchronously invoking 'snapshotCore' on the DataStore/DDS hierarchy presents some challenges to customers with moderate to large containers. For these customers, summarization is a large synchronous task that blocks the UI thread, resulting in noticeable UI glitches with the majority of the time spent in 'JSON.stringify()'.
Summary blob reuse
Another challenge is that customers use our 'uploadBlob()' API to move portions of their summary data outside of the ITree returned by 'snapshotCore'. The primary motivations for using 'uploadBlob()' are the desire to reuse chunks from previous summaries and to choose their own serialization format.
While feasible today, storing summary data via 'uploadBlob()' is complex blobs must finish uploading in advance of the summary request in order to be able to reference them via their IFluidHandle.
Data Migration
The third motivation is the ability to migrate data to a new summary format without downloading it all into the client's memory at one time. To do this, the client must be able to asynchronously download, covert, and upload chunks of the summary.
Design Options
SELECTED OPTION: Separate attachment from summarization
One solution is to decouple the attachment scenario from the summarization scenario by offering two API entry points:
attachCore(): ITree
' is synchronous and invoked once on a user's client when the container is first uploaded to the service.summarizeCore(): ITree
' is asynchronous and periodically invoked on the read-only summarization client.Both APIs would perform essentially the same work, with the only difference being that
summarizeCore()
would give the DDS implementor an the option of yielding. Many DDSes will likely implementsummarizeCore()
by invokingattachCore()
.Separate capture from serialization/upload
Another way to split the API surface is to have separate methods for capturing the current snapshot vs. serializing/uploading the snapshot:
captureCore(): TState
' synchronously returns a TState (the DDS specifies the TState type)summarizeCore(TState): ISummaryTreeWithStats
' asynchronously processes TState to produce an ISummaryTreeWithStats.DDSes may choose to synchronously summarization in the first method (i.e., TState = ISummaryTreeWithStats), thereby making 'summarizeCore()' a no-op. However, DDSes built on persistent data structures may chose to return their current immutable state from 'captureCore()' and process it in 'summarizeCore()'.
The Fluid runtime must:
Same as above, but in a single method
This is identical to the above option, except that instead of having the Fluid runtime collect the TSnapshot instance to pass them to the summarizeCore, we combine 'snapshotCore()' and 'summarizeCore()' into a single method:
Again, the runtime must synchronously walk the graph of DS/DDSes, invoking 'summarizeCore()'. However, this time we skip the first step (passing TSnapshot from one method to another), and instead gather promises in the first pass which the runtime awaits and reassembles.
(See also #6571)
Work items
The text was updated successfully, but these errors were encountered: