[Blazor] Persisting circuit state for Blazor applications #60494
Comments
I would like to make data consistency an explicit goal. Our implementation should not lead to corrupted state. One such scenario is when two server nodes store the state out of order.
This would lead to a mismatch between the client state and the persisted state. One possible solution is optimistic concurrency: essentially a counter in the client that is incremented with every message/event sent to the server. If we detect that the version of the persisted state is ahead of what the node is trying to store, we have choices of
If we have 1K circuits in memory with 500KB of state each, persisting them all would mean 500MB of traffic plus the CPU cost of JSON serialization.
JSON doesn't deal with data cycles well. Maybe we need to call that out of scope? How can we validate that this is feasible/valuable for a large portion of our user base?
Should we figure out how to integrate this with EF? Storing only "dirty" data?
This should probably be out of scope for the "server shutdown" scenario.
I probably didn't fully understand the eviction callback API. Is that in C# on the server side or in the client? Anyway, I think there should be TTL configuration and automatic eviction implemented by us.
Thanks @pavelsavara for taking a look, I'll try to address your concerns below:
I understand what you are bringing up, but I don't think what we are describing here matches the design of the feature. Here is how the system will work in a solution that is similar to what you are proposing:
sequenceDiagram
participant Client
participant Server A
participant Server B
Client -> Server A: Start session (circuit-id)
Client -> Server A: Trigger event
Server A -> Client: Update UI
Client -> Client: Connection lost, at this point the UI is frozen (displaying reconnection dialog)
Server A -> Server A: Circuit disconnected
Server A -> Server A: Circuit evicted (Persist state (circuit-id))
par Reconnection
Client -> Server B: Reconnect
Server B -> Server B: Check disconnected circuit cache
alt circuit-id is found in cache
Server B -> Server B: Reconnect circuit
else
Server B -> Server B: Check persisted component state
alt circuit-id is found in persisted state
Server B -> Server B: Create new circuit with persisted state
Server B -> Client: Reconnection successful
else
Server B -> Client: Reconnection failed
end
end
and Server A restarting
Server A -> Server A: Restart
Server A -> Server A: Waiting for new circuits
end
There are two key points that make what you are describing impossible:
The scenario that you are describing is more in line with SPA applications where state is held by the client, which can update it of its own accord. In Blazor Server the entire state is maintained on the server, so the client can't modify the state in any way when it's not connected to the server; hence there's no state to be reconciled between the two. To put it a different way, Blazor Server is more like a traditional SSR app, where you store all the state in memory inside session state.
Yes, ASP.NET has events for this, but we wouldn't register to persist circuits on shutdown automatically. The design enables you to implement this scenario, but we are not planning to do this out of the box. The shutdown time is configurable in ASP.NET Core, so developers can adjust it as needed.
I'm not sure what you mean by this; however, this is up to the application developer, who needs to measure and adjust settings if necessary for their specific app. To help clarify things, the only scenario that we handle automatically out of the box is the abrupt disconnection scenario (when the client loses connection for too long). In any other case, we give you the APIs and samples, but we don't implement anything automatically.
Sure, we'll do some testing and offer numbers, with that said, I'll add a bit more detail. The cost here comes from:
For all these things we don't offer any "guarantee" as they all scale based on your hardware and specific deployment setup. What we care about in these cases is how does your system scale:
To put some numbers in context, here is an article from Azure Blob Storage https://azure.microsoft.com/en-us/blog/high-throughput-with-azure-blob-storage/
If we extrapolate these numbers to the ones you proposed, it takes about 0.5s to send and write the 500MB.
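For reference, the arithmetic behind that estimate can be spelled out using only the figures quoted in this thread. This is a rough sketch: decimal units (1 MB = 1000 KB) are an assumption, and the throughput is simply what the 0.5s figure implies, not a number taken from the linked article.

```python
# Figures quoted in the thread: 1K circuits, 500KB of state each, 0.5s to write.
circuits = 1_000
state_kb_per_circuit = 500

# Assumption: decimal units, 1 MB = 1000 KB.
total_mb = circuits * state_kb_per_circuit / 1_000

# Throughput implied by "0.5s to send and write the 500MB".
seconds = 0.5
implied_mb_per_second = total_mb / seconds

print(f"total payload: {total_mb:.0f} MB")
print(f"implied throughput: {implied_mb_per_second:.0f} MB/s")
```

Actual numbers depend on hardware, payload shape, and the storage backend, so they should be measured per deployment.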
Yes, serializing graphs is out of scope and being JSON serializable with default settings is a general invariant for Blazor for anything that requires serialization in Blazor.
Our guidance is clear in this sense that this is a requirement and that if you need to do something that is not supported, you need to handle serialization and deserialization of that payload yourself. In particular for this feature, we are not adding any new constraint, we are reflecting the existing constraints in
We have a hard time evaluating these types of things because it's not usually possible for us to get this data.
We can't only store "dirty" data because otherwise when the app is resumed from persisted state it wouldn't render the same UI or would need to fetch the data from the original data source again, which would likely introduce a flicker on the UI. Ultimately, it's up to the developer to decide what state they want to save.
I'll try to clarify this a bit further. The only scenario that we handle out of the box is "abrupt disconnection" which is when you lose connection for long enough that we end up discarding the circuit. For any other scenario ("collaborative disconnection") we offer an API on the Server to start the process. Using that API, you can build whatever policy you want to proactively start this process. For example, by:
We make all three of these examples, and more, possible for you to implement, but we don't ship any of them out of the box.
Yes, this is handled already by the caching abstractions that we have in ASP.NET Core, but I should have mentioned it explicitly. There are limits for all these things that we will also have to cover during threat modeling and security review. Thanks again for taking a look. I think I've answered/clarified all the questions, but if there's anything that I missed or you want to further discuss, please let me know. As I work through more of the implementation, other details will surface, and I'll expand this from those docs.
From browser user perspective, any key pressed is a new "state".
I still think that the new web-socket connection to server B could race the previous web-socket connection to server A. Also, when you talk to a storage backend via the network and without a transaction, network messages can get lost or race each other. Adding a version to each message is cheap and most stores support it.
Are we going to have a "restore" callback for the user to fine-tune that DB fetch?
This is because of element identity, right? Can you please explain why this is not a problem for "out of the box persistence"? What could customers do to avoid the flicker?
Agreed, but I would like to make sure there are use-cases in which it even makes sense for the customer to use.
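To make the versioning idea concrete, here is a minimal sketch of a conditional write that rejects out-of-order persistence. Everything in it is hypothetical (the `VersionedStore` class, `try_save`, the in-process dictionary standing in for a real backend); it only illustrates the compare-before-write rule, not any proposed Blazor API.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical in-process stand-in for a store with conditional writes."""

    _entries: dict = field(default_factory=dict)  # circuit_id -> (version, payload)

    def try_save(self, circuit_id: str, version: int, payload: bytes) -> bool:
        """Persist payload only if it is newer than what is already stored."""
        current = self._entries.get(circuit_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate write: keep the newer state
        self._entries[circuit_id] = (version, payload)
        return True

    def load(self, circuit_id: str):
        """Return (version, payload) or None when nothing is persisted."""
        return self._entries.get(circuit_id)


store = VersionedStore()
# The newer state arrives first; the delayed older write must not overwrite it.
accepted_new = store.try_save("circuit-1", version=7, payload=b"state after event 7")
accepted_old = store.try_save("circuit-1", version=6, payload=b"state after event 6")
```

In a real backend the comparison and the write would need to be a single atomic operation (for example a conditional update), otherwise the check itself can race.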
That's not state from a Blazor Server perspective. The only state that matters is the one on the server. Once the app is disconnected, no events are processed by the server, so any keystroke, etc. is lost. This is how Blazor Server works today already; we aren't changing that as part of this feature.
This is not possible; the client will only try to connect to B after it has lost the connection to A, and won't try to reconnect to A, as it only handles 1 connection at a time.
Ordered delivery is provided by SignalR between the client and the server.
There is a well-defined pattern for this already. You can check if you have a value already and if not, perform the fetch. From the app point of view, there's no difference between a new app and a resuming app, the only delta is whether or not state has been restored.
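A minimal sketch of that check-then-fetch pattern, written in a language-neutral way. The names (`initialize`, the `"forecasts"` key, the `fetch` callback) are hypothetical and only illustrate the control flow; they are not the Blazor API.

```python
def initialize(restored_state: dict, fetch):
    """Return (data, fetched). Reuse restored state when present, else fetch."""
    if "forecasts" in restored_state:
        # Resuming app: the value was restored, so no round trip is needed.
        return restored_state["forecasts"], False
    # New app (or nothing was persisted): fall back to the original data source.
    return fetch(), True


fetch_calls = []


def fetch_forecasts():
    fetch_calls.append(1)  # record that the data source was hit
    return ["sunny", "rainy"]


fresh_data, fresh_fetched = initialize({}, fetch_forecasts)
resumed_data, resumed_fetched = initialize({"forecasts": ["cloudy"]}, fetch_forecasts)
```

The component code is the same in both cases; the only difference is whether the restored value is there.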
It doesn't have anything to do with element identity, it has to do with the way Blazor renders
This produces two renders when there is no data and we call If your app was already showing data, when the app is restored and goes through the render cycle, it will first render the loading screen for a split second, then once again after the data is loaded, that's where the flickering happens.
Could we make sure that
Enhanced navigation is a little different because you still have the old DOM to compare against. So we can pretty easily avoid, for example, merging two components that reside at different depths in the DOM tree. I would think that key ambiguity would be much more common in the state persistence case. For example, imagine you had a file explorer app with a When a key conflict does occur, would we treat this as an error, as we do today? I would assume this to be the case, but just want to check. If so, this means that in addition to potentially happening more frequently, key conflicts would be more severe in state persistence than they are in enhanced navigation.
I see that "upgrade" scenarios are listed as a non-goal, which I agree with. When using the browser to store persisted state, should we actively prevent that state from being used to reconnect to a "newer" version of the app? There might be a risk where the updated app changes its assumptions about which values it expects for persisted properties, but the computed component keys don't change, and this causes unexpected or invalid state to get supplied to a component. The framework might be able to help with clearing caches it has control over (i.e., those directly accessible by the server), but we might want to know if persisted state from the browser comes from an older version of the app so that the server can discard it. Edit: Or maybe you could just generate a new data protection key so that old state automatically becomes indecipherable.
Summary
We want to build on top of the existing functionality provided for preserving application state to enable opting in to hibernating server circuits under several circumstances and restoring the hibernated sessions afterwards:
Persisting the server state is always an opt-in, best-effort, progressive enhancement. The persisted state is not guaranteed to be recoverable, and in that case, the app falls back to the previous experience of losing the state.
Motivation
Circuits reside in memory for their entire lifetime in a single server instance. When the connection to the client is lost, we keep a certain number of circuits in memory for a given time to allow sessions to resume once the connection is re-established. However, if the number of disconnected circuits goes above a threshold, new disconnected circuits are immediately discarded and clients lose all their work. When the connection is lost for longer than the circuit is retained, the circuit is likewise discarded.
When a circuit is discarded, the session is removed from memory and can't be recovered, causing users to lose all their unsaved work. This is especially true when the user is on a mobile platform like a phone or tablet. In this situation, when switching away from the browser application, the connection with the server is normally terminated, resulting in the loss of any unsaved work in the majority of cases.
There are other factors that contribute to potential information loss on circuits, like a server restarting, in which case all the circuits in that process are discarded, resulting in the work being lost.
Another important scenario is when a session is left open but unused, for example when a user leaves their browser open before going home. In that scenario the circuit is kept alive, consuming resources that could be used for serving other users.
Goals
Non-goals
Scenarios
Server reboot
As a developer I need to reboot/update my application/operating system/container periodically. When it's time for the application to update, I need to shut down the existing application and I want to migrate existing user sessions to a different server while I perform the update. Users might get a notification about their work being partially interrupted, but they can resume their session on a separate endpoint while the updates are being applied on the server.
The flow for this scenario is as follows:
Connection lost for a longer period of time
As a developer I want to provide an improved experience on mobile browsers, where it's common that the connection is lost when a user switches from the browser app to a different app and comes back after a while. I want to be able to get a notification when the circuit is going to be discarded and to get the opportunity to save the circuit state into more permanent storage so that the session can be resumed afterwards when the user switches back to the browser.
Proactively hibernating circuits
As a developer I want to have a mechanism that enables me to hibernate circuits that I deem inactive to preserve server resources and enable customers to resume their session afterwards.
Detailed design
Abrupt disconnection
In this scenario, the connection between the server and the client is lost abruptly. After the initial disconnection period, when the circuit is going to be evicted from memory, a new callback is triggered to persist the circuit state. At that point, the server collects the list of root components and their parameters, as well as any state within the circuit that the app developer wants to persist, and pushes it to some storage mechanism. The details about this storage mechanism are described later in the document.
If the client is still running and tries to reconnect to the server, the server first checks if the circuit is in the disconnected pool, and if not, it performs an additional check to see if there was state persisted for that circuit. If there was, the server creates a new circuit, instantiates all the root components with the given state, attaches the components to the DOM and sends a render batch to the client to re-render the components.
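The lookup order described above can be sketched as follows. This is an illustrative model only: the `reconnect` function, the dictionaries standing in for the disconnected pool and the persistence store, and the choice to remove persisted state once it has been used are all assumptions, not the actual implementation.

```python
def reconnect(circuit_id: str, disconnected_pool: dict, persisted_state: dict):
    """Return (outcome, circuit) following the order: pool, then persisted state."""
    # 1. The circuit is still in memory: reattach it as-is.
    if circuit_id in disconnected_pool:
        return "reconnected", disconnected_pool.pop(circuit_id)

    # 2. The circuit was evicted: look for state persisted at eviction time.
    #    Assumption: the entry is removed so it cannot be restored twice.
    state = persisted_state.pop(circuit_id, None)
    if state is not None:
        new_circuit = {"id": circuit_id, "root_components": state}
        return "resumed", new_circuit

    # 3. Nothing to resume from: the client falls back to losing the session.
    return "failed", None


pool = {"c1": {"id": "c1", "root_components": ["App"]}}
persisted = {"c2": ["App"]}

in_memory = reconnect("c1", pool, persisted)
from_storage = reconnect("c2", pool, persisted)
unknown = reconnect("c3", pool, persisted)
second_attempt = reconnect("c2", pool, persisted)
```

Consuming the persisted entry on first use is one simple way to avoid the "state is restored multiple times" risk listed later in this document.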
Collaborative disconnection
In this scenario, the client and the server have an active connection. The developer might choose to hibernate a given circuit based on some criteria, like the circuit not being interacted with for a given amount of time, the window not being visible in the browser, etc.
We will provide APIs for the developer to trigger the hibernation process for a given circuit. The developer is free to choose what criteria to use to trigger the hibernation. Some options are:
In the abrupt disconnection scenario, the server is the one that triggers the hibernation process and is forced to save that state to some storage mechanism. In the collaborative scenario, given that there is an active connection, the server might choose to push the state to the client. When the reconnection happens, the client can send the state back to the server to resume the session.
Defining what state to persist
The data to persist can come from two locations:
Persisting state for components
Persisting state for components works by annotating properties in the component with the [SupplyFromPersistentComponentState] attribute. This attribute is a marker for a new CascadingValueParameter that is provided by the framework to the component. The framework uses the available PersistentComponentState (if there) to provide the value to the component, and registers a callback to persist the state when the circuit is going to be hibernated. The same cascading value provider takes care of unsubscribing the component if the component is removed from the component tree.
By default, the data needs to be JSON serializable. A hook to customize the serialization/deserialization process will be available to support alternative formats and customization.
We also require a key under which we store each persistent component state entry. In the case of components, we are going to use the parent component type + (@key if available) + component type + property name. We use these four properties as a way to "pseudo-uniquely" identify a component inside the component tree.
This is a simplification over the more "correct" behavior that would require us to traverse the component tree to create a truly unique key. However, we already use this approach in other areas of the framework, like preserving components during enhanced page navigation, and it has proven to be good enough. If we need, in the future, we are free to change this approach to a more robust one.
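As a rough illustration of how such a key behaves, the sketch below joins the four parts into a string. The `state_key` function, the `|` separator, and the component names are hypothetical; the framework's actual key format is not specified here.

```python
def state_key(parent_type, component_type, property_name, key=None):
    """Join parent type, optional @key, component type and property name.

    The '|' separator and the plain-string form are assumptions for this sketch.
    """
    parts = [parent_type]
    if key is not None:  # @key is optional
        parts.append(str(key))
    parts.extend([component_type, property_name])
    return "|".join(parts)


# Three instances of the same component under the same parent, rendered in a loop.
without_key = {state_key("TodoList", "TodoItem", "Draft") for _ in range(3)}
with_key = {
    state_key("TodoList", "TodoItem", "Draft", key=item_id) for item_id in (1, 2, 3)
}

print(len(without_key), "distinct key(s) without @key")
print(len(with_key), "distinct key(s) with @key")
```

Without @key the three instances collapse onto a single key, which is the conflict case; with @key each instance gets its own entry.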
With the current approach, a key conflict can only happen if there are multiple instances of the same component rendered under the same parent component. The most common case is rendering a component inside a loop (for/foreach). When this happens, there are a couple of ways to address the situation:
Use @key to provide a unique identifier for each component instance (something you should be doing anyway to help Blazor with rendering).
Persisting state for scoped services
Persisting state for services works by letting the service take an instance of PersistentComponentState as a parameter and using an extension method within the constructor to set up the callback that persists the state in case the circuit goes away. This same mechanism registers data to ensure that the service is re-instantiated and the state is restored when the circuit is re-created.
The state to be persisted is identified as the public properties on the service that are annotated with [SupplyFromPersistentComponentState].
How is state persisted
Persisting state builds on top of the existing PersistentComponentState API used for persisting component state to the interactive render modes during prerendering of the application. In this way, the work that the user does to annotate components and services for a better prerendering experience can be reused in this context as well as with enhanced navigation (in the future).
Persistence stores
The framework will provide several built-in state persistence locations to store the state of the circuits:
Browser store
The browser store is only available in collaborative disconnection scenarios. The store will use the Data Protection APIs to encrypt the state before sending it to the client, where the client will hold on to the state in memory until it tries to resume the session.
In memory store
This will store the state in memory on the server, with a configurable expiration time, and is a default fallback mechanism for abrupt disconnections after the circuit has been evicted. We think that it is advantageous to support this over keeping the circuit in memory for a longer time as it should require far less memory.
The current implementation will rely on MemoryCache, but it is possible that in the future we can instead rely on HybridCache to provide a more robust solution. The reason to use MemoryCache is that it is part of ASP.NET, which HybridCache is not.
The in-memory store has limits in terms of the number of entries as well as the length of each of those entries.
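To illustrate the behavior described here (expiration plus limits on entry count and entry length), below is a small sketch. It is not MemoryCache: the `ExpiringStateStore` class, its parameter names, and the choice to reject new entries when full (rather than evict older ones) are assumptions made for illustration.

```python
import time


class ExpiringStateStore:
    """Hypothetical bounded store: entries expire and are limited in count and size."""

    def __init__(self, ttl_seconds, max_entries, max_entry_bytes, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._max_entry_bytes = max_entry_bytes
        self._clock = clock  # injectable so expiration can be tested
        self._entries = {}  # circuit_id -> (expires_at, payload)

    def _evict_expired(self):
        now = self._clock()
        expired = [cid for cid, (expires_at, _) in self._entries.items() if expires_at <= now]
        for cid in expired:
            del self._entries[cid]

    def try_add(self, circuit_id: str, payload: bytes) -> bool:
        """Store the payload unless it is too large or the store is full."""
        self._evict_expired()
        if len(payload) > self._max_entry_bytes:
            return False
        if circuit_id not in self._entries and len(self._entries) >= self._max_entries:
            return False  # assumption: reject instead of evicting older entries
        self._entries[circuit_id] = (self._clock() + self._ttl, payload)
        return True

    def take(self, circuit_id: str):
        """Remove and return the payload, or None if missing or expired."""
        self._evict_expired()
        entry = self._entries.pop(circuit_id, None)
        return None if entry is None else entry[1]


now = [0.0]  # fake clock, advanced manually
store = ExpiringStateStore(ttl_seconds=60, max_entries=2, max_entry_bytes=16, clock=lambda: now[0])
stored = store.try_add("a", b"small")
too_big = store.try_add("b", b"x" * 17)
now[0] = 61.0  # past the expiration time of "a"
after_expiry = store.take("a")
```

Because only the serialized state is kept, an entry here should be much smaller than a live circuit, which is the motivation given above for this store.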
Azure Blob Store, Redis Store, Entity Framework Store
These are all similar to the equivalent Data Protection storage providers, and will store the state in the respective storage mechanism. The developer will need to provide the necessary configuration to use these stores.
Risks
Failing to persist the state to a third-party storage system after the circuit has been evicted.
Failing to restore the state when the circuit is re-created.
Developers storing too much state:
Inconsistent state persisted:
State is restored multiple times:
Drawbacks
This feature requires the developer to actively opt-in to the state it wants persisted and requires some level of configuration to get it enabled, as opposed to it happening without user intervention.
Considered alternatives
Automatically persisting the state for the entire component tree
This is deemed unfeasible because of the general inability to serialize arbitrary state on the circuit. The state can be anything, might not be serializable, or might be too expensive to serialize.
Open questions
Potential APIs and usage scenarios
The purpose of this section is not to bike-shed on the API design, but to provide a general idea of what the API might look like.
Configuring circuit persistence
No gesture is needed: client and in-memory storage are enabled by default.
Configuring an external storage mechanism
Proactively evicting a circuit from the client