[Blazor] Persisting circuit state for Blazor applications #60494

## Summary

We want to build on top of the existing functionality provided for preserving application state to enable opting in to hibernating server circuits under several circumstances and restoring the hibernated sessions afterwards:
* The connection to the client has been lost. This can happen for multiple reasons:
  * The user switched to another app on a mobile device and the OS terminated the connection.
  * The user opened another tab and the browser throttled the original tab.
  * The user is in a location with a spotty connection.
* The circuit has not been interacted with for a given amount of time (no event has been dispatched, no .NET interop call from the client has been received, and no render update has been sent).
* The client deems that the user is not interacting with the application and wants to proactively pause the circuit to save resources.
* The server proactively pauses the circuit for some other reason (like the server restarting).

Persisting the server state is always an opt-in, best effort, progressive enhancement. The persisted state is not guaranteed to be recoverable, and in that case, the app falls back to the previous experience of losing the state.

## Motivation

Circuits reside in memory on a single server instance for their entire lifetime. When the connection to the client is lost, we keep a limited number of circuits in memory for a given time to allow sessions to resume once the connection is re-established. However, if the number of disconnected circuits goes above a threshold, new disconnected circuits are immediately discarded and clients lose all their work. When the connection is lost for longer than the circuit retention period, the circuit is also discarded.

When a circuit is discarded, the session is removed from memory and can't be recovered, causing users to lose all their unsaved work. This is especially common when the user is on a mobile device like a phone or tablet. There, switching away from the browser normally terminates the connection with the server, which in most cases results in the loss of any unsaved work.

There are other factors that contribute to potential information loss on circuits, like a server restarting, in which case all the circuits in that process are discarded, resulting in the work being lost.

Another important scenario is when a session is left open but unused, for example when a user leaves their browser open before going home. In that scenario the circuit is kept alive, consuming resources that could be used for serving other users.

## Goals

* Provide the ability to hibernate a circuit and restore a session after the original circuit was discarded from memory.
* Provide the ability to proactively pause a circuit from the server.
* Enable library authors to create persistence-friendly components that can be leveraged by application developers.
* Enable library authors to create storage mechanisms for persisting the state of the circuits.
* Enable application developers to create more reliable apps without having to manually handle the details of persisting the application state.

## Non-goals

* Automatically hibernating and waking up circuits based on user activity.
* Guaranteeing that the state is recoverable in all cases.
* Persisting state after each user interaction (we aren't reimplementing webforms).
* Changing affinity requirements on Blazor Server applications.
* Application "upgrade" scenarios (N->N+1 deployments) and reboots.

## Scenarios

### Server reboot

As a developer I need to reboot/update my application/operating system/container periodically. When it's time for the application to update, I need to shut down the existing application and I want to migrate existing user sessions to a different server while I perform the update. Users might get a notification about their work being partially interrupted, but they can resume their session on a separate endpoint while the updates are being applied on the server.

The flow for this scenario is as follows:

* The developer registers a service in startup to hibernate circuits to persistent storage.
* The developer registers a service to be notified when the server is shutting down gracefully.
* When the server emits the shutdown notification, the service can access the list of existing circuits and trigger their hibernation.
* The circuit state can be saved on the server or optionally sent to the client as part of the hibernation process (the developer decides).
* For each hibernated circuit, the client receives a notification about the hibernation, so that the experience on the UI can be adapted (display the connection lost UI, or a different UI, enable JS components to be notified of the situation to avoid sending events to the server, etc).
* When the hibernation is initiated by the server, the client can decide when to start the "resume" process, for example via a button on the UI or after a period of time.
* To resume, the client sends a message to the server with the circuit id, the original component descriptors, and the persisted state if it was stored on the client.
* When a server receives a "resume" message, it fetches the application state if necessary, restores it and re-renders the set of components (triggers an "attach component message") as well as sends the render batch for the rendered components.
* When the client receives the first render batch after a "resume" operation, it needs to clear the component node content before re-applying the changes to the root component.
* After this is done, the application is free to resume.

### Connection lost for a longer period of time

As a developer I want to provide an improved experience on mobile browsers, where it's common that the connection is lost when a user switches from the browser app to a different app and comes back after a while. I want to be able to get a notification when the circuit is going to be evicted and to get the opportunity to save the circuit state into more permanent storage so that the session can be resumed afterwards when the user switches back to the browser.

### Proactively pausing circuits

As a developer I want to have a mechanism that enables me to pause circuits that I deem inactive to preserve server resources and enable customers to resume their session afterwards.

## Detailed design

### Abrupt disconnection

In this scenario, the connection between the server and the client is lost abruptly. After the initial disconnection period, when the circuit is going to be evicted from memory, a new callback is triggered to persist the circuit state. At that point, the server collects the list of root components and their parameters, as well as any state within the circuit that the app developer wants to persist, and pushes it to some storage mechanism. The details about this storage mechanism are described later in the document.

If the client is still running and tries to reconnect to the server, the server first checks if the circuit is in the disconnected pool, and if not, it performs an additional check to see if there was state persisted for that circuit. If there was, the server creates a new circuit, instantiates all the root components with the given state, attaches the components to the DOM and sends a render batch to the client to re-render the components.

```mermaid
sequenceDiagram
  participant Client
  participant Server

  Client->>Server: Reconnect after connection loss
  Server->>Server: Check if circuit is in disconnected pool
  alt Circuit in disconnected pool
    Server->>Client: Resume session
  else Circuit not in disconnected pool
    Server->>Server: Check if state is persisted
    alt State is persisted
      Server->>Server: Create new circuit
      Server->>Server: Instantiate root components with state
      Server->>Client: Send render batch to re-render components
    else State is not persisted
      Server->>Client: Unable to resume session
    end
  end
```
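
The lookup order in the diagram can be sketched in a few lines of C#. This is a minimal illustration rather than the framework's implementation: `ResumeCoordinator`, `ResumeOutcome`, and the two dictionaries that stand in for the disconnected pool and the persisted-state store are hypothetical names introduced here.

```csharp
using System.Collections.Generic;

// Hypothetical outcome of a reconnection attempt; not a framework type.
public enum ResumeOutcome { ResumedFromPool, RestoredFromPersistedState, Unrecoverable }

// Hypothetical coordinator. The dictionaries stand in for the
// disconnected-circuit pool and the persisted-state store.
public sealed class ResumeCoordinator
{
    private readonly Dictionary<string, object> _disconnectedPool = new();
    private readonly Dictionary<string, byte[]> _persistedState = new();

    public void AddDisconnected(string circuitId, object circuit) =>
        _disconnectedPool[circuitId] = circuit;

    // Called when a circuit is evicted from the pool: only its serialized state is kept.
    public void Evict(string circuitId, byte[] serializedState)
    {
        _disconnectedPool.Remove(circuitId);
        _persistedState[circuitId] = serializedState;
    }

    public ResumeOutcome TryResume(string circuitId)
    {
        // 1. The circuit is still in memory: reattach the existing instance.
        if (_disconnectedPool.Remove(circuitId))
        {
            return ResumeOutcome.ResumedFromPool;
        }

        // 2. The circuit was evicted: a new circuit would be built from the persisted state.
        //    Removing the entry means the state is consumed at most once.
        if (_persistedState.Remove(circuitId))
        {
            return ResumeOutcome.RestoredFromPersistedState;
        }

        // 3. Nothing left to restore: the client falls back to losing the session.
        return ResumeOutcome.Unrecoverable;
    }
}
```

Consuming the persisted entry on the first successful lookup is one way to make a second resume attempt for the same circuit fail, which matches the "restored multiple times" concern listed under Risks.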

### Collaborative disconnection

In this scenario, the client and the server have an active connection. The developer might choose to pause a given circuit based on some criteria, like the circuit not being interacted with for a given amount of time, the window not being visible in the browser, etc.

We will provide APIs for the developer to trigger the pausing process for a given circuit. The developer is free to choose what criteria to use to trigger the pause. Some options are:
* Send a JS interop call to the server when something happens in the browser (like the window not being visible), and respond on the server by pausing the circuit.
* Use a CircuitHandler to monitor the circuit and trigger the pause process if no interaction is detected (no events, no JS interop).
* Monitor the application lifetime and trigger the pause process when the application is about to be shut down.

In the abrupt disconnection scenario, the server is the one that triggers the hibernation process and is forced to save that state to some storage mechanism. In the collaborative scenario, given that there is an active connection, the server might choose to push the state to the client. When the reconnection happens, the client can send the state back to the server to resume the session.

```mermaid
sequenceDiagram
  participant Client
  participant Server

  Client->>Server: Trigger pause
  Server->>Server: Persist state
  Server->>Client: Push state to client
  Server->>Server: Cleanup circuit
  Client->>Server: Reconnect (+ state)
  Server->>Server: Create new circuit
  Server->>Server: Instantiate root components with state
  Server->>Client: Send render batch to re-render components
```
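
As an illustration of the "no interaction for a given amount of time" criterion, the sketch below tracks the last activity on a circuit and reports when an idle period has elapsed. `IdleTracker` is a hypothetical helper, not a framework type. The current time is passed in explicitly so the logic stays deterministic, and the caller (for example a circuit handler) is assumed to record activity and trigger the pause.

```csharp
using System;

// Hypothetical helper, not a framework type: tracks the last activity on a
// circuit and reports when the configured idle period has elapsed.
public sealed class IdleTracker
{
    private readonly TimeSpan _idleTimeout;
    private DateTimeOffset _lastActivity;

    public IdleTracker(TimeSpan idleTimeout, DateTimeOffset now)
    {
        _idleTimeout = idleTimeout;
        _lastActivity = now;
    }

    // Call on every event dispatch, interop call, or render update.
    public void RecordActivity(DateTimeOffset now) => _lastActivity = now;

    // True once no activity has been recorded for at least the idle timeout.
    public bool ShouldPause(DateTimeOffset now) => now - _lastActivity >= _idleTimeout;
}
```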

### Defining what state to persist

The data to persist can come from two locations:
* Component state:
  * This is state that the component is using to render, for example, it might be a list of items retrieved from the database, or a form that the user is filling out.
* Scoped services:
  * This is state that is held inside a service. It might be something like the current user, or any other similar piece of state.

#### Persisting state for components

Persisting state for components works by annotating properties in the component with the `[SupplyFromPersistentComponentState]` attribute. This attribute is a marker for a new `CascadingValueParameter` that is provided by the framework to the component. The framework uses the available `PersistentComponentState` (if any) to provide the value to the component, and registers a callback to persist the state when the circuit is going to be hibernated. The same cascading value provider takes care of unsubscribing the component if the component is removed from the component tree.

```csharp
@if(Items == null)
{
  <div>Loading...</div>
}
else
{
<ul>
    @foreach (var item in Items)
    {
        <li>@item.Name</li>
    }
</ul>
}

@code {
    [SupplyFromPersistentComponentState]
    public List<Item> Items { get; set; }

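    // The method must be declared async in order to await the load.
    protected override async Task OnInitializedAsync()
    {
        // Items is already populated when state was restored; load only otherwise.
        Items ??= await LoadItemsAsync();
    }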
}
```

By default, the data needs to be JSON serializable. A hook to customize the serialization/deserialization process will be available to support alternative formats and customization.
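
To make the JSON requirement concrete, here is a minimal round trip with `System.Text.Json`. The `Item` record and the `StateSerialization` helper are hypothetical stand-ins for an app's own types; the framework's actual serialization pipeline may differ.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model type; any JSON-serializable type works the same way.
public sealed record Item(int Id, string Name);

// Hypothetical helper showing the round trip a persisted property goes through.
public static class StateSerialization
{
    // Serialize the property value into the payload that would be stored.
    public static byte[] Serialize(List<Item> items) =>
        JsonSerializer.SerializeToUtf8Bytes(items);

    // Rebuild the property value from the stored payload.
    public static List<Item> Restore(byte[] payload) =>
        JsonSerializer.Deserialize<List<Item>>(payload) ?? new List<Item>();
}
```

State that cannot make this round trip (open connections, delegates, objects with cycles) needs either a different shape or the custom serialization hook mentioned above.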

We also require a key under which we store each persistent component state entry. In the case of components, we are going to use the parent component type + (@key if available) + component type + Property name. We use these four properties as a way to "pseudo-uniquely" identify a component inside the component tree.

This is a simplification over the more "correct" behavior that would require us to traverse the component tree to create a truly unique key. However, we already use this approach in other areas of the framework, like preserving components during enhanced page navigation, and it has proven to be good enough. If we need, in the future, we are free to change this approach to a more robust one.
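
The shape of such a key can be sketched as follows. `PersistentStateKey` is a hypothetical helper written for this illustration; the framework may combine, order, or hash the parts differently.

```csharp
using System;

// Hypothetical helper illustrating the key shape described above.
public static class PersistentStateKey
{
    public static string Compute(Type parentComponentType, object key, Type componentType, string propertyName)
    {
        // The @key segment is left empty when no key was supplied.
        var keySegment = key is null ? string.Empty : key.ToString();
        return string.Join("|", parentComponentType.FullName, keySegment, componentType.FullName, propertyName);
    }
}
```

With this shape, two instances of the same component under the same parent produce identical keys unless each one has a distinct `@key`.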

With the current approach, a conflict with the keys can only happen if there are multiple instances of the same component rendered under the same parent component. The most common case for this is when rendering a component inside a loop (for/foreach). When this happens, there are a couple of ways to address the situation:

* Move the state to be persisted into the parent component, and provide that state to the children.

```csharp
@foreach (var item in Items)
{
  <ChildComponent Item="item" />
}
@code {
  [SupplyFromPersistentComponentState]
  public List<Item> Items { get; set; }
}
```

* Use a `@key` to provide a unique identifier for each component instance (something you should be doing anyway to help Blazor with rendering).
  * The moment you provide a key, we can use it as input to uniquely identify the component.
  * Even in the cases where you are using some data from your model to generate the key (like an ID property), you can still append some unique identifier to the key to ensure uniqueness at that call site (you might even want to receive that unique identifier as input to your component).

```csharp
@foreach (var item in Items)
{
    <MyComponent @key="@($"unique-prefix-{item.Id}")" Item="item" />
}
```

* Persist data imperatively.
  * This is always an option with the current `PersistentComponentState` API, and it is the right choice for advanced use cases where more control is needed.
  * For example, when implementing controls as part of a library where you want to let the consumer control whether the state is persisted, which key to use, and how that state is persisted.

#### Persisting state for scoped services

Persisting state for services works by letting the service take an instance of `PersistentComponentState` as a constructor parameter and using an extension method within the constructor to set up the callback to persist the state in case the circuit goes away. This same mechanism registers data to ensure that the service is re-instantiated, and the state is restored when the circuit is re-created.

The state to be persisted is identified as the public properties on the service
that are annotated with `[SupplyFromPersistentComponentState]`.

```csharp
public class MyService
{
    public MyService(PersistentComponentState persistentState)
    {
        // Registers this service so its annotated properties are persisted and restored.
        persistentState.PersistState(this);
    }
}
```

### How is state persisted

Persisting state builds on top of the existing `PersistentComponentState` API used for persisting component state to the interactive render modes during
prerendering of the application. In this way, the work that the user does to
annotate components and services for a better prerendering experience can be reused
in this context as well as with enhanced navigation (in the future).

### Persistence stores

The framework will provide several built-in state persistence locations to store
the state of the circuits:

* BrowserStore: Will persist the state to the client when a connection is available.
* InMemoryStore: Will persist the state in memory on the server. This acts as a second level of cache after the circuit has been evicted.
* Distributed storage through HybridCache: Will persist the state to distributed storage mechanisms like Azure Blob Storage, Redis, or databases through the HybridCache abstraction.

#### Browser store

The browser store is only available in collaborative disconnection scenarios.
The store will use the Data Protection APIs to encrypt the state before sending it
to the client, where the client will hold on to the state in memory until
it tries to resume the session. If the client is unable to store the state or the storage fails, the system will automatically fall back to server-side storage mechanisms.

#### In memory store

This store keeps the state in memory on the server, with configurable
retention periods for both the in-memory and distributed copies, and is the default fallback mechanism for abrupt disconnections after the
circuit has been evicted. We think that supporting this is advantageous over
keeping the circuit in memory for a longer time, as it should require far less
memory.

The current implementation will rely on HybridCache when available, which provides a more robust solution with both local and distributed caching capabilities. When HybridCache is not available, the system falls back to MemoryCache.

The in-memory store has configurable limits in terms of number of entries as well as the
retention periods for each of those entries.
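
The two limits can be illustrated with a small, single-threaded sketch. `BoundedStateStore` is a hypothetical class, not the framework's store: it ignores concurrency and picks one possible eviction policy (drop the entry closest to expiring), which the design above does not prescribe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical store: keeps serialized circuit state in memory with a
// maximum entry count and a per-entry retention period. Not thread-safe.
public sealed class BoundedStateStore
{
    private readonly int _maxEntries;
    private readonly TimeSpan _retention;
    private readonly Dictionary<string, (byte[] State, DateTimeOffset ExpiresAt)> _entries = new();

    public BoundedStateStore(int maxEntries, TimeSpan retention)
    {
        if (maxEntries < 1) throw new ArgumentOutOfRangeException(nameof(maxEntries));
        _maxEntries = maxEntries;
        _retention = retention;
    }

    public void Save(string circuitId, byte[] state, DateTimeOffset now)
    {
        RemoveExpired(now);
        // At capacity: drop the entry closest to expiring to make room.
        if (!_entries.ContainsKey(circuitId) && _entries.Count >= _maxEntries)
        {
            var oldest = _entries.OrderBy(e => e.Value.ExpiresAt).First().Key;
            _entries.Remove(oldest);
        }
        _entries[circuitId] = (state, now + _retention);
    }

    // Returns the state at most once; expired entries are treated as missing.
    public bool TryTake(string circuitId, DateTimeOffset now, out byte[] state)
    {
        RemoveExpired(now);
        if (_entries.Remove(circuitId, out var entry))
        {
            state = entry.State;
            return true;
        }
        state = Array.Empty<byte>();
        return false;
    }

    private void RemoveExpired(DateTimeOffset now)
    {
        var expired = _entries.Where(e => e.Value.ExpiresAt <= now).Select(e => e.Key).ToList();
        foreach (var id in expired)
        {
            _entries.Remove(id);
        }
    }
}
```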

#### Distributed storage through HybridCache

All distributed storage mechanisms (Azure Blob Storage, Redis, Entity Framework, etc.) are supported through the HybridCache abstraction. Developers configure their preferred distributed storage provider through HybridCache, and the circuit persistence system automatically uses it when available. This provides a unified approach to distributed storage rather than requiring separate packages for each storage provider.

## Risks

* Failing to persist the state to a third-party storage system after the circuit has been evicted.
  * This might happen if a third-party store becomes unavailable after we've persisted the state and before we've evicted the circuit.
  * We can allow multiple storage mechanisms that are used in priority order, so that if one fails, we can try the next one.
  * Ultimately, it's acceptable if the state gets lost at that point, as the experience then becomes equivalent to the disconnected circuit scenario in the past.

* Failing to restore the state when the circuit is re-created.
  * This can happen if, for example, the storage mechanism is not available at the time of restore.
  * This can also happen if the application doesn't re-render the same component tree given the parameters and the state it stored.

* Developers storing too much state:
  * It's up to the developer to choose and control how much they want to store. We can provide guidance and metrics to help developers make the right choice.

* Inconsistent state persisted:
  * The data can't be partially persisted. We will always data protect the state (except maybe for pure in memory scenarios). That guarantees the integrity of the data, as any change in the data will make it unreadable.

* State is restored multiple times:
  * The browser drives the process to resume the circuit. At the time it requests the circuit to be restored, a persistent connection to the server has been established via SignalR. The server is only going to try to resume the circuit once and will produce an error on subsequent attempts if the resumption has already started.
  * Trying to resume an already active circuit has the same implications.
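
The priority-ordered fallback mentioned under the first risk could look roughly like this. It is a sketch under stated assumptions: `PrioritizedPersistence` is a hypothetical helper, and each store is modeled as a plain delegate rather than a framework interface.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: each store is modeled as a delegate that saves the
// serialized state for a circuit id and throws when the store is unavailable.
public static class PrioritizedPersistence
{
    // Tries each store in priority order and stops at the first success.
    // Returns false when every store fails; the state is then lost and the app
    // falls back to the existing disconnected-circuit experience.
    public static async Task<bool> TrySaveAsync(
        IReadOnlyList<Func<string, byte[], Task>> stores, string circuitId, byte[] state)
    {
        foreach (var save in stores)
        {
            try
            {
                await save(circuitId, state);
                return true;
            }
            catch (Exception)
            {
                // Best effort: a real implementation would log the failure before moving on.
            }
        }
        return false;
    }
}
```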

## Drawbacks

This feature requires the developer to actively opt-in to the state it wants persisted and requires some level of configuration to get it enabled, as opposed to it happening without user intervention.

## Considered alternatives

### Automatically persisting the state for the entire component tree

This is deemed unfeasible because of the general inability to serialize arbitrary
state on the circuit. The state can be anything, might not be serializable, or
might be too expensive to serialize.

## Potential APIs and usage scenarios

The purpose of this section is not to bike-shed on the API design, but to provide a general idea of what the API might look like.

### Configuring circuit persistence

```csharp
services.AddRazorComponents()
  .AddInteractiveServerComponents();
```

No gesture is needed: client and in-memory storage are enabled by default. HybridCache is automatically used when available in the service container.

### Configuring retention periods

```csharp
services.Configure<CircuitOptions>(options =>
{
    options.PersistedCircuitInMemoryRetentionPeriod = TimeSpan.FromHours(2);
    options.PersistedCircuitDistributedRetentionPeriod = TimeSpan.FromHours(8);
    options.PersistedCircuitInMemoryMaxRetained = 1000;
});
```

### Configuring an external storage mechanism through HybridCache

```csharp
services.AddHybridCache()
  .AddRedis("connectionstring");

services.AddRazorComponents()
  .AddInteractiveServerComponents();
```

### Proactively pausing a circuit from the client

```csharp
services.TryAddEnumerable(ServiceDescriptor.Scoped<CircuitHandler, MyCircuitHandler>());

public class MyCircuitHandler(IJSRuntime runtime) : CircuitHandler
{
  private Circuit? _circuit;

  public override async Task OnCircuitOpenedAsync(Circuit circuit, CancellationToken cancellationToken)
  {
    _circuit = circuit;
    // Hand the browser a reference it can use to call back into this handler.
    await runtime.InvokeVoidAsync("registerCircuit", DotNetObjectReference.Create(this));
  }

  [JSInvokable]
  public async Task Pause()
  {
    // PauseAsync is the API proposed by this design.
    if (_circuit is not null)
    {
      await _circuit.PauseAsync();
    }
  }
}
```

```javascript
function registerCircuit(handler) {
  window.circuitHandler = handler;
}

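document.addEventListener('visibilitychange', () => {
  // invokeMethodAsync calls a [JSInvokable] method on the .NET object reference.
  if (document.visibilityState === 'hidden' && window.circuitHandler) {
    window.circuitHandler.invokeMethodAsync('Pause');
  }
});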
```

