swingstore support for hangover recovery #8433

Open
warner opened this issue Oct 4, 2023 · 1 comment
Labels
swing-store SwingSet package: SwingSet

Comments

@warner
Member

warner commented Oct 4, 2023

Originally posted by @mhofman in #8089 (comment)

Consider having the swing-store expose a new feature that makes it easier for the host to efficiently implement the hangover inconsistency logic. This would be an ordered list of entries, each representing a command that was sent to cosmos, and the response that was received. The host can insert new entries (one at a time), get all entries, and clear all entries. We may take the opportunity to support multiple sets of entries, supporting replay for multiple blocks (to solve issues like #6736). Currently this is implemented as an in-memory array that is then JSON-serialized into a host key of the kv store before commit.

So the idea here is to replace the ad-hoc code in cosmic-swingset with a first-class hangover-management feature. Swingstore would act a lot like a "CDP" from #6447: it would have an "output queue" of actions it wants the host to accept, and we need to make sure that the host accepts the same sequence of actions even if the pair gets interrupted at an inconvenient time. In this case, we don't need to embargo the outputs (because all our actions are getting stored in the host DB, which doesn't commit until after swingstore commits). But we do need to record them, and provide a way to replay them at the next startup if we detect an overhang, and a way to retire them when they're no longer necessary.

The basic idea would be:

  • swing-store has a special table for outgoing actions, a list of strings, or maybe type/key/value triples, or type/verb/key/value quads
  • the table is populated by export-data writes, and/or host-provided hostStorage.addOutgoingAction() actions
  • the table is committed along with everything else in the swingstore during hostStorage.commit()
  • a new hostStorage.getOutgoingActions() API iterates through the contents, letting the host process e.g. export-data changes
  • then the host performs its own commit
  • at the start of the next block, the host calls hostStorage.retireOutgoingActions(), which clears the table
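The lifecycle in the list above could be sketched roughly as follows. This is a minimal in-memory stand-in, not swing-store code: the method names mirror the ones proposed in this issue, while `makeOutgoingActionsTable`, the `pending`/`committed` split, and the string format of the actions are assumptions made for illustration.

```javascript
// Hypothetical in-memory stand-in for the proposed outgoing-actions table.
// Nothing here exists in swing-store today; names follow the proposal above.
function makeOutgoingActionsTable() {
  let pending = []; // actions added during the current block, not yet committed
  let committed = []; // actions that were part of the last commit()

  return {
    // Record one action, preserving insertion order.
    addOutgoingAction(action) {
      pending.push(action);
    },
    // Stand-in for hostStorage.commit(): the table commits with everything else.
    commit() {
      committed = committed.concat(pending);
      pending = [];
    },
    // Iterate over committed actions so the host can process them.
    *getOutgoingActions() {
      yield* committed;
    },
    // Called at the start of the next block, after the host's own commit.
    retireOutgoingActions() {
      committed = [];
    },
  };
}

// One block's worth of activity (the action strings are made up).
const table = makeOutgoingActionsTable();
table.addOutgoingAction('export-data:set:key1:value1');
table.addOutgoingAction('device:VBANK_GRAB');
table.commit();
const seen = [...table.getOutgoingActions()]; // host processes these, then commits
table.retireOutgoingActions(); // start of the next block
const afterRetire = [...table.getOutgoingActions()];
```

In a real implementation the committed list would live in the swingstore database, so it would survive a crash between the swingstore commit and the host commit.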

The host still needs a way to determine if the swingstore is ahead of it (some sort of block counter). At startup, if it discovers this overhang, it doesn't tell the kernel to do anything, but instead just reads out getOutgoingActions() and processes them as if it had really executed the block.
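A rough sketch of that startup check, assuming the "block counter" is a pair of heights the host can compare. `recoverFromHangover`, `swingStoreHeight`, `hostHeight`, and `processAction` are hypothetical names; the issue does not say how the counter would be stored or exposed.

```javascript
// Hypothetical startup check. The two heights stand in for "some sort of
// block counter"; how they are obtained is left open by the proposal.
function recoverFromHangover({
  swingStoreHeight,
  hostHeight,
  getOutgoingActions,
  processAction,
}) {
  if (swingStoreHeight === hostHeight) {
    return { replayed: 0 }; // in sync: the kernel can run normally
  }
  if (swingStoreHeight !== hostHeight + 1) {
    throw Error(
      `unexpected heights: swingstore ${swingStoreHeight}, host ${hostHeight}`,
    );
  }
  // The swingstore committed a block that the host did not. Do not run the
  // kernel; feed the recorded actions to the host as if the block had executed.
  let replayed = 0;
  for (const action of getOutgoingActions()) {
    processAction(action);
    replayed += 1;
  }
  return { replayed };
}

// Usage with made-up data: swingstore is one block ahead of the host.
const processed = [];
const outcome = recoverFromHangover({
  swingStoreHeight: 11,
  hostHeight: 10,
  getOutgoingActions: () => ['action-a', 'action-b'],
  processAction: action => processed.push(action),
});
```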

Currently, our "outgoing actions" fall into two categories:

  • device writes, like VBANK_GRAB actions (which have a return value, the new balance), and chain-storage writes (which do not)
  • export-data updates

The code in cosmic-swingset currently handles both. At the end of the block, during the COMMIT action, saveOutsideState serializes everything into a single JSON-stringified blob and stores it with hostStorage.kvStore.set('chainSends', blob), overwriting any previous data. When launching the kernel, it reads and parses this key to provide the initial list of actions ("chain sends"), which replayChainSends can use in the case of an overhang.
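For readers unfamiliar with that code, the single-blob pattern looks roughly like this. This is a simplified illustration, not the actual cosmic-swingset implementation: the Map-backed `blobKvStore`, the `loadChainSends` helper, and the shape of each entry are assumptions; only the `'chainSends'` key and the `saveOutsideState` name come from the description above.

```javascript
// Map-backed stand-in for hostStorage.kvStore (only get/set are modeled).
const blobKvStore = (() => {
  const m = new Map();
  return { get: k => m.get(k), set: (k, v) => void m.set(k, v) };
})();

// In-memory list of sends for the current block. The entry shape
// (a [command, response] pair) is illustrative only.
let chainSends = [];

// Simplified stand-in for the work done during the COMMIT action:
// the whole list is stringified into one value, replacing the old one.
function saveOutsideState() {
  blobKvStore.set('chainSends', JSON.stringify(chainSends));
}

// Hypothetical helper: read the saved list back when launching the kernel.
function loadChainSends() {
  const blob = blobKvStore.get('chainSends');
  return blob === undefined ? [] : JSON.parse(blob);
}

chainSends.push([{ type: 'VBANK_GRAB' }, '70']);
chainSends.push([{ type: 'STORAGE_SET' }, null]);
saveOutsideState();

chainSends = []; // simulate a restart losing the in-memory array
const restored = loadChainSends();
```

The cost of this pattern is that every commit rewrites the whole blob, which is part of what a dedicated table or per-entry keys would avoid.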

Some day, when we have a strict device-output model, where everything a kernel device might send to the host can be expressed as a serializable piece of data (instead of an arbitrary endowment function call), we could have these device outputs get automatically serialized and added directly to the swingstore. Until then, while the host should expect real execution to invoke functions, we need the host to be responsible for writing a description of these invocations into the outgoing actions, so it can read the stored actions back later and process them in the same way as it did during real execution. We should give it an API like hostStorage.addOutgoingAction() to contribute actions.

We must then decide whether the host has similar responsibility for writing exportCallback data into addOutgoingAction() (and acting upon save actions during replay), or if swingstore should automatically add them. If we choose to automatically add them, then we should consider having getOutgoingActions() call the exportCallback directly when it encounters an export-data record, instead of asking the host to interpret the automatically-generated records showing up from the iterator.

We might reach further, and make some incremental progress towards the device-output model, by providing a hostStorage.writeDeviceOutput() method, and establishing a pattern of device endowments feeding it with one record per output invocation (including both the arguments and the return value). The endowment code should also have a standard way to act upon an action from getOutgoingActions() by doing the same thing as the original execution (but asserting that the returned value matches the recorded one).
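One way that endowment pattern might look, as a sketch only. `makeDeviceOutputLog`, `recordingEndowment`, `replayDeviceOutput`, and the record shape `{ name, args, result }` are all hypothetical, and results are compared via their JSON form, which assumes arguments and return values are JSON-serializable.

```javascript
// Hypothetical recorder: one record per output invocation (args plus result).
function makeDeviceOutputLog() {
  const records = [];
  return {
    writeDeviceOutput(name, args, result) {
      // Store a JSON copy so later mutation of args cannot alter the record.
      records.push(JSON.stringify({ name, args, result }));
    },
    getOutgoingActions() {
      return records.map(r => JSON.parse(r));
    },
  };
}

// Wrap an endowment function so that every call is recorded.
function recordingEndowment(log, name, fn) {
  return (...args) => {
    const result = fn(...args);
    log.writeDeviceOutput(name, args, result);
    return result;
  };
}

// Replay one record: repeat the call and assert the same return value.
function replayDeviceOutput(record, fn) {
  const result = fn(...record.args);
  if (JSON.stringify(result) !== JSON.stringify(record.result)) {
    throw Error(`replay mismatch for ${record.name}`);
  }
  return result;
}

// Made-up "bank" whose grab(amount) returns the new balance.
const makeBank = () => {
  let balance = 100;
  return amount => (balance -= amount);
};

const deviceLog = makeDeviceOutputLog();
const grab = recordingEndowment(deviceLog, 'VBANK_GRAB', makeBank());
grab(30); // real execution: new balance is recorded alongside the arguments

// After a hangover the host is still at its pre-block state, so replaying
// against a fresh bank must produce the recorded balance.
const [grabRecord] = deviceLog.getOutgoingActions();
const replayedBalance = replayDeviceOutput(grabRecord, makeBank());
```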

@mhofman
Member

mhofman commented Oct 4, 2023

We must then decide whether the host has similar responsibility for writing exportCallback data into addOutgoingAction() (and acting upon save actions during replay), or if swingstore should automatically add them. If we choose to automatically add them, then we should consider having getOutgoingActions() call the exportCallback directly when it encounters an export-data record, instead of asking the host to interpret the automatically-generated records showing up from the iterator.

My opinion is to keep it simple and let the host do the work. This table should be 100% managed by the host, with no special semantics for the swing-store. To be honest, the more I think about it, the more I think we should just use the kv store API for this, but as a collection instead of a single serialized entry. The performance gains of a dedicated table are likely minimal.
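A sketch of what "a collection instead of a single serialized entry" could mean in practice, assuming a kv store with get/set/delete. The `host.outgoing.` prefix, the counter key, and `makeActionCollection` are invented for illustration, and the Map-backed store is only a stand-in for hostStorage.kvStore.

```javascript
// Hypothetical key layout: 'host.outgoing.count' plus 'host.outgoing.<index>'.
const PREFIX = 'host.outgoing.';

function makeActionCollection(kvStore) {
  const count = () => Number(kvStore.get(`${PREFIX}count`) || '0');
  return {
    // Append one entry without rewriting the earlier ones.
    add(action) {
      const n = count();
      kvStore.set(`${PREFIX}${n}`, JSON.stringify(action));
      kvStore.set(`${PREFIX}count`, `${n + 1}`);
    },
    // Read every entry back in insertion order.
    getAll() {
      const out = [];
      const n = count();
      for (let i = 0; i < n; i += 1) {
        out.push(JSON.parse(kvStore.get(`${PREFIX}${i}`)));
      }
      return out;
    },
    // Remove every entry and the counter.
    clear() {
      const n = count();
      for (let i = 0; i < n; i += 1) {
        kvStore.delete(`${PREFIX}${i}`);
      }
      kvStore.delete(`${PREFIX}count`);
    },
  };
}

// Map-backed stand-in for the kv store (get/set/delete only).
const backing = new Map();
const collectionKvStore = {
  get: k => backing.get(k),
  set: (k, v) => void backing.set(k, v),
  delete: k => void backing.delete(k),
};

const actions = makeActionCollection(collectionKvStore);
actions.add({ type: 'VBANK_GRAB', amount: 30 });
actions.add({ type: 'STORAGE_SET', key: 'k', value: 'v' });
const allActions = actions.getAll();
actions.clear();
const remainingKeys = backing.size;
```

Compared with the single-blob approach, each insert here writes one small value plus the counter, and the host keeps full control over the contents.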
