Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async-flow: need incarnation number in the log classifying entries #9384

Closed
erights opened this issue May 20, 2024 · 6 comments · Fixed by #10293
Closed

async-flow: need incarnation number in the log classifying entries #9384

erights opened this issue May 20, 2024 · 6 comments · Fixed by #10293
Assignees
Labels
asyncFlow related to membrane-based replay and upgrade of async functions bug Something isn't working

Comments

@erights
Copy link
Member

erights commented May 20, 2024

Another thing that must happen before we commit to durable log state.

@erights erights added bug Something isn't working asyncFlow related to membrane-based replay and upgrade of async functions labels May 20, 2024
@ivanlei ivanlei assigned michaelfig and unassigned mhofman Sep 17, 2024
@mhofman
Copy link
Member

mhofman commented Sep 17, 2024

We need to decide on the design for the "incarnation number", and probably need to pick a different name. The contract code does not know the "incarnation number" of the vat. This problem is similar to the state migration of durable kinds (#7407).

While asyncFlow could maintain an internal "incarnation number" for the flow, the reality is that most vat upgrades or restarts will not require any changes to the flow's guest logic. As such, a version number maintained by the author may be preferable. However that runs the risk that the author does not realize they made a replay incompatible change, deploys the update, which now has replays of old logic failing (triggered late because replays are not eager), and new invocations using the new logic. In that case we'd still want the author to have a way out of this, so maybe we need both a "flow incarnation number" automatically maintained by async-flow, and a "flow version number" explicitly provided by the author (and maybe simply used as a mapping to flow incarnation numbers)

@michaelfig
Copy link
Member

After further discussion with @mhofman and @kriskowal, I understand that the minimal change needed is to have an invocation of prepareAsyncFlowTools increment a flowGeneration counter in its outerZone and make it part of each log entry created by the flows that are defined by that instance of the tools.

@erights
Copy link
Member Author

erights commented Oct 14, 2024

Or you could add a new instruction that declares a new generation number starting with the ops that come immediately after it.

@michaelfig
Copy link
Member

Or you could add a new instruction that declares a new generation number starting with the ops that come immediately after it.

I did suggest this, but @mhofman was against it (though I don't remember what the objection was). Mathieu, can you recap your objections in this thread?

@mhofman
Copy link
Member

mhofman commented Oct 14, 2024

From what I understand, @erights is suggesting that we record the change in flowGeneration by using a synthetic entry in the log instead of recording the current generation as an extra element for every log entry. I don't remember discussing this explicitly, and no objections comes to mind right now.

@michaelfig
Copy link
Member

@mhofman summarised what I also understood of @erights' suggestion:

record the change in flowGeneration by using a synthetic entry in the log instead of recording the current generation as an extra element for every log entry.

And then added:

I don't remember discussing this explicitly, and no [objection] comes to mind right now.

Ah. Maybe I didn't voice it clearly enough, or it didn't actually get out of my head into our conversation. :)

@eright's suggestion is certainly a much less intrusive change than adding metadata to every single log entry, so I'll pivot my approach to use it instead.

@mergify mergify bot closed this as completed in #10293 Oct 20, 2024
mergify bot added a commit that referenced this issue Oct 20, 2024
closes: #9384

## Description


This pull request introduces a `startGeneration` log entry to capture the generation (restart) number for subsequent log entries. The most important changes are grouped below by theme.

### Log Entry Enhancements:

* Introduced `startGeneration` and updated `LogEntryShape` to include a new log entry type for starting a generation (`packages/async-flow/src/type-guards.js`).

### Log Store Improvements:

* Retrieve the current generation number from a `zone.mapStore` and use it in LogStore ephemera to track its generation (`packages/async-flow/src/log-store.js`).
* Modified `prepareLogStore` to define a predicate for filtering log entries, and added methods for retrieving unfiltered log entries (`packages/async-flow/src/log-store.js`).
* Updated test cases to validate the new generation number and unfiltered log entries (`packages/async-flow/test/log-store.test.js`).

### Replay Membrane Adjustments:

* Added support for `startGeneration` in the replay membrane to handle generation-specific operations (`packages/async-flow/src/replay-membrane.js`).

### Type Definitions:

* Updated `LogEntry` and `FutureLogEntry` types to include the new generation entry type (`packages/async-flow/src/types.ts`).

### Security Considerations

n/a

### Scaling Considerations

Low overhead implementation.

### Documentation Considerations

Not necessary yet, just laying groundwork for the infrastructure so that a future change can use it to implement asyncFlow versioning.

### Testing Considerations

Some new `*Unfiltered` methods of the `LogStore` now have some tests.

### Upgrade Considerations

Designed to enable future upgrade of async flows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asyncFlow related to membrane-based replay and upgrade of async functions bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants