Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support state-sync #7225

Merged
merged 20 commits into from
Apr 19, 2023
Merged

feat: Support state-sync #7225

merged 20 commits into from
Apr 19, 2023

Conversation

mhofman
Copy link
Member

@mhofman mhofman commented Mar 23, 2023

Review commit-by-commit

closes: #5542
closes: #3769
closes: #6563
closes: #5934

Description

This PR builds on #7115, #7026 and agoric-labs/cosmos-sdk#297 to implement state-sync in the cosmos and cosmic-swingset layers.

It takes over the cosmos state-sync process which is normally started in a go-routine without synchronization. Because the swingstore DB is not capable of doing a read at arbitrary historical savepoints, we need to introduce a synchronization with block commits.

When we reach a snapshot height, we first initiate a swingstore export before proceeding with the normal cosmos export process. We also add a synchronization step to stall the subsequent block commit until the swingstore export has been reported as having started. Once the swingstore export is in progress we can rely on its read transaction to proceed processing blocks, even if the export process spans multiple blocks.

On the cosmic-swingset side, we add a tool that leverages the new swingstore export logic to save the exported artifacts to disk. We also use the streaming export of swingstore data during block execution to save the main export data in vstorage under a new path. Conversely on the import side, we add a tool that reads a disk based export to produce an exporter that complies with the new swingstore import logic. These tools are usable independent of the state-sync logic. For state-sync they are driven by "blocking messages" sent by the golang side.

On the cosmos side, we implement the ExtensionSnapshotter interface. During snapshot taking, the extension running in the snapshot goroutine will request the path to the exported data from javascript and block until it is available. Then it will create a proto message for each artifact present in the exported data. On the restore side, we recreate a disk-based swingstore export by exporting the swingstore "export data" from vstorage (which was previously restored by state-sync), and the artifacts from the extension state-sync payloads. We then send a message to JavaScript to import this data into the expected swingstore location.

Security Considerations

The artifact verification on restore will be performed by the swingstore import logic. The "export data" is the root of trust and must be from a trusted source. Since it is extracted from the verified cosmos DB, it is considered trusted.

Scaling Considerations

The export process is performed as much as possible in parallel of normal execution. The artifact export is spawned in a sub process, and the streaming export data is buffered and sent to golang after the kernel has finished processing it current I/O, which should put it at idle time while it awaits for a syscall or delivery result.

Documentation Considerations

Nothing special besides normal cosmos state-sync instructions.

Testing Considerations

Tested manually extensively, including introducing synthetic delays in the snapshot process to test synchronization, or edge conditions such as overlapping snapshots.

Updated the deployment integration test to have the validators produce state-sync artifacts and have the loadgen follower start from a state-sync snapshot.

We should increase coverage of the edge cases through unit tests. See #7349

@mhofman
Copy link
Member Author

mhofman commented Mar 23, 2023

I created this PR as draft since I'm still working on tests, but please start reviewing as the implementation is fully ready. I will not rebase this PR until approved.

@datadog-full-agoric
Copy link

datadog-full-agoric bot commented Mar 23, 2023

Datadog Report

Branch report: mhofman/5542-export-state-sync
Commit report: b469d2d

agoric-sdk: 2 Failed (0 Known Flaky), 0 New Flaky, 3469 Passed, 53 Skipped, 22m 25.99s Wall Time

❌ Failed Tests (2)

  • state › in-memory transcriptStore read/write - agoric.swing-store - Details

    Expand for error
     ---
         name: AssertionError
         assertion: deepEqual
         values:
           'Difference (- actual, + expected):': >2-
               [
                 [
             -     [
     
             -       'transcript.empty.current',
     ...
    
  • state › persistent transcriptStore read/write - agoric.swing-store - Details

    Expand for error
     ---
         name: AssertionError
         assertion: deepEqual
         values:
           'Difference (- actual, + expected):': >2-
               [
                 [
             -     [
     
             -       'transcript.empty.current',
     ...
    

@codecov
Copy link

codecov bot commented Mar 23, 2023

Codecov Report

Merging #7225 (b86ae62) into master (f4de2ed) will decrease coverage by 0.40%.
The diff coverage is 26.32%.

❗ Current head b86ae62 differs from pull request most recent head c692911. Consider uploading reports for the commit c692911 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7225      +/-   ##
==========================================
- Coverage   71.01%   70.61%   -0.40%     
==========================================
  Files         450      461      +11     
  Lines       86398    89298    +2900     
  Branches        3        3              
==========================================
+ Hits        61358    63062    +1704     
- Misses      24974    26169    +1195     
- Partials       66       67       +1     
Impacted Files Coverage Δ
golang/cosmos/app/app.go 0.45% <0.00%> (-0.02%) ⬇️
golang/cosmos/daemon/cmd/root.go 61.83% <0.00%> (+0.59%) ⬆️
golang/cosmos/x/swingset/abci.go 0.00% <0.00%> (ø)
golang/cosmos/x/swingset/keeper/keeper.go 11.87% <0.00%> (-0.10%) ⬇️
golang/cosmos/x/swingset/keeper/snapshotter.go 0.00% <0.00%> (ø)
...ges/cosmic-swingset/src/helpers/bufferedStorage.js 75.25% <ø> (ø)
...kages/cosmic-swingset/src/helpers/is-entrypoint.js 100.00% <ø> (ø)
...smic-swingset/src/helpers/json-stable-stringify.js 71.27% <ø> (ø)
packages/cosmic-swingset/src/helpers/make-queue.js 70.45% <ø> (ø)
golang/cosmos/x/swingset/types/swingset.pb.go 0.53% <0.53%> (+<0.01%) ⬆️
... and 10 more

... and 24 files with indirect coverage changes

Impacted file tree graph

@mhofman mhofman added the force:integration Force integration tests to run on PR label Mar 23, 2023
@mhofman
Copy link
Member Author

mhofman commented Mar 24, 2023

@michaelfig I'd like you to be the primary reviewer
@JimLarson you always give great feedback on golang style and I expect you'll find plenty of things for me to correct. I also want to make sure you don't see any problem in the cosmos sdk integration I had to do.
@warner (or @gibson042), this is more of an FYI, don't feel obligated to review.

@michaelfig
Copy link
Member

@michaelfig I'd like you to be the primary reviewer

Gladly! Can you please rebase your fixup!s now that the tests are passing if you want me to review commit-by-commit?

@mhofman
Copy link
Member Author

mhofman commented Mar 26, 2023

The fixups are pretty minimal, but if you prefer, I'll do a clean rebase.

FYI @warner @JimLarson

Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a first pass. Looking good so far, but I'd like a few random things addressed, and will wait for other reviewers before approving.

golang/cosmos/app/app.go Outdated Show resolved Hide resolved
golang/cosmos/app/app.go Outdated Show resolved Hide resolved
golang/cosmos/x/swingset/keeper/keeper.go Show resolved Hide resolved
golang/cosmos/x/swingset/keeper/snapshotter.go Outdated Show resolved Hide resolved
golang/cosmos/x/swingset/keeper/snapshotter.go Outdated Show resolved Hide resolved
import fsPromisesPower from 'fs/promises';
import pathPower from 'path';

import { Fail } from '@agoric/assert';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import { Fail } from '@agoric/assert';
const { Fail } = assert;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the recommend style? Why not use the import form? I'm personally not a major fan of relying on globals that are not part of the JS standard library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended style has vacillated over time. globalThis.assert is supplied by the SES shim, in the same way that globalThis.harden is. @agoric/assert is just a thin wrapper around that global.

Personally, I think the module import is more legible, but the direct manipulation of the global costs less overhead. @erights may have more of an opinion.

Copy link
Member

@dckc dckc Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this issue isn't decided:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we never did settle this. The fact that @agoric/assert, as indicated by the @agoric/ part, is in agoric-sdk only, and so not available in the endo repository, is for me the main reason why I continue to prefer the assignment form. It works everywhere we use Hardened JS.

If this package were migrated to be @endo/assert such that everything above SES could use it, then the points @turadg makes in that thread on the advantages of the import style would, for me, rise to the remaining important difference, causing me to switch to the import style.

packages/cosmic-swingset/src/import-kernel-db.js Outdated Show resolved Hide resolved
@michaelfig
Copy link
Member

The fixups are pretty minimal, but if you prefer, I'll do a clean rebase.

No need to on my account. I was able to see what I needed to do a review.

@mhofman mhofman force-pushed the mhofman/5542-export-state-sync branch from e68df6d to b5745e2 Compare March 27, 2023 18:31
@mhofman
Copy link
Member Author

mhofman commented Mar 27, 2023

I just force pushed the branch to remove the 5 commits related to debug/testing features that were sitting atop the initial PR, but below the initial fixups. Here is what was removed (and were never planned to be merged anyway): e68df6d..b5745e2

@mhofman
Copy link
Member Author

mhofman commented Mar 28, 2023

I think I've addressed all actionable review feedback so far, or left comments where I didn't make any changes.

FYI, given the extent of fixup commits, I'm maintaining a parallel branch where all those are squashed into their respective target: 5542-export-state-sync-fixedup

PTAL @michaelfig

golang/cosmos/x/swingset/keeper/snapshotter.go Outdated Show resolved Hide resolved
golang/cosmos/x/swingset/keeper/snapshotter.go Outdated Show resolved Hide resolved
import fsPromisesPower from 'fs/promises';
import pathPower from 'path';

import { Fail } from '@agoric/assert';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended style has vacillated over time. globalThis.assert is supplied by the SES shim, in the same way that globalThis.harden is. @agoric/assert is just a thin wrapper around that global.

Personally, I think the module import is more legible, but the direct manipulation of the global costs less overhead. @erights may have more of an opinion.

@michaelfig
Copy link
Member

I think I've addressed all actionable review feedback so far, or left comments where I didn't make any changes.

I'm pretty happy about where this is going, and I'm fine with you autosquashing before you mark the PR ready for review. I would encourage @JimLarson and @gibson042 not to begin review until the PR is out of draft.

@mhofman mhofman removed request for warner and JimLarson March 28, 2023 18:12
@mhofman
Copy link
Member Author

mhofman commented Mar 28, 2023

Will extract a few tangential commits out in a separate PR, and re-request review once ready.

@mhofman
Copy link
Member Author

mhofman commented Mar 31, 2023

@gibson042 @JimLarson I've just rebased this on master after extracting a couple housekeeping changes in separate PRs. While it's still missing the unit tests for the state-sync logic, I believe this is ready for review at this point, and will take out of draft to make it explicit.

@mhofman mhofman requested review from gibson042 and JimLarson March 31, 2023 15:44
@mhofman mhofman marked this pull request as ready for review March 31, 2023 15:44
@mhofman
Copy link
Member Author

mhofman commented Mar 31, 2023

Reminder, please review commit-by-commit!

@mhofman mhofman force-pushed the mhofman/5542-export-state-sync branch from 3c3a9d3 to 283fded Compare April 19, 2023 01:00
@mhofman
Copy link
Member Author

mhofman commented Apr 19, 2023

@Mergifyio requeue

@mergify
Copy link
Contributor

mergify bot commented Apr 19, 2023

requeue

☑️ This pull request is already queued

@mhofman
Copy link
Member Author

mhofman commented Apr 19, 2023

@Mergifyio refresh

@mergify
Copy link
Contributor

mergify bot commented Apr 19, 2023

refresh

✅ Pull request refreshed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge
Projects
None yet
6 participants