Improve the efficiency of writing snapshots #6248

Merged
merged 7 commits into from
Sep 17, 2022

gibson042
Member

closes: #6225
refs: #6188

Description

  • Use TMPDIR for temporary files.
  • Parallelize reading raw snapshot data for hash computation and compression.
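
The second item can be illustrated roughly as follows. This is only a sketch of one possible arrangement, not the code from this PR: the helper name `hashAndCompress`, the SHA-256 algorithm, and the gzip output path are all assumptions made for illustration. It reads the raw file once and hands each chunk to both a hash and a compressor, so neither consumer waits for the other to finish.

```javascript
const { createReadStream, createWriteStream } = require('node:fs');
const { createHash } = require('node:crypto');
const { createGzip } = require('node:zlib');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper (name and signature are assumptions): read `rawPath`
// once, feeding the same chunks to a SHA-256 hash and to a gzip compressor
// that writes to `gzPath`. Resolves with the hex digest of the raw bytes.
async function hashAndCompress(rawPath, gzPath) {
  const source = createReadStream(rawPath);
  const hash = createHash('sha256');
  // Every chunk emitted by the reader is hashed as it passes by...
  source.on('data', chunk => hash.update(chunk));
  // ...while the same reader is piped through gzip into the output file.
  await pipeline(source, createGzip(), createWriteStream(gzPath));
  return hash.digest('hex');
}
```

Because the compressor is attached with `pipeline`, its backpressure pauses the reader, so the hash never runs ahead of what has been written.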

Security Considerations

n/a

Documentation Considerations

We might want to mention somewhere that snapshot creation puts some temporary files in the system temporary directory (as communicated by e.g. the POSIX TMPDIR environment variable).

Testing Considerations

Unit tests verify stable functionality, but the performance changes themselves should be evaluated on a real network.

@gibson042 gibson042 added enhancement New feature or request SwingSet package: SwingSet performance Performance related issues labels Sep 16, 2022
@gibson042 gibson042 requested a review from mhofman September 16, 2022 21:30
const snapReader = createReadStream(tmpSnapPath);
cleanup.push(() => snapReader.destroy());
await fsStreamReady(snapReader);
// TODO: hoist.
Member Author

I'll be resolving these "hoist" TODOs while the PR goes through CI, along with updating now-redundant definitions for e.g. path vs. stream parameters. They were defined inline during development, and left in place for now to avoid delaying the PR (which can be merged with them depending upon urgency).

@mhofman mhofman force-pushed the gibson-6225-makeSnapshot-performance branch from 6cbaca6 to 93dd458 Compare September 16, 2022 23:50
@mhofman
Member

mhofman commented Sep 16, 2022

@gibson042 I just pushed the changes we paired on

Member Author

@gibson042 gibson042 left a comment

@mhofman Some further cleanup.

unlink,
},
{ keepSnapshots = false } = {},
) {
/** @type {(opts: unknown) => Promise<string>} */
const ptmpName = promisify(tmpName);

/** @type {(fd: number) => Promise<void>} */
const pfsync = promisify(fsync);

/**
* Returns the result of calling a function with the name
* of a temp file that exists only for the duration of
Member Author

Would withTempName be simpler on top of aggregateTryFinally?

  async function withTempName(fn, prefix = 'tmp') {
    const name = await ptmpName({
      tmpdir: root,
      template: `${prefix}-XXXXXX.xss`,
    });
    return aggregateTryFinally(
      () => fn(name),
      // Ignore file deletion errors.
      () => unlink(name).catch(sink),
    );
  }

Or maybe the point is moot, since load can be refactored to eliminate withTempName.
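
For readers without the source at hand: `aggregateTryFinally` and `sink` are project helpers whose definitions do not appear in this conversation. The sketch below shows the semantics the suggestion above relies on, as an assumption rather than the actual implementation: run the first function, always run the finalizer, and report both errors if both fail.

```javascript
// Assumed semantics, NOT the project's actual helper: run `trier`, always
// run `finalizer` afterwards, and surface both errors if both fail.
async function aggregateTryFinally(trier, finalizer) {
  let result;
  try {
    result = await trier();
  } catch (tryError) {
    try {
      await finalizer(tryError);
    } catch (finalizeError) {
      // Both steps failed: report them together.
      throw new AggregateError([tryError, finalizeError]);
    }
    // Only the main step failed: rethrow its error unchanged.
    throw tryError;
  }
  // Main step succeeded; a finalizer failure propagates as-is.
  await finalizer();
  return result;
}

// Assumed no-op used to swallow rejections, as in `.catch(sink)`.
const sink = () => {};
```

Under these semantics, `unlink(name).catch(sink)` in the finalizer means a failed deletion never masks the result or the error of `fn(name)`.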

Member

I suppose we can use the same tmpFile approach for load.

packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
Comment on lines +248 to +247
cleanup.push(() => {
snapReader.destroy();
});
Member Author

Cleanup ignores return values and the other uses of cleanup.push don't suppress them like this.

Suggested change
- cleanup.push(() => {
-   snapReader.destroy();
- });
+ cleanup.push(() => snapReader.destroy());

Member

The return value would be adopted in the promise result in the finally stage, meaning the stream object would be kept around for longer than necessary.

Member Author

But just in the time between running the function in PromiseAllOrErrors and throwing away the aggregated value, which should be inconsequential.
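
The difference under discussion comes down to what each arrow-function form returns. A small standalone illustration follows; the stream here is a stand-in built with `Readable.from`, not the snapshot reader from the PR.

```javascript
const { Readable } = require('node:stream');

// Stand-in for snapReader; any Node.js stream behaves the same way here.
const stream = Readable.from(['snapshot bytes']);

// Block body: the result of destroy() is discarded, so the function
// returns undefined.
const cleanupWithBlock = () => {
  stream.destroy();
};

// Concise body: the function returns whatever destroy() returns, and
// destroy() returns the stream itself.
const cleanupConcise = () => stream.destroy();

const blockResult = cleanupWithBlock();
const conciseResult = cleanupConcise();

console.log(blockResult === undefined, conciseResult === stream);
```

So the concise form does hand the stream object to whatever collects the cleanup results, but only until those results are discarded.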

packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
@gibson042 gibson042 force-pushed the gibson-6225-makeSnapshot-performance branch from 155f0fa to 591f3a2 Compare September 17, 2022 02:34
@mhofman mhofman force-pushed the gibson-6225-makeSnapshot-performance branch from 591f3a2 to d3b60e8 Compare September 17, 2022 02:43
@mhofman mhofman changed the base branch from master to 6245-inbound-queue-metrics September 17, 2022 02:43
@mhofman mhofman added the force:integration Force integration tests to run on PR label Sep 17, 2022
const cleanup = [];
return aggregateTryFinally(
async () => {
// TODO: Refactor to use tmpFile rather than tmpName.
Member

Thinking about this, we can't use tmpFile here since it's not this process that opens the file, but not worth another CI churn to change the comment.

Member Author

The refactoring I have in mind would be updating the signature of saveRaw to accept a file handle or stream rather than a file name, but I understand that may not even be possible.

Member

Oh yeah we decompress to the file, then tell xsnap the filename, so it'd work.

Comment on lines +319 to +321
await PromiseAllOrErrors(
cleanup.reverse().map(fn => Promise.resolve().then(() => fn())),
);
Member

Thinking about this more, shouldn't we wait for each cleanup step to finish before continuing with the next one? Disposal of resources is hard!

Member

Not an actual problem since only the last cleanup (first queued) is actually async. Let's follow-up to clean this up though.
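
One shape the follow-up could take is sketched below. It is a hypothetical alternative, not code from this PR: `runCleanupSequentially` is an invented name, and it uses the standard `AggregateError` rather than the project's `PromiseAllOrErrors`. Each cleanup function runs only after the previous one has settled, in reverse registration order, and failures are collected instead of aborting the remaining steps.

```javascript
// Hypothetical alternative to the parallel cleanup quoted above: run the
// queued functions one at a time, last-registered first, and keep going
// after a failure so every resource still gets a chance to be released.
async function runCleanupSequentially(cleanup) {
  const errors = [];
  // Copy before reversing so the caller's array is left untouched.
  for (const fn of [...cleanup].reverse()) {
    try {
      await fn();
    } catch (err) {
      errors.push(err);
    }
  }
  if (errors.length === 1) throw errors[0];
  if (errors.length > 1) throw new AggregateError(errors);
}
```

The trade-off is latency: independent cleanup steps no longer overlap, which only matters if more than one of them is actually asynchronous.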

Base automatically changed from 6245-inbound-queue-metrics to master September 17, 2022 03:17
@gibson042 gibson042 added the automerge:rebase Automatically rebase updates, then merge label Sep 17, 2022
@turadg turadg force-pushed the gibson-6225-makeSnapshot-performance branch from d3b60e8 to 7bac0b9 Compare September 17, 2022 03:26
@mergify mergify bot merged commit 26f482b into master Sep 17, 2022
@mergify mergify bot deleted the gibson-6225-makeSnapshot-performance branch September 17, 2022 03:47
@mhofman
Member

mhofman commented Oct 20, 2022

For future reference, internally the os.tmpdir() call is used, which on non-Windows platforms currently has the following implementation in Node.js:

    path = safeGetenv('TMPDIR') ||
           safeGetenv('TMP') ||
           safeGetenv('TEMP') ||
           '/tmp';
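
A quick way to see that lookup in action is sketched below. The helper `tmpdirWith` is hypothetical and exists only for this illustration; it temporarily overrides `TMPDIR` in the current process and reports what `os.tmpdir()` resolves to. The behaviour described applies to POSIX platforms; Windows consults different variables.

```javascript
const os = require('node:os');

// Hypothetical helper: report which directory os.tmpdir() resolves to when
// TMPDIR is set to `dir`, then restore the previous environment.
function tmpdirWith(dir) {
  const previous = process.env.TMPDIR;
  process.env.TMPDIR = dir;
  try {
    return os.tmpdir();
  } finally {
    if (previous === undefined) {
      delete process.env.TMPDIR;
    } else {
      process.env.TMPDIR = previous;
    }
  }
}

console.log(tmpdirWith('/var/scratch'));
```

In practice an operator would set `TMPDIR` in the environment of the node process rather than from inside it.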

Labels
automerge:rebase Automatically rebase updates, then merge enhancement New feature or request force:integration Force integration tests to run on PR performance Performance related issues SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshot creation is inefficient
2 participants