
Introducing The Benchmarkerator, a platform for standalone performance benchmarks #8312

Merged (2 commits into master, Sep 27, 2023)

Conversation

Contributor

@FUDCo FUDCo commented Sep 7, 2023

This PR adds a module benchmarkerator.js that enables performance benchmarks to be run as standalone Node executables, while automatically managing benchmark orchestration, performance measurement, and report generation. Currently this module lives in the boot package because it makes use of a lot of the existing bootstrap test tooling there. However, I think it likely that this will not be its final home, though where it properly should end up is still an open question. I also expect over time that it's going to drift away from the bootstrap test stuff.

This includes a version of the vault open benchmark (the one that has been a repeated guinea pig for our benchmark tooling efforts) that is implemented using The Benchmarkerator.
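For orientation, here is a minimal sketch of what a benchmark module written against The Benchmarkerator might look like. The method names, options, and import path below are illustrative approximations rather than the definitive API (see benchmarkerator.md in this PR for the real interface).

```js
// Illustrative sketch only: method names, options, and the import path are
// approximations, not the definitive Benchmarkerator API.
import { bench } from './benchmarkerator.js'; // hypothetical import path

bench.addBenchmark('open vault', {
  // One-time preparation, excluded from the timed measurements.
  setup: async context => {
    // e.g., provision a smart wallet, fund it, etc.
    return { /* state handed to each round */ };
  },
  // The operation actually being measured, run once per benchmark round.
  executeRound: async (context, round) => {
    // e.g., submit a vault-open offer and wait for it to settle
  },
});

// Parses the command line, runs the rounds, and emits the report.
await bench.run('vaults');
```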

This is a preliminary cut at an "outside view" benchmark driver, just as the earlier PR #8239 was a preliminary cut at an "inside view" benchmark driver. There are a number of things that are still in need of work, but I wanted to get this into the review pipeline for critique. Among the stuff that's already on the TODO list for this:

  • Since the only benchmark that's currently implemented using this is the vault open test, it is likely that when we start implementing other benchmarks we'll find that the chain setup support that's currently in there is overly tuned to that particular case. In particular

    • It's likely to be missing things needed for other test cases
    • It's likely to contain stuff that's overspecialized for vaults

    I suspect that dealing with these will probably end up driving an additional degree of configurability into the design.

  • It currently leans heavily on the bootstrap test scaffolding in ways that may be inappropriate. In particular:

    • The bootstrap test drivers handle a bunch of messaging cases by relaying through the bootstrap vat instead of sending messages to their destinations directly (i.e., via queueToVatRoot or queueToKref), and by using the passableEncoding module (instead of the kmarshal module), which encodes remote references in-band using symbols instead of the capdata slots abstraction and requires manual serialization and deserialization in the relay vat.
    • It also injects inbound bridge traffic by relaying through the bootstrap vat, instead of feeding it in via the bridge device the way that cosmic-swingset does.

@FUDCo FUDCo added the performance Performance related issues label Sep 7, 2023
@FUDCo FUDCo requested review from turadg, warner and toliaqat September 7, 2023 01:24
@FUDCo FUDCo self-assigned this Sep 7, 2023
@@ -0,0 +1,45 @@
// This is a version of bench-vaults-performance.js designed to run as a
Member

Everything else in tests runs with yarn test i.e. ava.

Since this is a standalone tool, please put it in a new tools directory.

Ditto for benchmarkerator.js.

I'm not as certain where makeStandaloneSwingsetTestKit and makeWalletFactoryDriver should go. I suppose both drivers.js and supports.js should also move to tools so they're available to packages that import from this one. (Tests aren't.)

Contributor Author

Moving benchmarkerator.js to a tools directory is probably right, but I think the open question is: the tools directory of which package? On the other hand, I feel pretty strongly that benchmark-vaults.js should not go into a tools directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that yarn bench looks in the test directory for benchmarks to run and has for a while). If we produce a lot more of these benchmarks (which is sort of the idea behind making it easy to create them by providing support tooling) I think a case could be made for conventionally having a benchmarks directory parallel to test.

I think moving the bootstrap test driver stuff into tools makes sense, especially if we'll want to be using that stuff from other packages. Deep imports from test definitely feel sketchy. On the other hand, I think it's plausible that the longer term evolution of the benchmark tools eventually phases out the driver stuff here in favor of alternatives better fit to purpose, so if the only place they're actually used goes back to being just the bootstrap tests, then keeping them where they are makes sense. On the other other hand, completely moving the benchmark stuff off the bootstrap test driver utils, if that ever happens, is certainly not going to happen right away, so having the support stuff in a place intended for common use at least until then makes sense.

Member

@turadg turadg Sep 7, 2023

tools directory of which package?

The lowest point at which all its dependencies are satisfied. I think that's this one, unless you want to make a higher one or move some of the supporting tools here into a lower one. Even so, I'd save that for another PR.

benchmark-vaults.js should not go into a tools directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that yarn bench looks in the test directory for benchmarks to run and has for a while).

Easy to change where yarn bench looks.

I think a test should be something that passes or fails in CI. Are these perf tests that pass or fail or a tool for collecting performance data? If the latter, I think that disqualifies them from tests. The next most natural option to me is tools.

I think a case could be made for conventionally having a benchmarks directory parallel to test.

No objection. That would satisfy my requirement about what a "test" is.

I think moving the bootstrap test driver stuff into tools makes sense…

+1 to that paragraph

Contributor Author

@FUDCo FUDCo Sep 7, 2023

Are these perf tests that pass or fail or a tool for collecting performance data?

They're not pass/fail tests, so that does argue for their exclusion from the test directory.

However, they are not tools for collecting perf data generally but actual perf tests of specific things, in the same way that tests are actual correctness tests of specific things. In other words, I don't think "find out how long opening a vault takes" and "find out how long doing a liquidation takes" and "find out how long installing a new contract takes" are operations that belong in a directory called tools. In contrast, the benchmark framework itself (Benchmarkerator and its brethren) I think does belong in a tools directory in the same way that, say, a module for setting up Ava belongs there.

Contributor Author

Based on the above conversation, the resolution approach I'm going to take is: land this PR with things located where they are now, then do a follow-on PR that does all the file relocation and renaming as discussed (i.e., tooling and support code in tools, the benchmarks themselves in a new benchmarks directory). I suspect the benchmark tooling may eventually want to be in a different package (possibly even a new package of its own), but I think we can reasonably defer that until we have better visibility into how and where it's being used.

// taxonomy the name of this module should sit, but it's probably somewhere.

/**
* The benchmark context object. The benchmarkerator provides this to
Member

it's a fun name trading against clarity. consider benchmarker

Contributor

+1 for clarity.

* params: Record<string, string>,
*
* // Other unparsed parameters from the command line
* argv: string[],
Member

odd for argv to be arguments other than the ones parsed. more on this below…

Contributor Author

The idea (this being a work in progress) was to provide a simple way to specify named configuration properties on the command line, while still letting the benchmark itself parse the remaining arguments if its author needs a more sophisticated parameterization syntax than simple name/value pairs can express.
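For concreteness, a small sketch of how a benchmark might consume the two fields described in the JSDoc above; the `size` parameter and its default are purely hypothetical.

```js
// Sketch only: `size` is a hypothetical parameter, not one the tool defines.
const setup = async context => {
  // Named name=value properties picked up from the command line
  const size = context.params.size ?? 'small';
  // Everything the benchmarkerator itself did not parse, left for the
  // benchmark author to interpret with whatever syntax they need
  const extraArgs = context.argv;
  return { size, extraArgs };
};
```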

};

let stillScanningArgs = true;
while (argv[0] && argv[0].startsWith('-') && stillScanningArgs) {
Member

there are robust libraries for parsing arguments. please use one.

elsewhere in this repo we use https://github.com/tj/commander.js

the one that is built into Node 18+ is https://github.com/pkgjs/parseargs
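For reference, a rough sketch of what the same scan could look like using Node's built-in parseArgs; the option names shown are illustrative, not the benchmarkerator's actual flags.

```js
// Sketch only: illustrative option names, not the benchmarkerator's real flags.
import process from 'node:process';
import { parseArgs } from 'node:util';

const { values, positionals } = parseArgs({
  args: process.argv.slice(2),
  options: {
    rounds: { type: 'string', short: 'r' },
    verbose: { type: 'boolean', short: 'v' },
    dump: { type: 'boolean' },
  },
  allowPositionals: true, // leave name=value pairs for the benchmark's params
  strict: false,          // tolerate flags meant for the benchmark itself
});
```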

Contributor Author

@FUDCo FUDCo Sep 8, 2023

I'm partial to https://www.npmjs.com/package/yargs myself, but these kinds of packages seemed like gross overkill for what's going on here.


const actors = {
gov1: await walletFactoryDriver.provideSmartWallet(
'agoric1ldmtatp24qlllgxmrsjzcpe20fvlkp448zcuce',
Member

are these constants meaningful? please source from a constants block or module

Contributor Author

Meh. They're pretty arbitrary. I think this is one of those things that's still in discovery mode.

@FUDCo FUDCo force-pushed the benchmarkerator branch 2 times, most recently from 1b81633 to f79d8d4 on September 12, 2023 00:39
Contributor

@toliaqat toliaqat left a comment

Really like the detailed documentation. Thanks for writing it up.

The PR is in the right direction. I have left a couple of comments. Regarding the comment on splitting the Benchmarking Tool and the Swingset-related code into two different files: if you do go down that route, I would also recommend considering an already existing library for the pure benchmarking functionality instead of building our own.
Example library that might be useful: https://www.npmjs.com/package/tinybench

@@ -0,0 +1,580 @@
import process from 'node:process';
Contributor

It appears that the benchmarking code and the SDK helper methods are entwined within the same component. For maintainability, reusability, and clarity, I recommend segregating these two functionalities.

Benchmarking Tool (Benchmarkerator): The core tool should primarily deal with orchestrating the benchmark tests, i.e., setting up benchmarks, executing them, and reporting the results. It should be agnostic to the specifics of what it is testing. This will make the tool more generic and possibly usable in other contexts or projects.

BenchmarkTestBase: Instead of having makeSwingsetTestKit and walletFactory actors inside the benchmarking tool, maybe create a separate BenchmarkTestBase that benchmark implementors import on an as-needed basis.
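A rough sketch of the proposed split; all of the names below (makeBenchmarkRunner, the file paths, BenchmarkTestBase) are hypothetical, just to make the shape concrete.

```js
// Hypothetical sketch of the proposed separation; none of these names exist yet.

// tools/benchmark-runner.js: generic orchestration, measurement, and reporting,
// with no knowledge of swingset, wallets, or vaults.
export const makeBenchmarkRunner = ({ makeContext }) => {
  const benchmarks = new Map();
  return {
    addBenchmark: (name, benchmark) => benchmarks.set(name, benchmark),
    run: async () => {
      const context = await makeContext();
      for (const [name, benchmark] of benchmarks.entries()) {
        const state = await (benchmark.setup?.(context) ?? {});
        const start = performance.now();
        await benchmark.executeRound(context, state);
        console.log(`${name}: ${(performance.now() - start).toFixed(1)} ms`);
      }
    },
  };
};

// tools/benchmark-test-base.js: swingset/wallet-specific setup (makeSwingsetTestKit,
// walletFactory actors, ...) that benchmark implementors import only when needed
// and supply as `makeContext` above.
```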

Contributor Author

@FUDCo FUDCo Sep 12, 2023

I don't disagree, but we're still working our way down the learning curve on this, and our experience sample size is still too small to yield a clear delineation line. While there's a lot of stuff that is clearly on one side or the other, there's still enough confused stuff in the middle that I think we need a couple more exemplars in order to get a picture good enough to know where to slice the thing into pieces.

// taxonomy the name of this module should sit, but it's probably somewhere.

/**
* The benchmark context object. The benchmarkerator provides this to
Contributor

+1 for clarity.

@@ -0,0 +1,171 @@
# The Benchmarkerator
Contributor

This readme made understanding and reviewing the tool a breeze. Appreciate the effort you put into it!

Contributor Author

@FUDCo FUDCo commented Sep 12, 2023

In the interest of being a good project management citizen, I've created issue #8327, which this PR is retrospectively responsive to.

Contributor

@toliaqat toliaqat left a comment

Sorry for the delay. PR looks good to me. I will let @warner take a quick look as well.

@FUDCo FUDCo requested a review from gibson042 September 18, 2023 20:06
Member

@warner warner left a comment

yep! we'll iterate more over time, I see nothing here that should prevent it from landing

Member

@turadg turadg commented Sep 19, 2023

The merge conflicts are from #8313

I rebased this branch onto master and pushed to https://github.com/Agoric/agoric-sdk/tree/ta/benchmarkerator

Member

@gibson042 gibson042 left a comment

I second @toliaqat's praise for benchmarkerator.md, which really gave me a good perspective on the approach.

This seems like a step forward, but maybe a little in the wrong direction... there's a fine line between a general-purpose benchmarking tool capable of evaluating cosmic-swingset scenarios vs. a cosmic-swingset benchmarking tool, and I would expect the latter to directly consume e.g. bundle/vat configuration shaped like decentral-…-config.json files, and also to directly expose the system under test (e.g., chain-perspective inbound message queue and storage and node-perspective Swingset kernel).

I do like the "actors" concept, but it should probably be a higher level of configuration that consumes those more powerful kernel and/or chain interfaces (e.g., configuration could include a section like actorMakers: { alice: makeSmartWalletDriver, … } to make context include a destructurable actors property).
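A sketch of roughly that shape; everything below (makeSmartWalletDriver, the addresses, the way actorMakers would be consumed) is a placeholder, not an existing interface.

```js
// Illustrative sketch of the suggested `actorMakers` configuration; all names
// and addresses here are placeholders.
const makeSmartWalletDriver = address => async powers =>
  powers.walletFactoryDriver.provideSmartWallet(address);

export const options = {
  actorMakers: {
    alice: makeSmartWalletDriver('agoric1...'), // placeholder address
    gov1: makeSmartWalletDriver('agoric1...'),
  },
};

// The benchmarkerator would await each maker against the kernel/chain
// interfaces it exposes, so a benchmark can simply destructure:
//   const { alice, gov1 } = context.actors;
```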

Comment on lines +164 to +170
In addition, if you specify the `--dump` command line option, a JSON-formatted
(i.e., machine readable) version of this same data will be output to the file
`benchmark-NAME.json` in the current working directory (where _NAME_ is the name
that you provided as the argument to `bench.run`).
Member

@gibson042 gibson042 Sep 19, 2023

Hrm, this seems like bad form. What about keeping it simple for now and outputting only JSON to standard output for CLI redirection, and then later (if needed) either a separate invocation for pretty-rendering or a CLI option to incorporate that onto standard error?

$ # option 1
$ node benchmark-foo.js | tee benchmark-foo.json | npx benchmarkerator --pretty
$ # option 2
$ node benchmark-foo.js --stderr-pretty > benchmark-foo.json

Contributor Author

We can't control output generated by the benchmark itself (indeed, the current output even without the stats report and without the --verbose option is already incredibly verbose), so separating the benchmark's own output from the stats report output would require various subprocess voodoo that would impede debugging.

An alternative I have been contemplating is changing the option to --dump FILENAME, i.e., specify the data destination directly on the command line rather than inferring the filename from the benchmark definition. The thing deterring me from that path is that it would make it trickier to drive multiple benchmark files in a benchmarks directory from yarn, as the yarn script would itself be responsible for synthesizing the filename, which could be sketchy.

@FUDCo FUDCo force-pushed the benchmarkerator branch 2 times, most recently from 50caf26 to 18cb033 on September 21, 2023 19:21
`PassableEncoding` (not to be confused with `encodePassable`) was a non-standard
serialization scheme used in some tests, which encoded remote references in-band
using Symbols with magic names rather than using the normal marshal package
machinery that puts these into the 'slots' element of the standard capdata
format.  This bypassed various message filtering and transformation logic in the
kernel and also required special methods to be present in the bootstrap vat to
translate this encoding and relay messages to their actual intended
destinations.

This has now been removed.  The relatively small number of tests which used
`passableEncoding` have been updated to use `kmarshal` instead.  Messages and
data are now encoded in a form that all the other code understands.  Test
messages are also now delivered directly to their destinations without having to
count on the existence of a relayer.

In support of this, the controller's `queueToVatRoot` method has been augmented
by the addition of a `queueToVatObject` method, allowing tests to send messages
to specific objects, targeted using remotable references of the sort returned
by `kunser`.  The test support library that a lot of the bootstrap tests use
has been updated to use this improved mechanism.
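As a rough illustration of the direction described, the sketch below shows the two-step pattern; method names like 'getSomeObject' and 'doSomething', the import path, and the argument shapes are approximations inferred from this description, not verified signatures.

```js
// Sketch only: `controller` is a SwingSet controller; method names and
// argument shapes are approximated from the description above.
import { kunser } from './kmarshal.js'; // hypothetical import path

// Ask a vat's root object for some remotable via the pre-existing mechanism...
const kpid = controller.queueToVatRoot('bootstrap', 'getSomeObject', []);
await controller.run();
const someObject = kunser(controller.kpResolution(kpid));

// ...then message that specific object directly, with no relayer in the middle.
controller.queueToVatObject(someObject, 'doSomething', ['arg']);
await controller.run();
```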

In addition, `kmarshal` itself has been upgraded using a trick that MarkM
provided for tagging promises, which allows `kmarshal` to be truly stateless.
The (former) statefulness of `kmarshal` caused problems when the module was
imported into different compartments, as each compartment ended up with its own
module instance and thus its own version of the state. This in turn caused these
compartments to have different beliefs about how particular promises were
represented, which caused various things to break.  That's all fixed now.

One wart which has NOT been taken care of in this PR, but which will be
addressed in a follow-on PR that we were already planning for, is the
duplication of `kmarshal.js` in both the SwingSet package and the liveslots
package.  The forthcoming PR will perform a bunch of file renaming and
relocation to put a bunch of support tooling, used by both benchmarks and tests,
into a package of its own, thereby eliminating a lot of weird dependencies and
files in places they don't belong.  As part of this I plan to relocate
`kmarshal` into a package of its own that can then be cleanly imported by the
kernel, liveslots, and the various tests and test support tooling.

All this is in support of issue #8327
Member

@turadg turadg commented Sep 26, 2023

@FUDCo I rebased this onto master and resolved the conflicts. It saved time to squash the fixups first so I did that.
https://github.com/Agoric/agoric-sdk/compare/benchmarkerator-on-master

@FUDCo FUDCo added the automerge:rebase Automatically rebase updates, then merge label Sep 27, 2023
@ivanlei ivanlei added automerge:rebase Automatically rebase updates, then merge and removed automerge:rebase Automatically rebase updates, then merge labels Sep 27, 2023
@mergify mergify bot merged commit 31ed402 into master Sep 27, 2023
@mergify mergify bot deleted the benchmarkerator branch September 27, 2023 09:52