
Introducing The Benchmarkerator, a platform for standalone performance benchmarks #8312

Merged (2 commits into master, Sep 27, 2023)

Conversation

Contributor

@FUDCo FUDCo commented Sep 7, 2023

This PR adds a module benchmarkerator.js that enables performance benchmarks to be run as standalone Node executables, while automatically managing benchmark orchestration, performance measurement, and report generation. Currently this module lives in the boot package because it makes use of a lot of the existing bootstrap test tooling there. However, I think it likely that this will not be its final home, though where it properly should end up is still an open question. I also expect over time that it's going to drift away from the bootstrap test stuff.

This includes a version of the vault open benchmark (the one that has been a repeated guinea pig for our benchmark tooling efforts) that is implemented using The Benchmarkerator.
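For orientation, here is a minimal sketch of what a benchmark module written against The Benchmarkerator might look like. The method names, options, and import path below are illustrative approximations rather than the definitive API (see benchmarkerator.md in this PR for the real interface).

```js
// Illustrative sketch only: method names, options, and the import path are
// approximations, not the definitive Benchmarkerator API.
import { bench } from './benchmarkerator.js'; // hypothetical import path

bench.addBenchmark('open vault', {
  // One-time preparation, excluded from the timed measurements.
  setup: async context => {
    // e.g., provision a smart wallet, fund it, etc.
    return { /* state handed to each round */ };
  },
  // The operation actually being measured, run once per benchmark round.
  executeRound: async (context, round) => {
    // e.g., submit a vault-open offer and wait for it to settle
  },
});

// Parses the command line, runs the rounds, and emits the report.
await bench.run('vaults');
```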

This is a preliminary cut at an "outside view" benchmark driver, just as the earlier PR #8239 was a preliminary cut at an "inside view" benchmark driver. There are a number of things that are still in need of work, but I wanted to get this into the review pipeline for critique. Among the stuff that's already on the TODO list for this:

  • Since the only benchmark that's currently implemented using this is the vault open test, it is likely that when we start implementing other benchmarks we'll find that the chain setup support that's currently in there is overly tuned to that particular case. In particular

    • It's likely to be missing things needed for other test cases
    • It's likely to contain stuff that's overspecialized for vaults

    I suspect that dealing with these will probably end up driving an additional degree of configurability into the design.

  • It currently leans heavily on the bootstrap test scaffolding in ways that may be inappropriate. In particular:

    • The bootstrap test drivers handle a bunch of messaging cases by relaying through the bootstrap vat instead of sending messages to their destinations directly (i.e., via queueToVatRoot or queueToKref), and by using the passableEncoding module (instead of the kmarshal module), which encodes remote references in-band using symbols instead of the capdata slots abstraction and requires manual serialization and deserialization in the relay vat.
    • It also injects inbound bridge traffic by relaying through the bootstrap vat, instead of feeding it in via the bridge device the way that cosmic-swingset does.

@FUDCo FUDCo added the performance Performance related issues label Sep 7, 2023
@FUDCo FUDCo requested review from turadg, warner and toliaqat September 7, 2023 01:24
@FUDCo FUDCo self-assigned this Sep 7, 2023
@@ -0,0 +1,45 @@
// This is a version of bench-vaults-performance.js designed to run as a
Member

Everything else in tests runs with yarn test i.e. ava.

Since this is a standalone tool, please put it in a new tools directory.

Ditto for benchmarkerator.js.

I'm not as certain where makeStandaloneSwingsetTestKit and makeWalletFactoryDriver should go. I suppose both drivers.js and supports.js should also move to tools so they're available to packages that import from this one. (Tests aren't.)

Contributor Author

Moving benchmarkerator.js to a tools directory is probably right, but I think the open question is: the tools directory of which package? On the other hand, I feel pretty strongly that benchmark-vaults.js should not go into a tools directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that yarn bench looks in the test directory for benchmarks to run and has for a while). If we produce a lot more of these benchmarks (which is sort of the idea behind making it easy to create them by providing support tooling) I think a case could be made for conventionally having a benchmarks directory parallel to test.

I think moving the bootstrap test driver stuff into tools makes sense, especially if we'll want to be using that stuff from other packages. Deep imports from test definitely feel sketchy. On the other hand, I think it's plausible that the longer term evolution of the benchmark tools eventually phases out the driver stuff here in favor of alternatives better fit to purpose, so if the only place they're actually used goes back to being just the bootstrap tests, then keeping them where they are makes sense. On the other other hand, completely moving the benchmark stuff off the bootstrap test driver utils, if that ever happens, is certainly not going to happen right away, so having the support stuff in a place intended for common use at least until then makes sense.

Member

@turadg turadg Sep 7, 2023

tools directory of which package?

The lowest point at which all its dependencies are satisfied. I think that's this one, unless you want to make a higher one or move some of the supporting tools here into a lower one. Even so, I'd save that for another PR.

benchmark-vaults.js should not go into a tools directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that yarn bench looks in the test directory for benchmarks to run and has for a while).

Easy to change where yarn bench looks.

I think a test should be something that passes or fails in CI. Are these perf tests that pass or fail or a tool for collecting performance data? If the latter, I think that disqualifies them from tests. The next most natural option to me is tools.

I think a case could be made for conventionally having a benchmarks directory parallel to test.

No objection. That would satisfy my requirement about what a "test" is.

I think moving the bootstrap test driver stuff into tools makes sense…

+1 to that paragraph

Contributor Author

@FUDCo FUDCo Sep 7, 2023

Are these perf tests that pass or fail or a tool for collecting performance data?

They're not pass/fail tests, so that does argue for their exclusion from the test directory.

However, they are not tools for collecting perf data generally but actual perf tests of specific things, in the same way that tests are actual correctness tests of specific things. In other words, I don't think "find out how long opening a vault takes" and "find out how long doing a liquidation takes" and "find out how long installing a new contract takes" are operations that belong in a directory called tools. In contrast, the benchmark framework itself (Benchmarkerator and its brethren) I think does belong in a tools directory in the same way that, say, a module for setting up Ava belongs there.

Contributor Author

Based on the above conversation, the resolution approach I'm going to take is: land this PR with things located where they are now, then do a follow-on PR that does all the file relocation and renaming as discussed (i.e., tooling and support code in tools, the benchmarks themselves in a new benchmarks directory). I suspect the benchmark tooling may eventually want to be in a different package (possibly even a new package of its own), but I think we can reasonably defer that until we have better visibility into how and where it's being used.

// taxonomy the name of this module should sit, but it's probably somewhere.

/**
* The benchmark context object. The benchmarkerator provides this to
Member

it's a fun name trading against clarity. consider benchmarker

Contributor

+1 for clarity.

* params: Record<string, string>,
*
* // Other unparsed parameters from the command line
* argv: string[],
Member

odd for argv to be arguments other than the ones parsed. more on this below…

Contributor Author

The idea (this being a work in progress) was to provide a simple way to specify named configuration properties on the command line, while still letting the benchmark itself parse the remaining arguments if its author needs a more sophisticated parameterization syntax than simple name/value pairs can express.
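For concreteness, a small sketch of how a benchmark might consume the two fields described in the JSDoc above; the `size` parameter and its default are purely hypothetical.

```js
// Sketch only: `size` is a hypothetical parameter, not one the tool defines.
const setup = async context => {
  // Named name=value properties picked up from the command line
  const size = context.params.size ?? 'small';
  // Everything the benchmarkerator itself did not parse, left for the
  // benchmark author to interpret with whatever syntax they need
  const extraArgs = context.argv;
  return { size, extraArgs };
};
```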

};

let stillScanningArgs = true;
while (argv[0] && argv[0].startsWith('-') && stillScanningArgs) {
Member

there are robust libraries for parsing arguments. please use one.

elsewhere in this repo we use https://github.com/tj/commander.js

the one that is built into Node 18+ is https://github.com/pkgjs/parseargs
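For reference, a rough sketch of what the same scan could look like using Node's built-in parseArgs; the option names shown are illustrative, not the benchmarkerator's actual flags.

```js
// Sketch only: illustrative option names, not the benchmarkerator's real flags.
import process from 'node:process';
import { parseArgs } from 'node:util';

const { values, positionals } = parseArgs({
  args: process.argv.slice(2),
  options: {
    rounds: { type: 'string', short: 'r' },
    verbose: { type: 'boolean', short: 'v' },
    dump: { type: 'boolean' },
  },
  allowPositionals: true, // leave name=value pairs for the benchmark's params
  strict: false,          // tolerate flags meant for the benchmark itself
});
```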

Contributor Author

@FUDCo FUDCo Sep 8, 2023

I'm partial to https://www.npmjs.com/package/yargs myself, but these kinds of packages seemed like gross overkill for what's going on here.


const actors = {
gov1: await walletFactoryDriver.provideSmartWallet(
'agoric1ldmtatp24qlllgxmrsjzcpe20fvlkp448zcuce',
Member

are these constants meaningful? please source from a constants block or module

Contributor Author

Meh. They're pretty arbitrary. I think this is one of those things that's still in discovery mode.

@FUDCo FUDCo force-pushed the benchmarkerator branch 2 times, most recently from 1b81633 to f79d8d4 on September 12, 2023 00:39
Contributor

@toliaqat toliaqat left a comment

Really like the detailed documentation. Thanks for writing it up.

The PR is in the right direction. I have left a couple of comments. Regarding the comment on splitting the Benchmarking Tool and the Swingset-related code into two different files: if you do go down that route, I would also recommend considering an already existing library for the pure benchmarking functionality instead of building our own.
Example library that might be useful: https://www.npmjs.com/package/tinybench

@@ -0,0 +1,580 @@
import process from 'node:process';
Contributor

It appears that the benchmarking code and the SDK helper methods are entwined within the same component. For maintainability, reusability, and clarity, I recommend segregating these two functionalities.

Benchmarking Tool (Benchmarkerator): The core tool should primarily deal with orchestrating the benchmark tests, i.e., setting up benchmarks, executing them, and reporting the results. It should be agnostic to the specifics of what it is testing. This will make the tool more generic and possibly usable in other contexts or projects.

BenchmarkTestBase: Instead of having makeSwingsetTestKit and walletFactory actors inside the benchmarking tool, maybe create a separate BenchmarkTestBase that benchmark implementors import on an as-needed basis.
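A rough sketch of the proposed split; all of the names below (makeBenchmarkRunner, the file paths, BenchmarkTestBase) are hypothetical, just to make the shape concrete.

```js
// Hypothetical sketch of the proposed separation; none of these names exist yet.

// tools/benchmark-runner.js: generic orchestration, measurement, and reporting,
// with no knowledge of swingset, wallets, or vaults.
export const makeBenchmarkRunner = ({ makeContext }) => {
  const benchmarks = new Map();
  return {
    addBenchmark: (name, benchmark) => benchmarks.set(name, benchmark),
    run: async () => {
      const context = await makeContext();
      for (const [name, benchmark] of benchmarks.entries()) {
        const state = await (benchmark.setup?.(context) ?? {});
        const start = performance.now();
        await benchmark.executeRound(context, state);
        console.log(`${name}: ${(performance.now() - start).toFixed(1)} ms`);
      }
    },
  };
};

// tools/benchmark-test-base.js: swingset/wallet-specific setup (makeSwingsetTestKit,
// walletFactory actors, ...) that benchmark implementors import only when needed
// and supply as `makeContext` above.
```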

Contributor Author

@FUDCo FUDCo Sep 12, 2023

I don't disagree, but we're still working our way down the learning curve on this, and our experience sample size is still too small to yield a clear delineation line. While there's a lot of stuff that is clearly on one side or the other, there's still enough confused stuff in the middle that I think we need a couple more exemplars in order to get a picture good enough to know where to slice the thing into pieces.

// taxonomy the name of this module should sit, but it's probably somewhere.

/**
* The benchmark context object. The benchmarkerator provides this to
Contributor

+1 for clarity.

@@ -0,0 +1,171 @@
# The Benchmarkerator
Contributor

This readme made understanding and reviewing the tool a breeze. Appreciate the effort you put into it!

Contributor Author

@FUDCo FUDCo commented Sep 12, 2023

In the interest of being a good project management citizen, I've created issue #8327, which this PR is retrospectively responsive to.

Contributor

@toliaqat toliaqat left a comment

Sorry for the delay. PR looks good to me. I will let @warner take a quick look as well.

@FUDCo FUDCo requested a review from gibson042 September 18, 2023 20:06
Member

@warner warner left a comment

yep! we'll iterate more over time, I see nothing here that should prevent it from landing

Member

@turadg turadg commented Sep 19, 2023

The merge conflicts are from #8313

I rebased this branch onto master and pushed to https://github.com/Agoric/agoric-sdk/tree/ta/benchmarkerator

Member

@gibson042 gibson042 left a comment

I second @toliaqat's praise for benchmarkerator.md, which really gave me a good perspective on the approach.

This seems like a step forward, but maybe a little in the wrong direction... there's a fine line between a general-purpose benchmarking tool capable of evaluating cosmic-swingset scenarios vs. a cosmic-swingset benchmarking tool, and I would expect the latter to directly consume e.g. bundle/vat configuration shaped like decentral-…-config.json files, and also to directly expose the system under test (e.g., chain-perspective inbound message queue and storage and node-perspective Swingset kernel).

I do like the "actors" concept, but it should probably be a higher level of configuration that consumes those more powerful kernel and/or chain interfaces (e.g., configuration could include a section like actorMakers: { alice: makeSmartWalletDriver, … } to make context include a destructurable actors property).
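A sketch of roughly that shape; everything below (makeSmartWalletDriver, the addresses, the way actorMakers would be consumed) is a placeholder, not an existing interface.

```js
// Illustrative sketch of the suggested `actorMakers` configuration; all names
// and addresses here are placeholders.
const makeSmartWalletDriver = address => async powers =>
  powers.walletFactoryDriver.provideSmartWallet(address);

export const options = {
  actorMakers: {
    alice: makeSmartWalletDriver('agoric1...'), // placeholder address
    gov1: makeSmartWalletDriver('agoric1...'),
  },
};

// The benchmarkerator would await each maker against the kernel/chain
// interfaces it exposes, so a benchmark can simply destructure:
//   const { alice, gov1 } = context.actors;
```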

Comment on lines +164 to +170
In addition, if you specify the `--dump` command line option, a JSON-formatted
(i.e., machine readable) version of this same data will be output to the file
`benchmark-NAME.json` in the current working directory (where _NAME_ is the name
that you provided as the argument to `bench.run`).
Member

@gibson042 gibson042 Sep 19, 2023

Hrm, this seems like bad form. What about keeping it simple for now and outputting only JSON to standard output for CLI redirection, and then later (if needed) either a separate invocation for pretty-rendering or a CLI option to incorporate that onto standard error?

$ # option 1
$ node benchmark-foo.js | tee benchmark-foo.json | npx benchmarkerator --pretty
$ # option 2
$ node benchmark-foo.js --stderr-pretty > benchmark-foo.json

Contributor Author

We can't control output generated by the benchmark itself (indeed, the current output even without the stats report and without the --verbose option is already incredibly verbose), so separating the benchmark's own output from the stats report output would require various subprocess voodoo that would impede debugging.

An alternative I have been contemplating is changing the option to --dump FILENAME, i.e., specify the data destination directly on the command line rather than inferring the filename from the benchmark definition. The thing deterring me from that path is that it would make it trickier to drive multiple benchmark files in a benchmarks directory from yarn, as the yarn script would itself be responsible for synthesizing the filename, which could be sketchy.

@FUDCo FUDCo force-pushed the benchmarkerator branch 2 times, most recently from 50caf26 to 18cb033 on September 21, 2023 19:21
`PassableEncoding` (not to be confused with `encodePassable`) was a non-standard
serialization scheme used in some tests, which encoded remote references in-band
using Symbols with magic names rather than using the normal marshal package
machinery that puts these into the 'slots' element of the standard capdata
format.  This bypassed various message filtering and transformation logic in the
kernel and also required special methods to be present in the bootstrap vat to
translate this encoding and relay messages to their actual intended
destinations.

This has now been removed.  The relatively small number of tests which used
`passableEncoding` have been updated to use `kmarshal` instead.  Messages and
data are now encoded in a form that all the other code understands.  Test
messages are also now delivered directly to their destinations without having to
count on the existence of a relayer.

In support of this, the controller's `queueToVatRoot` method has been augmented
by the addition of a `queueToVatObject` method, allowing tests to send messages
to specific objects, targeted using remotable references of the sort returned
by `kunser`.  The test support library that a lot of the bootstrap tests use
has been updated to use this improved mechanism.
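As a rough illustration of the direction described, the sketch below shows the two-step pattern; method names like 'getSomeObject' and 'doSomething', the import path, and the argument shapes are approximations inferred from this description, not verified signatures.

```js
// Sketch only: `controller` is a SwingSet controller; method names and
// argument shapes are approximated from the description above.
import { kunser } from './kmarshal.js'; // hypothetical import path

// Ask a vat's root object for some remotable via the pre-existing mechanism...
const kpid = controller.queueToVatRoot('bootstrap', 'getSomeObject', []);
await controller.run();
const someObject = kunser(controller.kpResolution(kpid));

// ...then message that specific object directly, with no relayer in the middle.
controller.queueToVatObject(someObject, 'doSomething', ['arg']);
await controller.run();
```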

In addition, `kmarshal` itself has been upgraded using a trick that MarkM
provided for tagging promises, which allows `kmarshal` to be truly stateless.
The (former) statefulness of `kmarshal` caused problems when the module was
imported into different compartments, as each compartment ended up with its own
module instance and thus its own version of the state. This in turn caused these
compartments to have different beliefs about how particular promises were
represented, which caused various things to break.  That's all fixed now.

One wart which has NOT been taken care of in this PR, but which will be
addressed in a follow-on PR that we were already planning for, is the
duplication of `kmarshal.js` in both the SwingSet package and the liveslots
package.  The forthcoming PR will perform a bunch of file renaming and
relocation to put a bunch of support tooling, used by both benchmarks and tests,
into a package of its own, thereby eliminating a lot of weird dependencies and
files in places they don't belong.  As part of this I plan to relocate
`kmarshal` into a package of its own that can then be cleanly imported by the
kernel, liveslots, and the various tests and test support tooling.

All this is in support of issue #8327
Member

@turadg turadg commented Sep 26, 2023

@FUDCo I rebased this onto master and resolved the conflicts. It saved time to squash the fixups first so I did that.
https://github.com/Agoric/agoric-sdk/compare/benchmarkerator-on-master

@FUDCo FUDCo added the automerge:rebase Automatically rebase updates, then merge label Sep 27, 2023
@ivanlei ivanlei added automerge:rebase Automatically rebase updates, then merge and removed automerge:rebase Automatically rebase updates, then merge labels Sep 27, 2023
@mergify mergify bot merged commit 31ed402 into master Sep 27, 2023
@mergify mergify bot deleted the benchmarkerator branch September 27, 2023 09:52