Introducing The Benchmarkerator, a platform for standalone performance benchmarks #8312
Conversation
(force-pushed from d634155 to e082299)
@@ -0,0 +1,45 @@
// This is a version of bench-vaults-performance.js designed to run as a
Everything else in `tests` runs with `yarn test`, i.e. `ava`. Since this is a standalone tool, please put it in a new `tools` directory. Ditto for `benchmarkerator.js`.

I'm not as certain where `makeStandaloneSwingsetTestKit` and `makeWalletFactoryDriver` should go. I suppose both `drivers.js` and `supports.js` should also move to `tools` so they're available to packages that import from this one. (Tests aren't.)
Moving `benchmarkerator.js` to a `tools` directory is probably right, but I think the open question is: the `tools` directory of which package? On the other hand, I feel pretty strongly that `benchmark-vaults.js` should not go into a `tools` directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that `yarn bench` looks in the `test` directory for benchmarks to run and has for a while). If we produce a lot more of these benchmarks (which is sort of the idea behind making it easy to create them by providing support tooling), I think a case could be made for conventionally having a `benchmarks` directory parallel to `test`.

I think moving the bootstrap test driver stuff into `tools` makes sense, especially if we'll want to be using that stuff from other packages. Deep imports from `test` definitely feel sketchy. On the other hand, I think it's plausible that the longer-term evolution of the benchmark tools eventually phases out the driver stuff here in favor of alternatives better fit to purpose, so if the only place they're actually used goes back to being just the bootstrap tests, then keeping them where they are makes sense. On the other other hand, completely moving the benchmark stuff off the bootstrap test driver utils, if that ever happens, is certainly not going to happen right away, so having the support stuff in a place intended for common use at least until then makes sense.
> `tools` directory of which package?

The lowest point at which all its dependencies are satisfied. I think that's this one, unless you want to make a higher one or move some of the supporting tools here into a lower one. Even so I'd save that for another PR.

> benchmark-vaults.js should not go into a tools directory, since that's not what it is -- it's a specific exercise of specific functionality, much more like a test in that regard (note that yarn bench looks in the test directory for benchmarks to run and has for a while).

Easy to change where `yarn bench` looks.

I think a test should be something that passes or fails in CI. Are these perf tests that pass or fail, or a tool for collecting performance data? If the latter, I think that disqualifies them from `tests`. The next most natural option to me is `tools`.

> I think a case could be made for conventionally having a benchmarks directory parallel to test.

No objection. That would satisfy my requirement about what a "test" is.

> I think moving the bootstrap test driver stuff into tools makes sense…

+1 to that paragraph
> Are these perf tests that pass or fail or a tool for collecting performance data?

They're not pass/fail tests, so that does argue for their exclusion from the `test` directory.

However, they are not tools for collecting perf data generally but actual perf tests of specific things, in the same way that tests are actual correctness tests of specific things. In other words, I don't think "find out how long opening a vault takes" and "find out how long doing a liquidation takes" and "find out how long installing a new contract takes" are operations that belong in a directory called `tools`. In contrast, the benchmark framework itself (Benchmarkerator and its brethren) I think does belong in a `tools` directory, in the same way that, say, a module for setting up Ava belongs there.
Based on the above conversation, the resolution approach I'm going to take is: land this PR with things located where they are now, then do a follow-on PR that does all the file relocation and renaming as discussed (i.e., tooling and support code in `tools`, the benchmarks themselves in a new `benchmarks` directory). I suspect the benchmark tooling may eventually want to be in a different package (possibly even a new package of its own), but I think we can reasonably defer that until we have better visibility into how and where it's being used.
// taxonomy the name of this module should sit, but it's probably somewhere.

/**
 * The benchmark context object. The benchmarkerator provides this to
it's a fun name trading against clarity. consider benchmarker
+1 for clarity.
 * params: Record<string, string>,
 *
 * // Other unparsed parameters from the command line
 * argv: string[],
odd for `argv` to be arguments other than the ones parsed. more on this below…
The idea (this being a work in progress) was to provide a simple way to specify named configuration properties on the command line, while still allowing the command line to be parsed by the benchmark itself, in case its author needs some more sophisticated parameterization syntax and simple name/value pairs won't work for their use case.
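For illustration only, a sketch of how a benchmark might consume those two fields, assuming just the shape shown in the type comment above (the `vaults` parameter name, the `--with-liquidation` flag, and the setup function itself are hypothetical):

```js
// Hypothetical benchmark setup sketch; `context.params` and `context.argv`
// follow the shape described in the type comment above.
const setup = async context => {
  // Simple named configuration supplied as name/value pairs on the command line.
  const vaultCount = Number(context.params.vaults ?? '1');

  // Anything the benchmarkerator did not parse is left for the benchmark
  // to interpret however it likes.
  const withLiquidation = context.argv.includes('--with-liquidation');

  return { vaultCount, withLiquidation };
};
```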
};

let stillScanningArgs = true;
while (argv[0] && argv[0].startsWith('-') && stillScanningArgs) {
there are robust libraries for parsing arguments. please use one.
elsewhere in this repo we use https://github.com/tj/commander.js
the one that is built into Node 18+ is https://github.com/pkgjs/parseargs
I'm partial to https://www.npmjs.com/package/yargs myself, but these kinds of packages seemed like gross overkill for what's going on here.
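For reference, a minimal sketch of the built-in option mentioned above, using `parseArgs` from `node:util` (Node 18+); the specific option names here are invented for illustration and are not this tool's actual flag set:

```js
// Sketch only: option names are illustrative, not the real CLI surface.
import process from 'node:process';
import { parseArgs } from 'node:util';

const { values, positionals } = parseArgs({
  args: process.argv.slice(2),
  options: {
    verbose: { type: 'boolean', short: 'v' },
    dump: { type: 'boolean' },
    rounds: { type: 'string', short: 'r' },
  },
  allowPositionals: true, // leave remaining positionals for the benchmark itself
});
```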
const actors = {
  gov1: await walletFactoryDriver.provideSmartWallet(
    'agoric1ldmtatp24qlllgxmrsjzcpe20fvlkp448zcuce',
are these constants meaningful? please source from a constants block or module
Meh. They're pretty arbitrary. I think this is one of those things that's still in discovery mode.
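For reference, a minimal sketch of the constants-module approach suggested above (the file name and export name are hypothetical):

```js
// Hypothetical shared module, e.g. tools/benchmark-addresses.js, so the
// arbitrary-but-fixed bech32 addresses live in one documented place.
export const BENCHMARK_ADDRESSES = {
  // arbitrary placeholder account used by benchmarks as the gov1 actor
  gov1: 'agoric1ldmtatp24qlllgxmrsjzcpe20fvlkp448zcuce',
};
```

A benchmark would then write `provideSmartWallet(BENCHMARK_ADDRESSES.gov1)` instead of repeating the literal.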
(force-pushed from 1b81633 to f79d8d4)
Really like the detailed documentation. Thanks for writing it up.
The PR is in the right direction. I have left a couple of comments. Regarding the comment on splitting the benchmarking tool and the swingset-related code into two different files: if you do go down that route, I would also recommend considering an already existing benchmarking library instead of building our own, if the functionality is pure benchmarking.
Example library that might be useful: https://www.npmjs.com/package/tinybench
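For context, a minimal sketch of what pure timing looks like in tinybench (illustrative only; the timed body below is a placeholder, and whether this model fits swingset-scale rounds is exactly the open question):

```js
// Minimal tinybench usage sketch; the benchmark body is a placeholder.
import { Bench } from 'tinybench';

const bench = new Bench({ time: 1000 }); // run each task for roughly one second
bench.add('open vault (placeholder)', async () => {
  // ...the operation being measured would go here...
});

await bench.run();
console.table(bench.table());
```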
@@ -0,0 +1,580 @@
import process from 'node:process';
It appears that the benchmarking code and the SDK helper methods are entwined within the same component. For maintainability, reusability, and clarity, I recommend segregating these two functionalities.
- Benchmarking Tool (`Benchmarkerator`): The core tool should primarily deal with orchestrating the benchmark tests, i.e., setting up benchmarks, executing them, and reporting the results. It should be agnostic to the specifics of what it is testing. This will make the tool more generic and possibly usable in other contexts or projects.
- `BenchmarkTestBase`: Instead of having `makeSwingsetTestKit` and `walletFactory` actors inside the benchmarking tool, maybe create a separate `BenchmarkTestBase` that benchmark implementors import as needed.
I don't disagree, but we're still working our way down the learning curve on this, and our experience sample size is still too small to yield a clear delineation line. While there's a lot of stuff that is clearly on one side or the other, there's still enough confused stuff in the middle that I think we need a couple more exemplars in order to get a picture good enough to know where to slice the thing into pieces.
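For concreteness, the kind of split being discussed might look roughly like this; everything below is a hypothetical sketch (file names, function names, and the timing loop are invented, not this PR's code):

```js
// --- benchmarkerator.js (sketch): generic orchestration, agnostic to what it measures.
export const runBenchmark = async (label, rounds, body) => {
  const durations = [];
  for (let round = 0; round < rounds; round += 1) {
    const start = performance.now();
    await body(round);
    durations.push(performance.now() - start);
  }
  console.log(label, durations);
  return durations;
};

// --- benchmark-test-base.js (sketch): the swingset/SDK-specific setup that
// individual benchmarks import only when they need it, wrapping helpers such
// as the makeSwingsetTestKit and makeWalletFactoryDriver mentioned above.
```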
@@ -0,0 +1,171 @@
# The Benchmarkerator
This readme made understanding and reviewing the tool a breeze. Appreciate the effort you put into it!
In the interest of being a good project management citizen, I've created issue #8327, which this PR is retrospectively responsive to.
Sorry for the delay. PR looks good to me. I will let @warner take a quick look as well.
yep! we'll iterate more over time, I see nothing here that should prevent it from landing
The merge conflicts are from #8313. I rebased this branch onto master and pushed to https://github.com/Agoric/agoric-sdk/tree/ta/benchmarkerator
I second @toliaqat's praise for benchmarkerator.md, which really gave me a good perspective on the approach.
This seems like a step forward, but maybe a little in the wrong direction... there's a fine line between a general-purpose benchmarking tool capable of evaluating cosmic-swingset scenarios vs. a cosmic-swingset benchmarking tool, and I would expect the latter to directly consume e.g. bundle/vat configuration shaped like decentral-…-config.json files, and also to directly expose the system under test (e.g., chain-perspective inbound message queue and storage and node-perspective Swingset kernel).
I do like the "actors" concept, but it should probably be a higher level of configuration that consumes those more powerful kernel and/or chain interfaces (e.g., configuration could include a section like `actorMakers: { alice: makeSmartWalletDriver, … }` to make context include a destructurable `actors` property).
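A rough sketch of that `actorMakers` idea, purely for illustration (the configuration shape and the stub maker below are invented; only `makeSmartWalletDriver`, `provideSmartWallet`, and the sample address come from this thread):

```js
// Hypothetical sketch of the actorMakers suggestion above.
// Stub standing in for the suggested makeSmartWalletDriver:
const makeSmartWalletDriver = address => async chainInterfaces =>
  chainInterfaces.walletFactoryDriver.provideSmartWallet(address);

export const benchmarkOptions = {
  actorMakers: {
    alice: makeSmartWalletDriver('agoric1ldmtatp24qlllgxmrsjzcpe20fvlkp448zcuce'),
  },
};

// The framework would invoke each maker with its kernel/chain interfaces and
// expose the results on the context as a destructurable `actors` property:
//   const { actors: { alice } } = context;
```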
In addition, if you specify the `--dump` command line option, a JSON-formatted
(i.e., machine readable) version of this same data will be output to the file
`benchmark-NAME.json` in the current working directory (where _NAME_ is the name
that you provided as the argument to `bench.run`).
Hrm, this seems like bad form. What about keeping it simple for now and outputting only JSON to standard output for CLI redirection, and then later (if needed) either a separate invocation for pretty-rendering or a CLI option to incorporate that onto standard error?
$ # option 1
$ node benchmark-foo.js | tee benchmark-foo.json | npx benchmarkerator --pretty
$ # option 2
$ node benchmark-foo.js --stderr-pretty > benchmark-foo.json
We can't control output generated by the benchmark itself (indeed, the current output even without the stats report and without the `--verbose` option is already incredibly verbose), so separating the benchmark's own output from the stats report output would require various subprocess voodoo that would impede debugging.

An alternative I have been contemplating is changing the option to `--dump FILENAME`, i.e., specifying the data destination directly on the command line rather than inferring the filename from the benchmark definition. The thing deterring me from that path is that it would make it trickier to drive multiple benchmark files in a benchmarks directory from `yarn`, as the yarn script would itself be responsible for synthesizing the filename, which could be sketchy.
(force-pushed from 50caf26 to 18cb033)
`PassableEncoding` (not to be confused with `encodePassable`) was a non-standard serialization scheme used in some tests, which encoded remote references in-band using Symbols with magic names rather than using the normal marshal package machinery that puts these into the 'slots' element of the standard capdata format. This bypassed various message filtering and transformation logic in the kernel and also required special methods to be present in the bootstrap vat to translate this encoding and relay messages to their actual intended destinations. This has now been removed.

The relatively small number of tests which used `passableEncoding` have been updated to use `kmarshal` instead. Messages and data are now encoded in a form that all the other code understands. Test messages are also now delivered directly to their destinations without having to count on the existence of a relayer. In support of this, the controller's `queueToVatRoot` method has been augmented by the addition of a `queueToVatObject` method, allowing tests to send messages to specific objects, targeted using remotable references of the sort returned by `kunser`. The test support library that a lot of the bootstrap tests use has been updated to use this improved mechanism.

In addition, `kmarshal` itself has been upgraded using a trick that MarkM provided for tagging promises, which allows `kmarshal` to be truly stateless. The (former) statefulness of `kmarshal` caused problems when the module was imported into different compartments, as each compartment ended up with its own module instance and thus its own version of the state. This in turn caused these compartments to have different beliefs about how particular promises were represented, which caused various things to break. That's all fixed now.

One wart which has NOT been taken care of in this PR, but which will be addressed in a follow-on PR that we were already planning for, is the duplication of `kmarshal.js` in both the SwingSet package and the liveslots package. The forthcoming PR will perform a bunch of file renaming and relocation to put a bunch of support tooling, used by both benchmarks and tests, into a package of its own, thereby eliminating a lot of weird dependencies and files in places they don't belong. As part of this I plan to relocate `kmarshal` into a package of its own that can then be cleanly imported by the kernel, liveslots, and the various tests and test support tooling.

All this is in support of issue #8327
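A rough sketch of the kind of test interaction this enables; the method names follow the description above, but their exact signatures (and the `kpResolution` step) are assumptions rather than something confirmed by this text:

```js
// Illustrative only; `controller` is a swingset controller from the test kit,
// and `kunser` comes from the kmarshal module.
// Ask a vat root for an object; the settled result is kmarshal capdata.
const kpid = controller.queueToVatRoot('bootstrap', 'getSomeObject', []);
await controller.run();
const someObject = kunser(controller.kpResolution(kpid));

// The new queueToVatObject lets the test target that object directly,
// with no relayer methods needed in the bootstrap vat.
controller.queueToVatObject(someObject, 'doSomething', ['an argument']);
await controller.run();
```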
@FUDCo I rebased this onto master and resolved the conflicts. It saved time to squash the fixups first so I did that.
(force-pushed from 4cca300 to 87dbbda)
This PR adds a module, `benchmarkerator.js`, that enables performance benchmarks to be run as standalone Node executables, while automatically managing benchmark orchestration, performance measurement, and report generation. Currently this module lives in the `boot` package because it makes use of a lot of the existing bootstrap test tooling there. However, I think it likely that this will not be its final home, though where it properly should end up is still an open question. I also expect over time that it's going to drift away from the bootstrap test stuff.

This includes a version of the vault open benchmark (the one that has been a repeated guinea pig for our benchmark tooling efforts) that is implemented using The Benchmarkerator.
This is a preliminary cut at an "outside view" benchmark driver, just as the earlier PR #8239 was a preliminary cut at an "inside view" benchmark driver. There are a number of things that are still in need of work, but I wanted to get this into the review pipeline for critique. Among the stuff that's already on the TODO list for this:
- Since the only benchmark that's currently implemented using this is the vault open test, it is likely that when we start implementing other benchmarks we'll find that the chain setup support that's currently in there is overly tuned to that particular case. In particular, I suspect that dealing with these will probably end up driving an additional degree of configurability into the design.
- It currently leans heavily on the bootstrap test scaffolding in ways that may be inappropriate. In particular: the use of `queueToVatRoot` and `queueToKref`, and the use of the `passableEncoding` module (instead of the `kmarshal` module), which uses an in-band encoding of remote references using symbols instead of the capdata `slots` abstraction, along with manual serialization and deserialization in the relay vat.
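To make the "standalone Node executable" shape described above concrete, here is a purely hypothetical sketch of what a benchmark module built on this might look like; the import path, `addBenchmark` method, `executeRound` hook, and context shape are assumptions based on the description and readme excerpts above (only `bench.run(NAME)` and the `params` field are mentioned there), not the tool's actual API:

```js
// Hypothetical benchmark module sketch; names and import path are assumptions.
import { bench } from './benchmarkerator.js';

bench.addBenchmark('vault-open', {
  // one timed round; `context` would carry params/argv plus chain-side actors
  executeRound: async (context, round) => {
    const vaults = Number(context.params.vaults ?? '1');
    // ...open `vaults` vaults via the provided actors here...
  },
});

await bench.run('vault-open');
```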