Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better runner compatibility with testnet #44

Merged
merged 9 commits into from
Jan 17, 2022

Conversation

mhofman
Copy link
Member

@mhofman mhofman commented Jan 8, 2022

This PR updates the runner with a grab bag of fixes I did while making the stats generation more robust, and working on moving the provisioning check of the client for a testnet chain to the setup phase.

Checking earlier allows the testnet loadgen to pause before starting and wait until the client is provisioned, instead of requiring action after the start, which may be a while when catching up to an existing chain. I originally started adding the ability to auto provision a client for a local testnet (based on the keys and the same logic as a local-chain), but reverted course and moved that functionality into a separate script used by the integration test (see Agoric/agoric-sdk#4256).

List of various fixes:

  • Update to node 16 and Debian bullseye (with fallback to node 14 for older incompatible SDKs)
  • Handle some older SDK versions which output lockdown sniffing to stdout instead of stderr
  • Rewrote the config argv parsing logic, to make it behave slightly more sanely
  • Fixed some deadlock issues, e.g. adding some timeouts on task ready (see Fix timeout escapes in runner #40) or slog streams not closing properly
  • Capture client storage and slog file (should help track some transient seg faults in the solo)
  • Automatically capture the state of the client and chain if an error occurs (see Save storage improvements #39)
  • Background the compression of the state directories after snapshotting them (overlayfs supports CoW). Closes Save storage improvements #39
  • Avoid resetting the whole agoric-servers project in local-chain tests. Removes loadgen project git dependency.
  • Elide long lines from the chain or solo output (improves github actions perf, see ci: optimise integration test agoric-sdk#4113)

Stack on top of other compatibility fixes (#37).

As always, best reviewed commit-by-commit.

@mhofman mhofman requested review from warner and michaelfig January 8, 2022 08:26
@mhofman mhofman mentioned this pull request Jan 8, 2022
@mhofman mhofman force-pushed the mhofman/runner-testnet-compat branch from 4961771 to afdc022 Compare January 9, 2022 03:59
Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

The only stylistic issue I have is that you're using == and !=, which the team has vowed never to do (except as an accident). I can understand that maybe it's a terse way to express what you want, but I would be more comfortable if you wrote things out explicitly using === or !==. I can't really tell what your intent is with the double-equals forms.

@mhofman
Copy link
Member Author

mhofman commented Jan 9, 2022

The only stylistic issue I have is that you're using ... == null and ... != null, which the team has vowed never to do

Interesting. If that's the case we probably should have the corresponding eslint rule enabled.

I've been following the == null rule to mean null-ish, so either null or undefined, which the community seem to have settled on, including by standardizing ?. and ??.

@michaelfig
Copy link
Member

Interesting. If that's the case we probably should have the corresponding eslint rule enabled.

Even more interesting, the "require triple-equals" rule has exceptions for == null.

I've been following the == null rule to mean null-ish, so either null or undefined, which the community seem to have settled on, including by standardizing ?. and ??.

That's a good enough reason for me. I don't want to block landing this change, but I do want to be sure @erights is aware of these deviations from Jessie:

For context, this is the loadgen package, which definitely is bound to Node.js, and is running off-chain. It may be a good reason to allow deviations from Jessie where they are community-accepted idioms (including using the class keyword, which some of our code seems to need to do to interact with external libraries).

@mhofman
Copy link
Member Author

mhofman commented Jan 9, 2022

Oh yeah this is definitely node specific code. And I know I don't always strictly follow ocap for threading powers.

I also wasn't able to be strict about nested await. Might be interesting to do a pass later to see what could be done about all those deviations, but I'd rather not block these PR on that.

@erights
Copy link
Member

erights commented Jan 10, 2022

Where are these == null occurrences?

@michaelfig
Copy link
Member

michaelfig commented Jan 10, 2022

Where are these == null occurrences?

For example, https://github.com/Agoric/testnet-load-generator/pull/44/files#diff-01ae6218516e017b56a165545f3fa9bf13ae8abb57449c9ffaf49b7dbd048b48R650

(That link doesn't work. After expanding runner/lib/main.js in this PR, line 650.)

Object.values(stats.cycles).reduce(
(max, cycleStats) =>
cycleStats &&
cycleStats.duration != null &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth it. Please expand to test both null and undefined if that's the semantics you want. My reasoning:

Experienced JS and Node programmers will know the idiom. To any readers both unfamiliar with the idiom and with JS's truly bizarre == coercion rules, this is a barrier to understanding. One of my favorite Carl Hewitt quotes:

A program should not only work, it should appear to work

This code does not say what it means. Code should be read more often than it is written. The expertise required to read a piece of code and basically puzzle it out should be less than the expertise needed to write that code.

If you did want to keep the idiom, I'd say you must annotate each occurrence with an explanatory comment, which kinda defeats the purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this to my attention.

Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works, I'm happy with it :)

@mhofman mhofman force-pushed the mhofman/runner-compat-fixes branch 2 times, most recently from 1e66f9e to 2e870bc Compare January 17, 2022 22:16
Base automatically changed from mhofman/runner-compat-fixes to main January 17, 2022 22:23
@mhofman
Copy link
Member Author

mhofman commented Jan 17, 2022

@erights I'd like to merge this PR as-is. This is in the loadgen runner, and orchestrator cli, which is probably the furthest away from the chain you can get.

If disallowing == null is what we decide, I'd like this to be enforced by our eslint config, which this repo and package use.

@mhofman mhofman requested a review from erights January 17, 2022 22:30
@mhofman mhofman force-pushed the mhofman/runner-testnet-compat branch from 58cd8be to ceb913f Compare January 17, 2022 22:31
@mhofman mhofman merged commit 376964d into main Jan 17, 2022
@mhofman mhofman deleted the mhofman/runner-testnet-compat branch January 17, 2022 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Save storage improvements
4 participants