Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ses): finite deep stacks, on by default #1513

Merged
merged 2 commits into from
Mar 19, 2023
Merged

Conversation

erights
Copy link
Contributor

@erights erights commented Mar 17, 2023

The deep stacks continue to use the assert.note mechanism to associate an error with previous stack-capturing errors. The unbounded leak that caused us to turn deep stacks off was really the unbounded memory of error-note-graphs.

Prior to this PR, the line disabling deep stacks by default was commented

// ... We intend to change the default after verifying that having
// the feature enabled in production does not cause memory to leak.

This PR makes the error-note storage into a finite budget LRU cache, defaulting to a budget of 1000. With the memory leak fixed, track-turns.js is changed to enable deep stacks by default. Measurements will still be needed to see what the cost is, and what is a reasonable budget.

@erights erights self-assigned this Mar 17, 2023
@erights erights marked this pull request as draft March 17, 2023 03:03
@erights
Copy link
Contributor Author

erights commented Mar 17, 2023

@kriskowal if you could have a look at this, that'd be great. CI is giving me a jackpot of errors, all of which seem rooted in

Error running main: TypeError: Cannot read property 'length' of undefined
    at main (file:///home/runner/work/endo/endo/packages/ses/scripts/bundle.js:28:46)

where the immediately relevant code in bundle.js is

  const { code: terse } = terser.minify(bundle, {
    mangle: false,
    keep_classnames: true,
  });

  console.log(`Bundle size: ${bundle.length} bytes`);
  console.log(`Minified bundle size: ${terse.length} bytes`);

However, this PR itself is clearly completely independent of bundling. I don't see how it could be affected. Any ideas?

@erights erights requested a review from kriskowal March 17, 2023 03:30
Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unbounded leak that caused us to turn deep stacks off was really the unbounded memory of error-note-graphs.

Huh, really? I would have expected takeNoteLogArgsArray to free that... was nothing calling it?

packages/eventual-send/src/track-turns.js Outdated Show resolved Hide resolved
packages/ses/src/error/note-log-args.js Outdated Show resolved Hide resolved
packages/ses/src/error/note-log-args.js Show resolved Hide resolved
packages/ses/src/error/note-log-args.js Show resolved Hide resolved
packages/ses/src/error/note-log-args.js Outdated Show resolved Hide resolved
packages/ses/src/error/note-log-args.js Outdated Show resolved Hide resolved
@erights
Copy link
Contributor Author

erights commented Mar 17, 2023

The unbounded leak that caused us to turn deep stacks off was really the unbounded memory of error-note-graphs.

Huh, really? I would have expected takeNoteLogArgsArray to free that... was nothing calling it?

Only for errors that are actually reported. The fatal leak were for errors that could eventually be potentially reported, but were not.

@erights
Copy link
Contributor Author

erights commented Mar 17, 2023

PTAL

@erights erights requested a review from gibson042 March 17, 2023 20:34
@erights erights marked this pull request as ready for review March 18, 2023 01:14
@erights
Copy link
Contributor Author

erights commented Mar 18, 2023

I also changed the default budget from 10_000 to 1000 because, in the absence of measurement, I'd really like to avoid guessing too high. If too low, it is still overall better than the status quo, with deep stacks default turned off. And we can always increase it later.

We should also consider configuring the budget with an environment option.

@erights erights force-pushed the markm-error-collector branch 2 times, most recently from d5fd4c2 to a678a6d Compare March 18, 2023 01:30
@erights
Copy link
Contributor Author

erights commented Mar 18, 2023

Worked around stale tooling. Bug at #1514 . (@kriskowal , thanks!)

@erights erights force-pushed the markm-error-collector branch 3 times, most recently from 4d40dcc to f9fa1a4 Compare March 18, 2023 02:35
@erights
Copy link
Contributor Author

erights commented Mar 18, 2023

Re #1513 (comment)

I reduced the note-args budget again, this time to 100. At 1000 I ran into one problem:

// Bizarrely, if set to 1000 or 10_000, captp/test/test-gc.js
// test 'test loopback gc' fails ONLY ON WINDOWS.
// TODO: Figure out why, and either special case or fix.
// TODO: Once captp/test/test-gc.js doesn't stop us,
// set this default back to a more reasonable number.
// TODO: make this default configurable by an environment option.
const defaultLogArgsBudget = 100;

@erights
Copy link
Contributor Author

erights commented Mar 18, 2023

Given the nature of this PR, before merging, it is worth checking the effect on performance. The following first crude test is consistent enough to show we're ok, or even a bit better!

#1515 , identical to current master, has the following timings from CI

image

image

Compare with the timings of this PR under CI. Bizarrely, this PR is always the same or faster. (ignoring the failed test)

image

image

@erights
Copy link
Contributor Author

erights commented Mar 18, 2023

At 100 the same test now fails or passes intermittently, which is even worse!

@erights erights changed the base branch from master to mfig-captp-gc-tests March 19, 2023 20:24
@erights erights marked this pull request as draft March 19, 2023 20:25
@erights erights requested a review from michaelfig March 19, 2023 20:26
Base automatically changed from mfig-captp-gc-tests to master March 19, 2023 23:20
@erights erights force-pushed the markm-error-collector branch 3 times, most recently from 35b7cce to ae6cd87 Compare March 19, 2023 23:32
@erights erights marked this pull request as ready for review March 19, 2023 23:36
@erights
Copy link
Contributor Author

erights commented Mar 19, 2023

Re #1513 (comment) , #1516 fixed the immediate problem, the OS-dependent behavior. However, this PR still changes the counts, for reasons we (@michaelfig and I) do not yet precisely understand but believe are fine for now.

@michaelfig, I changed those lines as we discussed but also added a TODO

  // TODO(mfig,erights): explain why #1513 changed these counts from 3 to 2
  t.is(getFarStats().sendCount.CTP_DROP, 2);
  t.is(getNearStats().recvCount.CTP_DROP, 2);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants