Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xsnap leaks memory during snapshot write #5975

Closed
warner opened this issue Aug 17, 2022 · 7 comments
Closed

xsnap leaks memory during snapshot write #5975

warner opened this issue Aug 17, 2022 · 7 comments
Assignees
Labels
bug Something isn't working xsnap the XS execution tool

Comments

@warner
Copy link
Member

warner commented Aug 17, 2022

I was having trouble reproducing #5910 locally, by replaying a transcript extracted from the original slogfile. I realized that one thing the original was doing, but my replay was not, was writing out the periodic heap snapshot (every 200 deliveries or so). When I added that to packages/SwingSet/misc-tools/replay-transcript.js, I saw an immediate growth of the xsnap process' VmRSS size, about 5MB for every snapshot it wrote.

I'll dive into xsnap.c next, but I'm guessing that a temporary buffer is allocated to write a snapshot, and then it is never freed.

@warner warner added bug Something isn't working xsnap the XS execution tool labels Aug 17, 2022
@warner warner added this to the Mainnet 1 RC0 milestone Aug 17, 2022
warner added a commit that referenced this issue Aug 17, 2022
Little demo to show xsnap worker process memory growing after each
snapshot write, about 31 kB.

refs #5975
and the original #5910
@warner
Copy link
Member Author

warner commented Aug 17, 2022

My test program (https://github.com/Agoric/agoric-sdk/blob/614c849d61f8a2c24c82fbc48fcab29338774f25/packages/xsnap/test-snapshot-memleak.js) shows a memory leak of a trivial worker (just lockdown and our supervisor bundle) of about 30 kB per snapshot write. The leak is unaffected by putting large strings into the worker's heap, but adding new properties to the global object increases each leak by about 24 bytes. This suggests that the snapshot writing process is malloc()ing a temporary buffer based on the number of slots in use (not chunks, which hold strings), and then not freeing the buffer afterwards.

That makes the leak roughly proportional to the complexity of the heap (number of objects, properties, edges between objects, etc). Zoe is one of our more complicated vats, so the snapshots include a lot more slots, so the leak is faster.

I've let the Moddable folks know.

For now, our workaround of evicting each worker after each snapshot write is a decent one. If we really wanted to, we could claw back some performance by only evicting after every N snapshots, but I think it's better to wait for the upstream fix.

@warner
Copy link
Member Author

warner commented Aug 17, 2022

The XS code does a really good job of not allocating memory. And I don't see anything obvious in the xsnap-worker.c code that might accumulate data created during the snapshot write.

I do see that the fxWriteSnapshot() function wraps the code in a mxTry/mxCatch macro pair, and I think those use longjmp, and the code is making a lot of subroutine calls, perhaps enough to grow the stack by a non-trivial degree. Perhaps the longjmp behavior is somehow creating an alternate stack and then leaking it? That's not how I know stacks to work, but I'm not seeing any obvious explanation for the leak yet.

@Tartuffo Tartuffo removed this from the Mainnet 1 RC0 milestone Aug 17, 2022
@phoddie
Copy link

phoddie commented Aug 17, 2022

Here's a patch to xsSnapshot.c from Patrick (also provided by email) to use until we get the next Moddable SDK update pushed. (The leak was introduced as part of the recent work to fix snapshot divergences).

@@ -2035,6 +2035,8 @@ int fxWriteSnapshot(txMachine* the, txSnapshot* snapshot)
mxCatch(the) {

}
+ if (snapshot->slots)
+ c_free(snapshot->slots);

projectionAddress = &(snapshot->firstProjection);
while ((projection = *projectionAddress)) {

@warner
Copy link
Member Author

warner commented Aug 17, 2022

Awesome. I've confirmed that fixes the leak.

warner added a commit that referenced this issue Aug 17, 2022
report the worker PID, create heap snapshots (to exercise #5975) leak
@warner
Copy link
Member Author

warner commented Aug 17, 2022

Next step: wait for Moddable to update the SDK, then make an https://github.com/agoric-labs/moddable PR to pull that in, then make an agoric-sdk PR to update our packages/xsnap/moddable submodule to that version.

mergify bot added a commit that referenced this issue Aug 17, 2022
report the worker PID, create heap snapshots (to exercise #5975) leak

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
warner added a commit to agoric-labs/moddable that referenced this issue Aug 17, 2022
Interim patch from Moddable to close the memory leak that happens each
time a heap snapshot is written.

refs Agoric/agoric-sdk#5975
warner added a commit that referenced this issue Aug 17, 2022
This switches our 'moddable' submodule to use a version that include
the patch for #5975

The long-term fix will be for us to update to a new Moddable SDK
release, which will include this fix (among others).

refs #5975
mergify bot pushed a commit that referenced this issue Aug 18, 2022
…5987)

This switches our 'moddable' submodule to use a version that include
the patch for #5975

The long-term fix will be for us to update to a new Moddable SDK
release, which will include this fix (among others).

refs #5975
warner added a commit that referenced this issue Aug 19, 2022
This provides the upstream version of the snapshot-write memory leak
fix, along with other bugfixes.

closes #5975
warner added a commit that referenced this issue Aug 19, 2022
This provides the upstream version of the snapshot-write memory leak
fix, along with other bugfixes.

closes #5975
@warner warner self-assigned this Aug 20, 2022
warner added a commit that referenced this issue Aug 22, 2022
This provides the upstream version of the snapshot-write memory leak
fix, along with other bugfixes.

closes #5975
@dckc
Copy link
Member

dckc commented Aug 24, 2022

I'm inclined to close this, since the symptoms are addressed by #5987 .

The open issue now is some technical debt: we're out of sync with moddable master. We were keeping this open until that was addresses, presuming it would be straightforward; but #6011 turned out to be not straightforward. Do we want to track "we're out of sync with moddable" as an issue? @warner @kriskowal @mhofman ?

@dckc
Copy link
Member

dckc commented Aug 30, 2022

I got a second from @mhofman and no objections on closing this.

@dckc dckc closed this as completed Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working xsnap the XS execution tool
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants