Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Liveslots finalizers run metered #6795

Open
mhofman opened this issue Jan 15, 2023 · 4 comments
Open

Liveslots finalizers run metered #6795

mhofman opened this issue Jan 15, 2023 · 4 comments
Assignees
Labels
bug Something isn't working liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet vaults_triage DO NOT USE

Comments

@mhofman
Copy link
Member

mhofman commented Jan 15, 2023

Describe the bug

I don't see any code disabling / or restoring metering around finalizers execution.

Since metering differences may result in execution differences through the run policy (and in the future will be directly checked to be in consensus with #6770), we should be careful to exempt gc related tasks from metering. Conceptually related to #6784, but differently observable.

I have confirmed from looking at the xsnap-worker code that finalizers run after draining of the promise queue, but before executing any setImmediate callbacks. The callbacks themselves don't disable the metering, and I'm not sure if the callback invocation itself may not increment the meter.

Of course the finalizer itself should strive to not produce an observably different execution, which is currently covered by #6760.

@mhofman
Copy link
Member Author

mhofman commented Jan 15, 2023

A possible solution is to:

  • Update xsnapPlatform.c to capture and restore the meter around executing FR callback jobs.
  • In the supervisor specific to xsnap-worker, wrap FinalizationRegistry so that the callback is run unmetered to reflect the meter disablement of the host
  • For good measure, wrap WeakRef similarly and assert that any deref() call occurs while unmetered.

@warner
Copy link
Member

warner commented Jan 24, 2023

If we add computrons usage to consensus (perhaps via the transcript holding the delivery results), this becomes even more important.

@warner
Copy link
Member

warner commented Apr 12, 2023

@mhofman and I discussed this yesterday. I'm kinda inclined to WONTFIX this, and stop trying to make our metering insensitive to GC.

  • the original goal was to avoid overconstraining validators: allow them to run slightly-different versions of XS
    • we're clearly not allowing that: the XS/xsnap they get is a well-defined component/dependency of the kernel they're using, and we aren't trying to permit them to each run a different kernel
  • the secondary goal was to tolerate variations during replay (if we upgrade validators to a new xsnap, their existing transcripts will be replayed under the new version, and differences could cause problems)
    • the biggest problem was syscall divergence, because getting the syscall results out of the transcript depends upon them happening in the correct order
      • we think that's been resolved
    • the second problem is knife-edge effects on metering limits: the new xsnap must complete any delivery that the transcript recorded as successful in the old xsnap
      • if the new xsnap consumes more computrons or memory, and the old delivery was close to the edge, then this will fail, and panic the kernel upon restart
      • "don't do that"
    • the third problem is differences in recorded metering data (i.e. what we charge the vat for)
      • but we'll ignore that during replay: you only get charged during the initial delivery
  • a tertiary goal was to improve the chances of our hail-mary "sleeper-agent/replay-style" upgrade idea, in which we must replay a whole incarnation's worth of transcripts (or maybe even all incarnations), and syscall variation would make that harder
    • this is exactly the sort of situation where we'd want to run under a significantly different version of XS
      • like one which was not compatible with existing heap snapshots
    • we would expect significantly different metering and allocation behavior from the new XS
      • so the upgrade plan would need to tolerate that, rather than expecting the vat to hide it
    • if we replay the entire history (all incarnations), then we'd build a new vatstore anyways
      • so we'd extract deliveries and send/resolve syscalls from the transcript for replay
      • but we'd build a new transcript that includes the new vatstore calls it makes

We put a bunch of energy into making the computron usage not be affected by code running in GC-sensitvity situtations, but as this ticket notes, that work wasn't complete. And we haven't put any energy into trying to make allocation counters be unaffected (and it's probably impossible, since allocation affects the future much more than a computron counter).

So I'm not sure it's worth trying to hide that usage. And it would simplify the code a fair bit if we removed the unmetered wrappers.

@warner
Copy link
Member

warner commented Apr 24, 2023

@mhofman and I agreed to move this out of Vaults, while we think about how to handle GC metering in general

@toliaqat toliaqat added the liveslots requires vat-upgrade to deploy changes label Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet vaults_triage DO NOT USE
Projects
None yet
Development

No branches or pull requests

3 participants