We've gone back and forth about how much local variation we can accommodate and still maintain consensus in a distributed swingset (i.e. the chain). For various reasons, it would be nice if each validator could make local decisions about:
- the exact version of XS to use, to allow bugfixes or performance improvements to be deployed incrementally, instead of requiring a "flag day" (a simultaneous upgrade of all validators across the entire chain)
- when to "page out" a vat (i.e. kill the `xsnap` process): this saves memory at the expense of time spent loading the vat back in later, and different validators may have different amounts of memory
  - paging a vat out means it must some day be paged back in, at the latest by the time a message must be delivered to that vat, which gives each validator another cache-policy decision to make
- when to record a heap snapshot of any given vat (making the subsequent page-in faster, by removing/skipping transcript entries), at the cost of more disk IO
- when to perform/allow/force GC within a vat, which affects the memory usage
Our primary requirement is that all validators in a consensus machine actually maintain consensus: they agree upon some pre-defined subset of their activity, and that subset is sufficient to capture the overall state that users care about (e.g. token balances, governance outcomes, etc). We can exclude minor things from that subset if they cannot cause variations in the major things.
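To make the "pre-defined subset" idea concrete, here is a minimal sketch. Every name in it (`makeValidatorState`, `deliver`, `consensusFingerprint`, `snapshotInterval`) is hypothetical and not taken from swingset; it only illustrates that two validators with different local snapshot policies can still agree on the portion of state that is in consensus.

```javascript
// Hypothetical sketch: none of these names come from swingset itself.
// Each validator holds state that must match everywhere ("consensus")
// and state that reflects purely local choices ("local").
function makeValidatorState(localPolicy) {
  return {
    consensus: { balances: { alice: 10, bob: 5 }, deliveryCount: 0 },
    local: { ...localPolicy, snapshotsTaken: 0 },
  };
}

// Apply one delivery. The transfer touches only consensus state; the
// snapshot decision touches only local state.
function deliver(state, { from, to, amount }) {
  const { balances } = state.consensus;
  if (balances[from] >= amount) {
    balances[from] -= amount;
    balances[to] = (balances[to] || 0) + amount;
  }
  state.consensus.deliveryCount += 1;
  if (state.consensus.deliveryCount % state.local.snapshotInterval === 0) {
    state.local.snapshotsTaken += 1; // local decision, excluded from consensus
  }
}

// Canonical serialization of the consensus subset only. Keys are sorted so
// that property insertion order cannot cause a spurious mismatch.
function consensusFingerprint(state) {
  const canon = (v) => {
    if (Array.isArray(v)) return v.map(canon);
    if (v && typeof v === 'object') {
      return Object.keys(v).sort().map((k) => [k, canon(v[k])]);
    }
    return v;
  };
  return JSON.stringify(canon(state.consensus));
}
```

Two validators built with different `snapshotInterval` values and fed the same deliveries should produce equal fingerprints while their `local` records differ. The hard part described in this ticket is ensuring that nothing in `local` can leak back into `consensus`, which this sketch simply assumes.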
Metering makes this especially tricky, because so much of a vat's activity is subject to the CPU and memory meters. For example, if we want to exclude the UNREACHABLE-vs-COLLECTED-vs-FINALIZED state of an `Object` from consensus (allowing variation in the timing of GC), we must also exclude (from metering) the behavior of any code which is influenced by that state distinction. Like writing cryptographic code whose memory accesses and timing do not depend upon secret data, this requires tremendous care, as well as a deep understanding of how the underlying engine behaves, and is not generally covered by automated testing (making it fragile).
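As a rough illustration of excluding GC-influenced code from metering, the sketch below models a meter that can be suspended around a block of work. The names `makeMeter`, `charge`, `unmetered`, and `runDelivery` are hypothetical; the real meter lives inside the engine, and (as noted later in this ticket) it is not known whether it can be paused this way during finalization.

```javascript
// Hypothetical model of a CPU meter with an "unmetered" escape hatch.
function makeMeter(limit) {
  let used = 0;
  let pauseDepth = 0; // greater than 0 while inside an unmetered region
  return {
    charge(units) {
      if (pauseDepth > 0) return; // work here is invisible to the meter
      used += units;
      if (used > limit) throw new RangeError('meter exhausted');
    },
    unmetered(fn) {
      pauseDepth += 1;
      try {
        return fn();
      } finally {
        pauseDepth -= 1;
      }
    },
    used: () => used,
  };
}

// One delivery: deterministic user code is charged, while bookkeeping whose
// cost depends on how many objects happened to be finalized is not.
function runDelivery(meter, finalizedCount) {
  meter.charge(5); // deterministic user-level work
  meter.unmetered(() => {
    for (let i = 0; i < finalizedCount; i += 1) {
      meter.charge(1); // GC-dependent work, ignored while paused
    }
  });
  meter.charge(2); // more deterministic work
  return meter.used();
}
```

Under this model, two validators that observe different numbers of finalized objects still report the same meter reading for the delivery. The sketch does not address the harder requirement that the unmetered code must also avoid any other observable effect on metered code.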
The basic decision tree I've figured out so far looks like this:
The green boxes/circles indicate choices that we've already made, or which are pretty obvious. We certainly must allow validators to restart the process. We know that taking an XS snapshot affects the GC behavior (it does a forced GC just before writing the snapshot). We're already using the `deadSet` and an "unmetered box" to conceal the consequences of GC.
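The role of a `deadSet` can be sketched as follows. This is an assumption-laden illustration, not the actual liveslots code: `makeDeadSetTracker`, `onFinalized`, and `drain` are hypothetical names, and the string identifiers are made up. The idea is that finalizer callbacks only record which objects died, and the set is examined later at a fixed point, in sorted order.

```javascript
// Hypothetical sketch of collecting finalization results into a set so that
// callback ordering does not leak into later behavior.
function makeDeadSetTracker() {
  const deadSet = new Set();

  // In a real vat this would be the callback handed to a
  // FinalizationRegistry; the engine decides when and in what order it runs.
  function onFinalized(id) {
    deadSet.add(id);
  }

  // Called at a fixed point (for example, the end of a delivery). Sorting
  // makes the result independent of the order in which finalizers ran.
  function drain() {
    const dropped = [...deadSet].sort();
    deadSet.clear();
    return dropped;
  }

  return { onFinalized, drain };
}

// Usage: the same three objects finalized in different orders yield the
// same drained list.
const t1 = makeDeadSetTracker();
['o-7', 'o-2', 'o-10'].forEach(t1.onFinalized);
const t2 = makeDeadSetTracker();
['o-10', 'o-7', 'o-2'].forEach(t2.onFinalized);
console.log(JSON.stringify(t1.drain()) === JSON.stringify(t2.drain()));
```

This hides only the ordering of finalizers that have run by the time `drain` is called. It does not hide differences in which objects have been collected by that point, which is why the timing of GC itself still matters for the questions raised here.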
We're uncertain whether the GC behavior of a reloaded (post-read) snapshot is identical to the original (post-write) process: this was previously not the case, because the "headroom" was reduced during the reload process, but recent changes to XS (in particular using `mmap` instead of `malloc`, and writing the size of the `mmap`-ed slab into the snapshot) may have changed this. We're uncertain whether finalizers can run spontaneously (and prefer not to rely upon the opposite). We don't know whether it's possible to use C hooks to disable CPU metering during finalization.
This ticket is to explain and explore the options we have. It's related to #1872 and #2615.