Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor events and advice injectors #1457

Open
bobbinth opened this issue Aug 17, 2024 · 5 comments
Open

Refactor events and advice injectors #1457

bobbinth opened this issue Aug 17, 2024 · 5 comments
Assignees
Labels
air Related to Miden AIR and constraints assembly Related to Miden assembly instruction set Related to Miden VM instruction set processor Related to Miden VM processor
Milestone

Comments

@bobbinth
Copy link
Contributor

Background

The trigger for this issue is the situation we faced after migration to the MastForest-based execution is illustrated by the following code:

proc.foo
    push.1
end

proc.bar
    adv.push_mapval
    push.1
end

begin
    exec.foo
    exec.bar
end

In the above, both foo and bar have the same MAST root because they only difference between them is the presence of the adv_push_mapval decorator and presence of decorators does not affect the program MAST.

The reason why decorators do not affect the MAST is because from the trace (and verifier) standpoint, the two procedures are actually identical. However, from the executor/prover standpoint they are clearly not. So, we have the following issue:

  • For the verifier, MAST roots uniquely identify the underlying program and so, the verifier can be sure that they've verified the proof for the right program.
  • For the prover, MAST roots currently do not identify the program - and so, to understand which procedure to execute, we cannot rely on them and need to rely on something else instead.

The second point is problematic because we currently rely on MAST roots to identify procedures during execution (e.g., all calls are done by MAST roots).

Broadly, there are two ways to solve this:

  1. For execution purposes, we can compute hashes in some other way and this way will include decorators into account. Then, we can map each such commitment to MAST root and serve it to the processor during execution.
  2. Update the methodology by which we compute MAST roots to include decorators in them. The downside here is that decorators would now need to be included in the program hash and thus, will affect the state of the VM. This my cause performance degradation (more on this later).

Related problem

There is another problem that is somewhat related to this. There is a subset of decorators we call "advice injectors" which provide custom functionality for specific instructions/procedures - but in a way are hard-coded in the VM. For example, we have an adv.push_smtpeek decorator which exists in the processor solely to support several SMT-related procedures in miden-stdlib (this is a kind of backward coupling).

However, for anything other than mdien-stdlib we cannot add decorators to the VM (as we cannot predict what kind of decorators the people will need downstream). To address this gap, we have the emit.<event_id> decorator on which we rely in miden-base. But this also has its own issues as it is not clear how multiple 3rd-party libraries can use custom events at the same time.

So, to summarize, we have advice injectors and events that do kind of similar things but for different contexts, and we also have some implied dependencies between downstream components (i.e., miden-stdlib) and the VM itself.

Potentail solution

I think we can address the above issues holistically. Taking a step back, we really have 3 classes of operations in the VM:

  1. Regular operations - these changes the internal VM state (stack, memory).
  2. Advice injector and event - these do not change the state of the VM, but may change the state of the advice provider.
  3. Decorators which do not affect the VM state or the advice provider - e.g., trace, debug, and AsmOp decorators.

Only the first two have semantic meaning: regular operations are meaningful both for the prover and the verifier; advice injectors and events are meaningful for the prover. The 3rd class of operations can be safely stripped from a program and it should not effect anything about how the program is executed or verified.

So, my proposal would be to change advice injectors and events into a kind of "Noop" operations which affect the program hash, but do not change the state of the stack or memory. This is basically approach 2 described above: we include these decorators into the MAST root computation and so if a decorator changes, MAST root of a program would change as well.

Below is a description of how this can be done.

Emit operation

First, we would add an Emit operation to the set of operations which can be executed by the processor. This operation could look like so:

pub enum Operation {
    ...
    Emit(u32),
    ...
}

It would be the second operation which carries an immediate value (Push(Felt) being the other such operation).

The processor treats Emit operation very similar to the Push operation, except nothing gets pushed onto the stack. This way, from the trace standpoint, Emit is effectively a Noop but the immediate value is included in the MAST root computation.

The processor would also call the Host::on_event() handler passing the inner u32 value as event_id. This mechanism would replace our current Event decorator (we are basically changing Event decorator into Emit operation).

However, we can push this a step further and replace all advice injectors with the Emit instruction. Specifically, we can interpret event_id as follows:

  • 16 most significant bits identify the event source.
  • 16 least significant bits identify the actual event.

The sources could have the following meaning:

  • 0 - events emitted internally by the VM. These would be used for decorators used in assembly instructions.
  • 1 - events emitted by the miden-stdlib.
  • 2 - events emitted by the miden-lib (i.e., for the rollup).
  • 3.. - 3rd party libraries.

Events with domain 0 will be handled by the VM internally and Host::on_event() handler would not be called for them. For everything else, Host::on_event() handler will forward the events to the appropriate library.

Effects on the Host and AdviceProvider traits

The above means that we should be able to simplify the Host and AdviceProvider traits. Specifically, we no longer need to make all advice injectors "overridable" - and so, we can internalize a lot of the currently public logic (AdviceProvider trait could be simplified quite a bit).

We could also get rid of the concept of AdviceExtractor, and maybe also AdviceInjector (we should be able to handle purely via event handlers). But we may need to expose the advice provider via the Host interface. Maybe something like this:

pub trait Host {
    type AdviceProvider: AdviceProvider;

    fn advice(&mut self) -> &mut Self::AdviceProvider;

}

Host:on_event() would actually need to be modified to look as follows:

pub trait Host {

    fn on_event<S: ProcessState>(
        &mut self,
        process: &S,
        event_source: u16,
        event_id: u16,
    ) -> Result<(), ExecutionError>;

}

Impact on constraints

Since the Emit operation would need to be treated similarly to the Push operation, the constraints for it would need to be similar. This means that Emit needs to be a very high-degree operation (e.g., constraints for it are of degree 5). Unfortunately, we currently don't have any more available slots for such operations.

However, since now we have two similar operations, we should be able to move them both into the "high-degree" operation category, and use their leading bits to define a degree 4 flag - e.g., something like op_with_immediate (we do something similar for control flow flags and shift-left flags). If this works, the net impact will be that we'll free up one slot in the "very high-degree" operation bucket.

Impact on libraries

We'll need to come up with a mechanism of how attach custom event handlers to libraries. There are probably many ways to do this. For example, handlers could be defined in specific library structs (e.g., StdLibrary could define a set of event handlers), or we could have some kind of registry at the VM level. I'm leaving figuring out the concrete details here for the future.

Implementation plan

Implementing the above would be quite a bit of work - so, I think we should do it stages which could look as follows:

  1. Replace Event decorator with Emit operation (this would also involve updating the constraints).
  2. Substitute advice injectors with events (this would also include Host/AdviceProvider refactoring).
  3. Introduce a way for libraries to provide custom event handlers.
@bobbinth bobbinth added assembly Related to Miden assembly instruction set Related to Miden VM instruction set processor Related to Miden VM processor air Related to Miden AIR and constraints labels Aug 17, 2024
@bitwalker
Copy link
Contributor

bitwalker commented Aug 19, 2024

I think the initial example here does a good job of making the technical issue clear, but is maybe not that great at demonstrating why it's an issue in practice. I've thrown together a basic example that I think illustrates why not accounting for advice decorators in the MAST root is problematic:

proc.with_decorators
    adv.insert_hdword
    hmerge
    adv.push_mapval
    dropw
end

proc.without_decorators
    hmerge
    dropw
end

begin
    push.1.2.3.4                  # [A]
    push.5.6.7.8                  # [B, A]
    dupw.1 dupw.1                 # [B, A, B, A]
    exec.with_decorators          # [B, A]
    padw adv_loadw padw adv_loadw # [B, A, B, A]
    dupw.2 assert_eqw             # [A, B, A]
    dupw.2 assert_eqw             # [B, A]
    exec.without_decorators       # [B, A]
    padw adv_loadw padw adv_loadw # [?, ?, B, A]
    dupw.2 assert_eqw             # ? (possibly unreachable)
    dupw.2 assert_eqw             # ? (possibly unreachable)
end

Compiling with just with_decorators (i.e. commenting out code after (and including) exec.without_decorators:

  • Hash: 0xfb88a333ac0cba2876affb051a73fa950aef65863a296f3e798b7dfb722b5a82
  • Cycles: 83
  • Length:128

Doing the same with without_decorators, but without seeding the advice stack fails with an AdviceStackReadError exception.

Doing the same with without_decorators, but seeding the advice stack with [1, 2, 3, 4, 5, 6, 7, 8], produces an identical trace:

  • Hash: 0xfb88a333ac0cba2876affb051a73fa950aef65863a296f3e798b7dfb722b5a82
  • Cycles: 83
  • Length: 128

So these two procedures have the same MAST root, but very different semantics:

  1. with_decorators stores [A, B] in the advice map with hash(A, B) as the key, we then compute that same key on the operand stack using hmerge and then arrange to place [A, B] on the advice stack by using adv.push_mapval. We then drop the key, returning having consumed the top two words of the operand stack. The remaining instructions up to exec.without_decorators, pops [A, B] off the advice stack and asserts that the value we inserted in the advice map was round-tripped. From a static analysis point of view, this program has clear semantics and well-defined behavior (and if we were to apply aggressive optimizations to such code, we could actually elide all of the code up to exec.without_decorators because assuming a faithful advice provider, it is trivially true that if we insert a value in the advice map, and request it immediately using the same key, that we get the same data, and thus the assertions are always true.
  2. without_decorators simply computes hash(A, B) and then drops the resulting hash, and returns having consumed the top two words on the operand stack. The remaining instructions attempt to perform the same check as done in step 1, however in this case, the program behavior is undefined from a static analysis point of view. We can't assume the assertions will fail, but it is extremely unlikely that they will succeed.

From an execution trace point of view, the decorators are invisible, so these two programs (as shown above) can have identical traces, depending on the contents of the advice stack - but trying to infer anything from the equivalence of those traces is pretty meaningless, since the programs are prima facie not the same (in one, we know precisely what data we expect to be asserting against, in the latter, it is effectively a roll of the dice). So what does it mean for the traces to be identical? Is that valuable on its own?

Anyway, I wanted to add a richer example here, since I think the first question one would have is "what's the practical impact of this?".

EDIT: I should also note, if compiling the example code above, if you leave both procedure definitions in the source file, you have no idea which one you'll actually get, a fun roll of the dice!

@bitwalker
Copy link
Contributor

However, for anything other than mdien-stdlib we cannot add decorators to the VM (as we cannot predict what kind of decorators the people will need downstream). To address this gap, we have the emit.<event_id> decorator on which we rely in miden-base. But this also has its own issues as it is not clear how multiple 3rd-party libraries can use custom events at the same time.

Ah that's interesting, I hadn't considered that other custom decorators might be desirable, but that makes a ton of sense (e.g. for implementing things like adv.push_u64div).

It seems to me that supporting generic user-defined decorators (including optional syntax in MASM) is not particularly difficult, but would require telling the VM to load one or more dynamic libraries containing the decorator implementations at startup. Each such decorator (say, by implementing a Decorator trait), would declare the set of things it wants to register with the VM, and this is used to detect conflicts in things like emit/trace ids. It adds some complexity to the VM, but provides a much sounder solution than hoping you don't accidentally invoke a decorator because your program contains emit.FOO, and FOO happens to correspond to a specific decorator you don't know about.

Using emit.<id> for this is ostensibly simpler, but as you pointed out, the issue becomes one of distinguishing whether or not any given emit is intending to invoke the decorator, or was meant for some other purpose. Not sure what our options are there - some kind of namespacing might do it, but I'm not sure where/how to represent that (you might be able to do it by reserving bits of the id for that purpose, but you've just shifted the problem to conflicting namespaces instead).

Probably not a critical issue at this juncture though.

@bobbinth
Copy link
Contributor Author

bobbinth commented Sep 8, 2024

@igamigo discovered one other situation where we face a similar problem as described in the original post: error codes for assert instructions. These are technically not decorators but error codes are also not included in MAST computations. The result of this is that sometimes error codes may not be reported correctly. I think there are two ways to deal with this:

  1. Include error codes into MAST similar to how we want to do it for advice injectors.
  2. Include error codes into eq_hash computation.

I think we should go with the second approach because error codes are conceptually similar to debugging info. Basically, even if we strip all error codes from assert operations, the program will still execute correctly, but it might report incorrect errors if errors do happen.

Separately, I think the first approach will be quite non-trivial to implement because there are quite a few instructions which use error codes.

@plafer
Copy link
Contributor

plafer commented Nov 4, 2024

However, we can push this a step further and replace all advice injectors with the Emit >instruction. Specifically, we can interpret event_id as follows:

16 most significant bits identify the event source.
16 least significant bits identify the actual event.

Thinking about this a bit more, I think event source collisions would be all too frequent. Even if all libraries properly choose an event source at random, relying on a probability of collision of 1 / 2^16 seems a bit low. An alternative is to allow 32-bit event source and 32-bit event, and hash them together to get the actual immediate value that we place in the trace. For example, Operation::Emit would look like:

Emit { event_source: u32, event_id: u32 }

where Operation::imm_value() would hash the source and id together and return the least significant 32 bits of the hash.

The MASM Instruction::Emit would be changed similarly, and syntax would become

emit.EVENT_SOURCE.EVENT_NAME

(e.g. where EVENT_SOURCE and EVENT_NAME are constants defined at the top of the file).

As for registering event handlers, I see the processor having an extra BTreeMap<EventSource, Box<dyn EventHandler>> field, where EventSource is a u32 and EventHandler is a trait that allows you to modify the advice provider for an event, e.g.

// (pseudo-code)
trait EventHandler {
  fn on_event(event_id: u32, advice_provider: &mut AdviceProvider);
}

impl Processor {
  // would fail if we already have a handler for that source id
  fn register_event_handler(source_id: SourceId, handler: Box<dyn EventHandler>) -> Result<(), ExecutionError> { ... }
}

I would indeed love to simplify all the Advice*-related code in the VM, to this day it confuses me every time I look something up.

@bobbinth
Copy link
Contributor Author

bobbinth commented Nov 5, 2024

Thinking about this a bit more, I think event source collisions would be all too frequent. Even if all libraries properly choose an event source at random, relying on a probability of collision of 1 / 2^16 seems a bit low. An alternative is to allow 32-bit event source and 32-bit event, and hash them together to get the actual immediate value that we place in the trace.

Agreed that $2^{16}$ is not ideal, but a couple of points:

  • Increasing size of event source/id to 32 bits will indeed help quite a bit. The only potential negative here is that two u32 values won't fit into a Felt. So, we'll need to make either event source or event ID slightly smaller than a u32. This is somewhat annoying, but shouldn't be a big deal.
  • I don't see additional benefit from hashing. Given the same inputs we'll always get the same output - so, hashing doesn't help us reduce collision probabilities here.

Overall, my initial choice of u16 values was somewhat influenced by the fact that network ports are u16 values as well and I thought that we'd be able to detect collisions at the time we try to load libraries into the assembler/VM (though, with vendored libraries that may be difficult). But maybe we should indeed increase the values to u32s.

(if we go with u32s, we should probably also update error codes as they follow a similar scheme).

As for registering event handlers, I see the processor having an extra BTreeMap<EventSource, Box<dyn EventHandler>> field, where EventSource is a u32 and EventHandler is a trait that allows you to modify the advice provider for an event, e.g.

// (pseudo-code)
trait EventHandler {
  fn on_event(event_id: u32, advice_provider: &mut AdviceProvider);
}

impl Processor {
  // would fail if we already have a handler for that source id
  fn register_event_handler(source_id: SourceId, handler: Box<dyn EventHandler>) -> Result<(), ExecutionError> { ... }
}

I like this approach - and this is roughly what I was thinking about for the 3rd step in the "implementation plan" from the original post. We'll need to work through some details here, as for example, on_event() should receive a reference to ProcessState so that it can examine the internal state of the VM (i.e., stack and memory). Also, we'll need to figure out what's the most ergonomic for the user to attach handlers (e.g., do the handlers come with libraries? what happens if the user merges libraries together? etc.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
air Related to Miden AIR and constraints assembly Related to Miden assembly instruction set Related to Miden VM instruction set processor Related to Miden VM processor
Projects
None yet
Development

No branches or pull requests

3 participants