Add support for selectors #125

Closed · Tracked by #76
tohrnii opened this issue Jan 26, 2023 · 10 comments
Labels: enhancement (New feature or request), v0.3

@tohrnii (Contributor) commented Jan 26, 2023

Frequently, we want to apply some set of constraints only if some condition is true. For example, for Rescue Prime constraints, we usually want to apply them only on the first 7 steps of an 8-step cycle. This, of course, could be baked into the constraints themselves, but that would make them much less readable and modular.

So, some syntax to help with applying selectors could be very useful. Here is how this could work:

Single selector

First, if we have just a single selector, we could do something like this:

// define a cyclic column consisting of 7 ones and 1 zero.
cycl k0 = [1, 1, 1, 1, 1, 1, 1, 0]

ev foo(main: [a, b, c]):
  enf bar([a, b]) when k0

ev bar(main: [a, b]):
  enf a' = a + b

The above specifies that all constraints returned from the bar evaluator would be multiplied by k0. Specifically, the final constraint would be:

$$ k_0 \cdot (a' - a - b) = 0 $$

We could also have the compiler check whether $k_0$ is binary and output either an error or a warning if it isn't.
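For illustration, such a check could be as simple as scanning the column's values - a minimal Rust sketch, where the function name and error shape are hypothetical, not the actual AirScript compiler API:

// Hypothetical compiler-side check that a column used as a selector
// contains only binary values.
fn check_binary(column: &str, values: &[u64]) -> Result<(), String> {
    match values.iter().find(|&&v| v != 0 && v != 1) {
        Some(v) => Err(format!(
            "column {column} is used as a selector but contains non-binary value {v}"
        )),
        None => Ok(()),
    }
}

fn main() {
    // k0 = [1, 1, 1, 1, 1, 1, 1, 0] passes; a column containing 2 does not
    assert!(check_binary("k0", &[1, 1, 1, 1, 1, 1, 1, 0]).is_ok());
    assert!(check_binary("bad", &[1, 2, 0]).is_err());
}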

Multi selector

While we should be able to describe everything using single selectors, it could frequently be much more convenient to specify that one out of several constraints should apply in a given situation. For this, we could use something like this:

ev foo(main: [a, b, c]):
  // make sure values in a are either ones or zeros
  enf a^2 = a

  match enf:
    bar_add([b, c]) when a
    bar_mul([b, c]) when 1 - a

ev bar_add(main: [a, b]):
  enf a' = a + b

ev bar_mul(main: [a, b]):
  enf a' = a * b

The above would translate into the following constraint:

$$ a \cdot (b' - b - c) + (1 - a) \cdot (b' - b \cdot c) = 0 $$

We could also have the compiler check whether all when conditions are mutually exclusive and output an error or a warning if they are not.

The above methodology would also work if evaluators return different numbers of constraints. For example:

ev foo(main: [a, b, c]):
  // make sure values in a are either ones or zeros
  enf a^2 = a

  match enf:
    bar_add([b, c]) when a
    bar_mul([b, c]) when 1 - a

ev bar_add(main: [a, b]):
  enf a' = a + b

ev bar_mul(main: [a, b]):
  enf a' = a * b
  enf b' = a

The above would translate to two constraints:

$$ a \cdot (b' - b - c) + (1 - a) \cdot (b' - b \cdot c) = 0 $$

$$ (1 - a) \cdot (c' - b) = 0 $$

And if some constraints are redundant, the compiler can simplify them very easily. For example:

ev foo(main: [a, b, c]):
  // make sure values in a are either ones or zeros
  enf a^2 = a

  match enf:
    bar_add([b, c]) when a
    bar_mul([b, c]) when 1 - a

ev bar_add(main: [a, b]):
  enf a' = a + b
  enf b' = a

ev bar_mul(main: [a, b]):
  enf a' = a * b
  enf b' = a

The above would translate to something like:

$$ a \cdot (b' - b - c) + (1 - a) \cdot (b' - b \cdot c) = 0 $$

$$ (c' - b) = 0 $$

Instead of the second constraint being something like:
$$a \cdot (c' - b) + (1 - a) \cdot (c' - b) = 0$$

Originally posted by @bobbinth in 0xPolygonMiden/miden-vm#254 (comment)

@bobbinth (Contributor) commented Mar 7, 2023

One potential syntax improvement I mentioned in #191 (comment): we could use boolean operators to make the selectors more readable. Specifically, something like:

match enf:
  d_next_agg = delta_c when n0
  d_next_agg = delta_a when !n0 & n1
  d_next_agg = delta_i when !n0 & !n1

Instead of:

match enf:
  d_next_agg = delta_c when n0
  d_next_agg = delta_a when (1 - n0) * n1
  d_next_agg = delta_i when (1 - n0) * (1 - n1)

The boolean operators would be pretty simple:

  • !a translates to $(1 - a)$.
  • a & b translates to $a \cdot b$.
  • a | b translates to $a + b - a \cdot b$.
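To make the lowering concrete, here is a minimal Rust sketch of these three rules over a hypothetical expression type (the Expr enum is illustrative, not the real IR):

#[derive(Clone, Debug)]
enum Expr {
    Const(u64),
    Col(String), // a trace column reference, e.g. "n0"
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// !a  ->  1 - a
fn not(a: Expr) -> Expr {
    Expr::Sub(Box::new(Expr::Const(1)), Box::new(a))
}

// a & b  ->  a * b
fn and(a: Expr, b: Expr) -> Expr {
    Expr::Mul(Box::new(a), Box::new(b))
}

// a | b  ->  a + b - a * b
fn or(a: Expr, b: Expr) -> Expr {
    Expr::Sub(
        Box::new(Expr::Add(Box::new(a.clone()), Box::new(b.clone()))),
        Box::new(Expr::Mul(Box::new(a), Box::new(b))),
    )
}

fn main() {
    // !n0 & n1  ->  (1 - n0) * n1
    let sel = and(not(Expr::Col("n0".into())), Expr::Col("n1".into()));
    println!("{sel:?}");
}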

@bobbinth (Contributor) commented:

One of the things I've been thinking about is how to combine different branches of the match enf statement. For example, let's say we have the following constraints:

match enf:
    op_add(s[0..3]) when c
    op_mul(s[0..3]) when !c

ev op_add([s[3]]):
  enf s[0]' = s[0] + s[1]
  enf s[1]' = s[2]

ev op_mul([s[3]]):
  enf s[0]' = s[0] * s[1]
  enf s[1]' = s[2]

Naively, the above would generate the following two constraints:

$$ c \cdot (s_0' - (s_0 + s_1)) + (1 - c) \cdot (s_0' - s_0 \cdot s_1) = 0 $$

$$ c \cdot (s_1' - s_2) + (1 - c) \cdot (s_1' - s_2) = 0 $$

Ideally, however, we'd get something like this:

$$ s_0' - c \cdot (s_0 + s_1) - (1 - c) \cdot s_0 \cdot s_1 = 0 $$

$$ s_1' - s_2 = 0 $$

I'm hoping that we will be able to do these types of optimizations with optimization passes in the IR - though, it might be quite a bit simpler to do them while building an algebraic graph for the selector expression. However, consider the following constraints:

match enf:
    op_add(s[0..3]) when c
    op_mul(s[0..3]) when !c

ev op_add([s[3]]):
  enf s[0]' = s[0] + s[1]
  enf s[1]' = s[2]

ev op_mul([s[3]]):
  enf s[1]' = s[2]
  enf s[0]' = s[0] * s[1]

Obviously, these are exactly the same constraints, but naive parsing would give us the following expressions:

$$ c \cdot (s_0' - (s_0 + s_1)) + (1 - c) \cdot (s_1' - s_2) = 0 $$

$$ c \cdot (s_1' - s_2) + (1 - c) \cdot (s_0' - s_0 \cdot s_1) = 0 $$

And I'm actually not sure how easy (or whether possible at all) it would be to reduce them to the optimized form shown previously.

So, I think there is an argument to be made for optimizing these types of expressions at the time when we build an algebraic graph for selector expressions.

One way to do this would be to evaluate all enf expressions at a random point and then compare which of them are equal across different branches. To illustrate this on the example above, let's say we evaluate all enf expressions at point $\alpha$ and we get the following results:

op_add
  s[0]' - (s[0] + s[1]) evaluates to x
  s[1]' - s[2]          evaluates to y

op_mul
  s[0]' - s[0] * s[1]   evaluates to z
  s[1]' - s[2]          evaluates to y

By noticing that two constraints evaluate to the same value (y) it is easy to tell that these constraints can be "paired up" to produce a more optimized expression.
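A minimal Rust sketch of this bucketing idea, assuming a hypothetical eval_at closure that evaluates a constraint expression at the random point $\alpha$ (the types here are stand-ins, not the actual IR):

use std::collections::HashMap;

type Felt = u64; // stand-in for a field element
type ConstraintId = usize; // stand-in for a node in the constraint graph

// Bucket every constraint of every branch by its evaluation at the random
// point: constraints landing in the same bucket agree at that point, so
// with high probability they are the same expression and can be paired up
// into a single combined constraint across branches.
fn pair_up(
    branches: &[Vec<ConstraintId>],
    eval_at: impl Fn(ConstraintId) -> Felt, // hypothetical evaluator at alpha
) -> HashMap<Felt, Vec<(usize, ConstraintId)>> {
    let mut buckets: HashMap<Felt, Vec<(usize, ConstraintId)>> = HashMap::new();
    for (branch_idx, constraints) in branches.iter().enumerate() {
        for &c in constraints {
            buckets.entry(eval_at(c)).or_default().push((branch_idx, c));
        }
    }
    buckets
}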

@tohrnii (Contributor, Author) commented Apr 27, 2023

One way to do this would be to evaluate all enf expressions at a random point and then compare which of them are equal across different branches.

We could also keep a constraint expression -> selectors hashmap (HashMap<NodeIndex, Vec<NodeIndex>>), keyed by the constraint expression before the selector expression is applied to it, and then check if the key already exists when adding new constraints.
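A minimal sketch of that bookkeeping, with usize standing in for the IR's NodeIndex type:

use std::collections::HashMap;

type NodeIndex = usize; // stand-in for the IR's NodeIndex

#[derive(Default)]
struct SelectorMap {
    // constraint expression -> all selectors gating it across branches
    by_constraint: HashMap<NodeIndex, Vec<NodeIndex>>,
}

impl SelectorMap {
    // Record that `selector` gates `constraint`. If the constraint node was
    // already inserted under another branch, the selectors accumulate and
    // the constraint subgraph is shared instead of duplicated.
    fn add(&mut self, constraint: NodeIndex, selector: NodeIndex) {
        self.by_constraint.entry(constraint).or_default().push(selector);
    }
}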

I think there are other cases that might be trickier to simplify, like the one below, if the user doesn't write constraints consistently. For example, consider something like:

match enf:
    s[1]' - s[2] = 0 when c & a
    (s[1]' - s[2]) * (1 - c) = 0 when !a

These two arms should be combined, but it may be tricky to do so. We could, of course, ignore cases like these and just output the unoptimized form.

@bobbinth (Contributor) commented:

We could also keep a constraint expression -> selectors hashmap (HashMap<NodeIndex, Vec<NodeIndex>>), keyed by the constraint expression before the selector expression is applied to it, and then check if the key already exists when adding new constraints.

I think evaluating at a random point may be more robust as something like x + y = 0 and y + x = 0 could look like two different constraints - but evaluating them on a random point should give the same result.

@bitwalker (Contributor) commented Jun 30, 2023

Naively, the above would generate the following two constraints

That doesn't sound right - wouldn't you actually have four constraints, because each match arm calls an evaluator which consists of two constraints? And while each match arm is supposed to be mutually exclusive with the others, so two of the constraints at any given point would be effectively no-ops, they all must still be evaluated at every point, right?

While it seems possible to combine constraints in such a way that the properties of both are upheld simultaneously, it isn't clear how we could do so as a generic transformation, safely. It would be really important to have a proof of the safety of that transformation, or the risk of accidentally breaking constraints would be really high.

That said, I wonder if this would be an ideal application for e-graphs. Input all of the rewrite rules that we'd want to apply, and are known to be safe, e.g. x * 0 => 0, then for each constraint, obtain its optimal representation. Then construct an egraph representing the program being compiled (i.e. with all trace columns, inputs, etc. declared as leaf nodes), and check each constraint to see if an equivalent constraint is already present in the graph, and if not, insert it into the graph. At the end, you'd have a set of optimal constraints, where each constraint is unique. I think you can also use egglog to check for contradictory constraints, so we could add that as an additional rule to the egraph.

With that in mind, to me this really calls into question the benefit of enf match, since none of what I just described benefits from using it versus just declaring each constraint independently.

One way to do this would be to evaluate all enf expressions at a random point and then compare which of them are equal across different branches. To illustrate this on the example above, let's say we evaluate all enf expressions at point $\alpha$ and we get the following results:

Checking whether match arms are exclusive seems like a good problem for a SAT solver - generate a variable for each trace column, and then ask it to determine if the disjunction of the selectors is satisfiable. If it isn't, then that means none of the selectors are valid (i.e. they are nonsensical expressions like s[0] & !s[0]). If it is satisfiable, then we will get as output a set of solutions, each of which contains the values for each variable for that solution. To determine if the match arms are mutually exclusive, we apply each solution to all of the match arms and see if more than one evaluate to true. If the arms are not mutually exclusive, we raise an error.

For example:

enf match:
    case s[0] & s[1]: ...
    case !s[1] & s[2]: ...

First, we assign each scalar its own variable:

a = s[0]
b = s[1]
c = s[2]

We'd input the disjunction of the selectors to the solver, i.e. (a AND b) OR ((NOT b) AND c). That will produce the following output:

[
  [a: true, b: true, c: false], # (a AND b)
  [a: true, b: true, c: true],  # (a AND b)
  [a: true, b: false, c: true], # ((NOT b) AND c)
  [a: false, b: false, c: true], # ((NOT b) AND c)
]

If we apply these solutions to each selector, we only ever observe one selector at a time evaluating to true. This tells us the selectors are mutually exclusive, so the match constraint is valid.

But let's consider what would happen if we had the following code instead:

enf match:
    case s[0] & s[1]: ...
    case s[1] & s[2]: ...

The variables we assign would be the same here, but this time, we get the following solutions:

[
  [a: true, b: true, c: false], # (a AND b)
  [a: true, b: true, c: true],  # (a AND b), (b AND c)
  [a: false, b: true, c: true], # (b AND c)
]

When applying the second solution, this time both selectors evaluate to true. This tells us the selectors are not mutually exclusive, which should result in an error being raised.
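Since selector expressions typically involve only a handful of binary columns, a brute-force enumeration over all assignments can stand in for a full SAT solver. A Rust sketch, with closures standing in for parsed selector expressions:

// Returns Ok(()) if at most one selector fires for every assignment of the
// binary columns; otherwise returns the offending assignment.
fn check_mutually_exclusive(
    num_vars: usize,
    selectors: &[Box<dyn Fn(&[bool]) -> bool>],
) -> Result<(), Vec<bool>> {
    for bits in 0u32..(1 << num_vars) {
        let assignment: Vec<bool> = (0..num_vars).map(|i| bits & (1 << i) != 0).collect();
        let active = selectors.iter().filter(|s| s(&assignment)).count();
        if active > 1 {
            // more than one arm fires for this assignment -> raise an error
            return Err(assignment);
        }
    }
    Ok(())
}

fn main() {
    // case s[0] & s[1], case s[1] & s[2]: not mutually exclusive
    let selectors: Vec<Box<dyn Fn(&[bool]) -> bool>> = vec![
        Box::new(|v: &[bool]| v[0] && v[1]),
        Box::new(|v: &[bool]| v[1] && v[2]),
    ];
    assert!(check_mutually_exclusive(3, &selectors).is_err());
}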

I think evaluating at a random point may be more robust as something like x + y = 0 and y + x = 0 could look like two different constraints - but evaluating them on a random point should give the same result.

You can't extrapolate equivalence of those expressions based on equality of the results - there could always be values for the variables which produce different results, and the only way to know that is to try all inputs. This is where egraphs really shine, because they are designed to canonicalize/rewrite expressions and determine equivalence based on a set of rules. So in your example, you could add x and y as leaf nodes in the egraph, x + y and y + x as expressions to the egraph, and call union on the latter. You can then identify any expressions which are equivalent based on that rule, e.g. 7 + a and a + 7 would be considered equivalent in the graph. You can further define rewrite rules such as x + 0 => x, or x * 0 => 0, and given something like a + 0, it will give you back a as the "best" equivalent representation.
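For illustration, here is roughly what such an equivalence check could look like using the egg crate (this requires egg as a dependency; the rules shown are a tiny illustrative subset, not a complete ruleset for field arithmetic):

use egg::{rewrite as rw, RecExpr, Rewrite, Runner, SymbolLang};

fn main() {
    let rules: Vec<Rewrite<SymbolLang, ()>> = vec![
        rw!("comm-add"; "(+ ?x ?y)" => "(+ ?y ?x)"),
        rw!("add-0";    "(+ ?x 0)"  => "?x"),
        rw!("mul-0";    "(* ?x 0)"  => "0"),
    ];

    let a: RecExpr<SymbolLang> = "(+ x y)".parse().unwrap();
    let b: RecExpr<SymbolLang> = "(+ y x)".parse().unwrap();

    // seed the e-graph with both expressions and saturate with the rules
    let runner = Runner::default().with_expr(&a).with_expr(&b).run(&rules);

    // the two roots landing in the same e-class means the expressions are
    // equivalent under the given rewrite rules
    let eq = runner.egraph.find(runner.roots[0]) == runner.egraph.find(runner.roots[1]);
    assert!(eq);
}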

@bobbinth (Contributor) commented:

That doesn't sound right - wouldn't you actually have four constraints, because each match arm calls an evaluator which consists of two constraints? And while each match arm is supposed to be mutually exclusive with the others, so two of the constraints at any given point would be effectively no-ops, they all must still be evaluated at every point, right?

I guess there is an even more naive way to write them as 4 separate constraints. But they can trivially be combined into two constraints as I described in the comment. Basically, we can have 4 constraints like so:

$$ c \cdot (s_0' - s_0 - s_1) = 0 $$

$$ c \cdot (s_1' - s_2) = 0 $$

$$ (1 - c) \cdot (s_0' - s_0 \cdot s_1) = 0 $$

$$ (1 - c) \cdot (s_1' - s_2) = 0 $$

Or we can have 2 constraints which look like this:

$$ c \cdot (s_0' - (s_0 + s_1)) + (1 - c) \cdot (s_0' - s_0 \cdot s_1) = 0 $$

$$ c \cdot (s_1' - s_2) + (1 - c) \cdot (s_1' - s_2) = 0 $$

And assuming values of $c$ can only be either $0$ or $1$, both sets of constraints describe an equivalent system (e.g., plug in values $0$ and $1$ instead of $c$, and we'll get exactly the same set of equations in both cases).

While it seems possible to combine constraints in such a way that the properties of both are upheld simultaneously, it isn't clear how we could do so as a generic transformation, safely. It would be really important to have a proof of the safety of that transformation, or the risk of accidentally breaking constraints would be really high.

This is the main reason for having the enf match syntax. Assuming that the values provided in case clauses are mutually exclusive, the information contained in enf match is sufficient to guarantee correctness of the combined constraints.

That said, I wonder if this would be an ideal application for e-graphs. Input all of the rewrite rules that we'd want to apply, and are known to be safe, e.g. x * 0 => 0, then for each constraint, obtain its optimal representation. Then construct an egraph representing the program being compiled (i.e. with all trace columns, inputs, etc. declared as leaf nodes), and check each constraint to see if an equivalent constraint is already present in the graph, and if not, insert it into the graph. At the end, you'd have a set of optimal constraints, where each constraint is unique. I think you can also use egglog to check for contradictory constraints, so we could add that as an additional rule to the egraph.

I think this could be a separate and very interesting problem to solve - but I'd say that a fully generic solution is much more complicated as compared to what we can do with enf match, and enf match gets us to 95% of what we need (not to mention that it also makes it easier to write and audit constraints).

With that in mind, to me this really calls into question the benefit of enf match, since none of what I just described benefits from using it versus just declaring each constraint independently.

Being able to combine constraints in the way I described above is very important. The reason is that during the STARK protocol, we need to generate a random value for each constraint. So, if we have 4 constraints, we need 4 random values; if we have 2 constraints, we need only 2 random values. Generating and applying these random values is very expensive for the verifier - so, the fewer there are, the better.

In the specific case of Miden VM, we'd probably have more than 3000 individual constraints, but if we combine them together as described above, we will probably end up with fewer than 300. At 3000 constraints, the system basically becomes unsuitable for our use case, but at 300 it is not too bad.

A good example of this is stack constraints (still described using old syntax). There are about 88 different arms in the match statement. All of these arms are mutually exclusive, and all of them have at least 16 constraints (some more). If we were to describe them individually, we'd end up with close to 1,500 constraints. But if we combine them together, we should end up somewhere between 50 and 100. This is a huge (and crucial) improvement.

Checking whether match arms are exclusive seems like a good problem for a SAT solver - generate a variable for each trace column, and then ask it to determine if the disjunction of the selectors is satisfiable. If it isn't, then that means none of the selectors are valid (i.e. they are nonsensical expressions like s[0] & !s[0]). If it is satisfiable, then we will get as output a set of solutions, each of which contains the values for each variable for that solution. To determine if the match arms are mutually exclusive, we apply each solution to all of the match arms and see if more than one evaluate to true. If the arms are not mutually exclusive, we raise an error.

Yes, I think this would be a good problem to tackle in the future. There are other things we'll need to check there as well. For example, the values used in the case statement must be binary (i.e., either 1 or 0) - so, we might want to add a notion of binary types to the type system (as well as rules on how binary types can be enforced).

You can't extrapolate equivalence of those expressions based on equality of the results - there could always be values for the variables which produce different results, and the only way to know that is to try all inputs.

Generally true, but if we draw inputs from a large enough domain, the probability of failure is negligible. In our case, we can tolerate a probability of failure of $2^{-128}$ because the rest of the protocol has an even higher probability of failure. I'm not against more sophisticated analysis to prove complete equivalence - but we can start with the random approach, and in the future probably use it to identify expressions which are likely to be equivalent, to then be verified using more sophisticated techniques.

@bobbinth (Contributor) commented Jun 30, 2023

You can't extrapolate equivalence of those expressions based on equality of the results - there could always be values for the variables which produce different results, and the only way to know that is to try all inputs.

Generally true, but if we draw inputs from a large enough domain, the probability of failure is negligible. In our case, we can tolerate a probability of failure of $2^{-128}$ because the rest of the protocol has an even higher probability of failure. I'm not against more sophisticated analysis to prove complete equivalence - but we can start with the random approach, and in the future probably use it to identify expressions which are likely to be equivalent, to then be verified using more sophisticated techniques.

One other thing to note here: we are not dealing with arbitrary functions, but with polynomials of bounded degree. In fact, the degree of our polynomials will never be greater than $2^{32}$. Assuming we have two polynomials of degree $2^{32}$, if we evaluate them at a random point drawn from a domain of size $2^{128}$, the probability that both will evaluate to the same value at this point is:

$$ \frac{2^{32}}{2^{128}} = 2^{-96} $$

This is because two distinct polynomials of degree $n$ can agree on at most $n$ points.

@bitwalker (Contributor) commented:

I guess there is an even more naive way to write them as 4 separate constraints. But they can trivially be combined into two constraints as I described in the comment. Basically, we can have 4 constraints like so:
...
Or we can have 2 constraints which look like this:
...

So to be clear, I definitely recognize how they can be combined in this specific instance, but that doesn't actually get us to a transformation which is applicable in general, at least as stated so far. In this specific case, we have a simple true/false selector, and both arms of the match constraint express equalities against the same columns - that seems like the absolute best case scenario. In practice, I would imagine selectors will involve sets of columns, with at least some non-overlapping members (e.g. a & b and !b & c); it also seems likely that constraints will express equalities involving different columns.

Consider the following contrived example:

trace_columns: [clk, op[2], flags[2], regs[8]]

integrity_constraints:
    enf match:
        case op[0] & op[1] & flags[0]: foo([regs[0..8]])
        case op[0] & op[1] & !flags[0]: bar([regs[0..8]])
        case !op[0] & op[1]: baz([regs[0..8]])

ev foo([regs[8]]):
    enf regs[0] = 1
    enf regs[1]' = regs[1] + regs[2]

ev bar([regs[8]]):
    enf regs[0] = regs[1]
    enf regs[2]' = regs[2] + regs[3]

ev baz([regs[8]]):
    enf regs[0] = regs[1] + regs[2]
    enf regs[3]' = regs[3] + regs[4]

How does this get combined? To be clear, I'm not saying it can't be; rather, it isn't clear to me what the actual transformation is, symbolically. I think this is why egraphs end up being a nice solution here: we can implement and reason about the small-step semantics of the transformations we wish to apply (and prove their correctness), and thanks to the egraph formalism, confidently say that the resulting output is itself correct, because the output must be equivalent to the input.

I think this could be a separate and very interesting problem to solve - but I'd say that a fully generic solution is much more complicated as compared to what we can do with enf match, and enf match gets us to 95% of what we need (not to mention that it also makes it easier to write and audit constraints).

Without automated verification, there is a huge safety hole here - while enf match can play a role similar to unsafe in Rust in terms of drawing the attention of an auditor, auditing a match statement with 80+ arms is exceedingly error-prone for someone to do by hand. But putting that aside, my suggestion to use egraphs is less about being a generic solution, and more about solving multiple optimization problems at once, and doing so within a formalism that lets us reason about correctness. If there is a straightforward transformation we can apply to enf match that accomplishes the primary optimization we need and maintains the correctness invariants we care about, then documenting what that is would clear up any confusion/questions I have.

A good example of this is stack constraints

This is a good example of where it isn't possible for us to validate the mutual exclusivity of the selectors. From the perspective of static analysis, each selector is expressed in terms of a different column in op_flags, so you could easily have multiple selectors evaluate to true at the same time. Using a more powerful system based on dependent types or proofs (e.g. Dafny), it is probably possible to statically verify that all of the branches are mutually exclusive, but not with the tools available in AirScript today. So perhaps we just have to assume enf match invariants are upheld by the programmer.

I'm not against more sophisticated analysis to prove complete equivalence - but we can start with the random approach, and in the future probably use it to identify expressions which are likely to be equivalent, to then be verified using more sophisticated techniques.

I think it likely shakes out to be about the same amount of effort either way: you either implement an evaluator and evaluate a couple of randomly chosen values from the domain, and then act on the results of that evaluation - or you construct an egraph representing the constraints and ask it if the constraints are equivalent.

An alternative, less sophisticated approach that can produce similar results is to implement a canonicalization pass which performs a series of local transformations that guarantee certain properties of the resulting program. For example, you might canonicalize the ordering of operands for commutative operators by name, so y * x becomes x * y when canonicalized, making it trivial to recognize when two expressions of that type are identical. Constant propagation will ensure that things like x * 0 and x + 0 are canonicalized to 0 and x respectively, and you can implement similar rewrites during canonicalization, e.g. x + x as x * 2, etc. Egraphs are essentially canonicalization on steroids, because you not only get a canonical form, but the optimal canonical form, and on more complex expressions, so they are strictly better - but naive canonicalization can still be quite effective when used in conjunction with other optimizations.
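A minimal Rust sketch of such a pass over a hypothetical expression type, covering commutative operand ordering and the two folding identities mentioned above (illustrative only, not the AirScript IR):

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Expr {
    Const(u64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn canonicalize(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => {
            let (a, b) = (canonicalize(*a), canonicalize(*b));
            match (a, b) {
                (Expr::Const(0), x) | (x, Expr::Const(0)) => x,        // x + 0 => x
                (a, b) if a <= b => Expr::Add(Box::new(a), Box::new(b)),
                (a, b) => Expr::Add(Box::new(b), Box::new(a)),         // sort operands
            }
        }
        Expr::Mul(a, b) => {
            let (a, b) = (canonicalize(*a), canonicalize(*b));
            match (a, b) {
                (Expr::Const(0), _) | (_, Expr::Const(0)) => Expr::Const(0), // x * 0 => 0
                (a, b) if a <= b => Expr::Mul(Box::new(a), Box::new(b)),
                (a, b) => Expr::Mul(Box::new(b), Box::new(a)),
            }
        }
        other => other,
    }
}

fn main() {
    let x = || Expr::Var("x".into());
    let y = || Expr::Var("y".into());
    // y * x and x * y canonicalize to the same expression
    let a = canonicalize(Expr::Mul(Box::new(y()), Box::new(x())));
    let b = canonicalize(Expr::Mul(Box::new(x()), Box::new(y())));
    assert_eq!(a, b);
}

Once every expression is in canonical form, structural equality is enough to detect duplicated constraints across match arms.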


To sum up, I think what I'm really looking for are the following:

  • What can we assume about enf match? For example, you mentioned that in your example c must be either 0 or 1 - is that true of all values used in selectors? That does seem to limit the usefulness of selectors, at least in some scenarios, if you can't do something akin to what is being done in stack_constraints to calculate each op_flags element.
  • What are the small-step semantics of the proposed transformation, assuming you have one in mind? For example:

Given the following abstract representation of an enf match statement, where s indicates a selector, and c indicates a constraint:

enf match:
    s0: c0, c1
    ...
    sN: cX,..cY

The transformation starts by combining constraints for each selector:

enf match:
    s0: c0 + c1
    ...
    sN: cX + .. + cY

It is then finalized by combining all of the selectors and their corresponding constraints:

enf (s0 * (c0 + c1)) + .. + (sN * (cX + .. + cY))

I have no idea if that transformation would be correct/safe, but it was just an example to illustrate how we can apply the transformation symbolically without having to reason about specific programs.

@bobbinth (Contributor) commented Jul 1, 2023

I probably should have explained the methodology of how constraints can be combined much earlier.

What can we assume about enf match? For example, you mentioned that in your example c must be either 0 or 1 - is that true of all values used in selectors? That does seem to limit the usefulness of selectors, at least in some scenarios, if you can't do something akin to what is being done in stack_constraints to calculate each op_flags element.

There are two main rules:

  1. Values which go into selector expressions are always binary (either $0$ or $1$). These values can be combined in a variety of ways using logical operators (which reduce to multiplications and $1 - x$ expressions), but given that inputs are always binary, the outputs are guaranteed to be binary as well.
  2. Only one selector expression can evaluate to $1$ for the whole enf match statement. In other words, selectors are mutually exclusive.

Let's go through the example you gave.

In this example, we base selector expressions on 3 columns: $op_0$, $op_1$, and $flags_0$. This means that values in these 3 columns must be binary. This is usually enforced by separate constraints. In this specific case, the constraints would be:

enf op[0]^2 = op[0]
enf op[1]^2 = op[1]
enf flags[0]^2 = flags[0]

If these constraints are missing, we should not be able to use these columns in selector expressions. For now, the language does not enforce this, but in the future, we should probably emit a warning (or maybe even an error) if we can't determine that all selector inputs are binary.

With the above in mind, the correct way to write your example would be:

integrity_constraints:
    enf op[0]^2 = op[0]
    enf op[1]^2 = op[1]
    enf flags[0]^2 = flags[0]

    enf match:
        case op[0] & op[1] & flags[0]: foo([regs[0..8]])
        case op[0] & op[1] & !flags[0]: bar([regs[0..8]])
        case !op[0] & op[1]: baz([regs[0..8]])

Next, using notation similar to yours, the naive (but still extremely useful) way to combine constraints would be:

Input:

enf match:
    s0: c0_0, c0_1, .., c0_n
    s1: c1_0, c1_1, .., c1_n
    ...
    sN: cN_0, cN_1, .., cN_n

Output:

enf s0 * c0_0 + s1 * c1_0 + .. + sN * cN_0
enf s0 * c0_1 + s1 * c1_1 + .. + sN * cN_1
...
enf s0 * c0_n + s1 * c1_n + .. + sN * cN_n

Basically, we take the first constraint from each arm, multiply each by its respective selector, and add them together. Then we do the same with the second constraint from each arm, and so on.
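A Rust sketch of this combining rule over a hypothetical expression type; arms with fewer constraints than others simply stop contributing, which matches the redundant-constraint example earlier, where the constraint missing from one arm keeps only its own selector:

#[derive(Clone, Debug)]
enum Expr {
    Sym(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn add(a: Expr, b: Expr) -> Expr { Expr::Add(Box::new(a), Box::new(b)) }
fn mul(a: Expr, b: Expr) -> Expr { Expr::Mul(Box::new(a), Box::new(b)) }

// selectors[j] gates arms[j]; arms[j][i] is the i-th constraint of arm j.
// Output i is s0 * c0_i + s1 * c1_i + .. + sN * cN_i.
fn combine(selectors: &[Expr], arms: &[Vec<Expr>]) -> Vec<Expr> {
    let n = arms.iter().map(|a| a.len()).max().unwrap_or(0);
    (0..n)
        .map(|i| {
            arms.iter()
                .zip(selectors)
                .filter_map(|(arm, s)| arm.get(i).map(|c| mul(s.clone(), c.clone())))
                .reduce(add)
                .expect("at least one arm has an i-th constraint")
        })
        .collect()
}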

For your specific example, the combined constraints would look like this:

$$ s_0 \cdot (reg_0 - 1) + s_1 \cdot (reg_0 - reg_1) + s_2 \cdot (reg_0 - reg_1 - reg_2) = 0 $$

$$ s_0 \cdot (reg_1' - reg_1 - reg_2) + s_1 \cdot (reg_2' - reg_2 - reg_3) + s_2 \cdot (reg_3' - reg_3 - reg_4) = 0 $$

Where $s_0$, $s_1$, and $s_2$ (introduced primarily for brevity) are the results of selector expressions computed as:

$$ s_0 = op_0 \cdot op_1 \cdot flags_0 $$

$$ s_1 = op_0 \cdot op_1 \cdot (1 - flags_0) $$

$$ s_2 = (1 - op_0) \cdot op_1 $$

Notice that if $op_0$, $op_1$, and $flags_0$ are binary, $s_0$, $s_1$, and $s_2$ are also guaranteed to be binary.

As I mentioned above, this is the naive method of combining constraints. Using evaluation at a random point, we can do even better (though not in this specific case). I'll describe this approach in more detail in a separate comment.

Without automated verification, there is a huge safety hole here - while enf match can play a role similar to unsafe in Rust

Agreed. I think we should try to automate as much as possible, but this will take time. I want to start with something that works assuming the user coded the constraints correctly, and then over time we can add more checks.

This is a good example of where it isn't possible for us to validate the mutual exclusivity of the selectors.

This is actually not that difficult - though it may require additional syntax in the language. Specifically, all selectors are generated by a single compute_op_flags() function. This function takes a vector of 9 binary values (though, currently we don't have a way to enforce that values are binary) and outputs a vector of 88 binary values. We can very easily run through all possible combinations of the 9 input values and check that for each combination only a single value is set to $1$ in the output vector (when we described these constraints in Rust, we actually added a test that does basically this).
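A sketch of what such an exhaustive test could look like in Rust; compute_op_flags here is a stub standing in for the real function, which is not reproduced here:

// Enumerate all 2^9 combinations of binary inputs and check that exactly
// one of the 88 output flags is set for each combination.
fn compute_op_flags(_inputs: [u64; 9]) -> Vec<u64> {
    unimplemented!("stub: stands in for the actual op flag computation")
}

fn check_op_flags_mutually_exclusive() {
    for bits in 0u32..(1 << 9) {
        let mut inputs = [0u64; 9];
        for (i, v) in inputs.iter_mut().enumerate() {
            *v = ((bits >> i) & 1) as u64;
        }
        let flags = compute_op_flags(inputs);
        let num_set = flags.iter().filter(|&&f| f == 1).count();
        assert_eq!(num_set, 1, "inputs {bits:09b} set {num_set} op flags");
    }
}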

I'm not saying that this is the best way to perform such checks, but we could do it this way if needed. But again, this is something we'll do in the future - once all the basic things are working.

@bobbinth (Contributor) commented Jul 2, 2023

Basic implementation done in #328 and the remaining part carried over to #332.

bobbinth closed this as completed on Jul 2, 2023.