Commit
ast: support dotted heads (open-policy-agent#4660)
This change allows rules to have string prefixes in their heads -- we've
come to call them "ref heads".

String prefixes mean that where before, you had

    package a.b.c
    allow = true

you can now have

    package a
    b.c.allow = true

This allows for more concise policies, and different ways to structure
larger rule corpuses.

Backwards-compatibility:

- There are code paths that accept ast.Module structs that don't
  necessarily come from the parser -- so we're backfilling the rule's
  Head.Reference field from the Name when it's not present.
  This is exposed through (Head).Ref() which always returns a Ref.

  This also affects the `opa parse` "pretty" output:

  With x.rego as

      package x
      import future.keywords
      a.b.c.d if true
      e[x] if true

  we get

      $ opa parse x.rego
      module
       package
        ref
         data
         "x"
       import
        ref
         future
         "keywords"
       rule
        head
         ref
          a
          "b"
          "c"
          "d"
         true
        body
         expr index=0
          true
       rule
        head
         ref
          e
          x
         true
        body
         expr index=0
          true

  Note that

      Name: e
      Key: x

  becomes

      Reference: e[x]

  in the output above (since that's how we're parsing it, back-compat
  edge cases aside)

- One special case for backcompat is `p[x] { ... }`:

      rule                    | ref   | key | value | name
      ------------------------+-------+-----+-------+-----
      p[x] { ... }            | p     | x   | nil   | "p"
      p contains x if { ... } | p     | x   | nil   | "p"
      p[x] if { ... }         | p[x]  | nil | true  | ""

  For interpreting a rule, we now have the following procedure:

  1. if it has a Key, it's a multi-value rule; and its Ref defines the
     set:

         Head{Key: x, Ref: p} ~> p is a set
         ^-- we'd get this from `p contains x if true`
             or `p[x] { true }` (back compat)

  2. if it has a Value, it's a single-value rule; its Ref may contain
     vars:

         Head{Ref: p.q.r[s], Value: 12} ~> body determines s, `p.q.r[s]` is 12
         ^-- we'd get this from `p.q.r[s] = 12 { s := "whatever" }`

         Head{Key: x, Ref: p[x], Value: 3} ~> `p[x]` has value 3, `x` is
                                              determined by the rule body
         ^-- we'd get this from `p[x] = 3 if x := 2`
             or `p[x] = 3 { x := 2 }` (back compat)

     Here, the Key isn't used, it's present for backwards compatibility:
     for ref-less rule heads, `p[x] = 3` used to be a partial object:
     key x, value 3, name "p"

- The distinction between complete rules and partial object rules
  disappears. They're both single-value rules now.

- We're now outputting the refs of the rules completely in error
  messages, as it's hard to make sense of "rule r" when there's
  rule r in package a.b.c and rule b.c.r in package a.

Restrictions/next steps:

- Support for ref head rules in the REPL is pretty poor so far. Anything
  that works does so rather accidentally. You should be able to work
  with policies that contain ref heads, but you cannot interactively
  define them.

  This is because before, we'd looked at REPL input like

      p.foo.bar = true

  and noticed that it cannot be a rule, so it's got to be a query. This
  is no longer the case with ref heads.

- Currently vars in Refs are only allowed in the last position. This is
  expected to change in the future.

- Also, for multi-value rules, we cannot have a var at all -- so the
  following isn't supported yet:

      p.q.r[s] contains t if { ... }

-----

Most of the work happens when the RuleTree is derived from the
ModuleTree -- in the RuleTree, it doesn't matter if a rule was `p` in
`package a.b.c` or `b.c.p` in `package a`.
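To illustrate why the two spellings are indistinguishable once the package path and the head ref are joined, here is a minimal Go sketch. It is not OPA code: `fullRef` is a hypothetical helper, and refs are modelled as plain string slices rather than `ast.Ref` values.

```go
package main

import (
	"fmt"
	"strings"
)

// fullRef is a hypothetical helper (not part of OPA's API). It joins a
// package path and a rule's head ref into the path the rule occupies
// under "data". Refs are modelled as plain string slices here.
func fullRef(pkg, head []string) string {
	parts := append([]string{"data"}, pkg...)
	parts = append(parts, head...)
	return strings.Join(parts, ".")
}

func main() {
	// `p` in `package a.b.c`
	classic := fullRef([]string{"a", "b", "c"}, []string{"p"})
	// `b.c.p` in `package a`
	refHead := fullRef([]string{"a"}, []string{"b", "c", "p"})

	fmt.Println(classic)            // data.a.b.c.p
	fmt.Println(classic == refHead) // true
}
```

Both rules land on the same path, so anything that works from that path alone cannot tell them apart.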
As such, the planner and Wasm compiler haven't seen that many
adaptations:

- We're putting rules into the ruletree _including_ the var parts, so

      p.q.a = 1
      p.q[x] = 2 { x := "b" }

  end up in two different leaves:

      p
      `-> q
          `-> a = 1
          `-> [x] = 2

- When planning a ref, we're checking if a rule tree node's children
  have var keys, and plan "one level higher" accordingly:

  Both sets of rules, p.q.a and p.q[x] will be planned into one function
  (same as before); and accordingly return an object {"a": 1, "b": 2}

- When we don't have vars in the last ref part, we'll end up planning
  the rules separately. This will have an effect on the IR.

      p.q = 1
      p.r = 2

  Before, these would have been one function; now, it's two. As a
  result, in Wasm, some "object insertion" conflicts can become "var
  assignment conflicts", but that's in line with the now-new view of
  "multi-value" and "single-value" rules, not partial {set/obj} vs
  complete.

* planner: only check ref.GroundPrefix() for optimizations

  In a previous commit, we've only mapped

      p.q.r[7]

  as p.q.r; and as such, also need to lookup the ref

      p.q.r[__local0__]

  via p.q.r

  (I think. Full disclosure: there might be edge cases here that are
  unaccounted for, but right now, I'm aiming for making the existing
  tests green...)

New compiler stage:

In the compiler, there is a new early rewriting step to ensure that the
RuleTree's keys are comparable. They're ast.Value, but some of them
cause us grief:

- ast.Object cannot be compared structurally; so

      _, ok := map[ast.Value]bool{
          ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")}): true,
      }[ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")})]

  `ok` will never be true here.

- ast.Ref is a slice type, not hashable, so adding that to the RuleTree
  would cause a runtime panic:

      p[y.z] { y := input }

  is now rewritten to

      p[__local0__] { y := input; __local0__ := y.z }

This required moving the InitLocalVarGen stage up the chain, but as it's
still below ResolveRefs, we should be OK.

As a consequence, we've had to adapt `oracle` to cope with that
rewriting:

1. The compiler rewrites rule head refs early because the rule tree
   expects only simple vars, no refs, in rule head refs. So `p[x.y]`
   becomes `p[local] { local = x.y }`
2. The oracle circles in on the node it's finding the definition for
   based on source location, and the logic for doing that depends on
   unaltered modules.

So here, (2.) is relaxed: the logic for building the lookup node stack
can now cope with generated statements that have been appended to the
rule bodies.

There is a peculiarity about ref rules and extents:

See the added tests: having a ref rule implies that we get an empty
object in the full extent:

    package p
    foo.bar if false

makes the extent of data.p: {"foo": {}}

This is somewhat odd, but also follows from the behaviour we have right
now with empty modules:

    package p.foo
    bar if false

this also gives data.p the extent {"foo": {}}.

This could be worked around by recording, in the rule tree, when a node
was added because it's an intermediary with no values, but only
children.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
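The two map-key problems described under "New compiler stage" can be reproduced with plain Go, without importing OPA. This is a sketch under stated assumptions: `object` and `ref` are hypothetical stand-ins (a pointer-backed value and a slice-backed value), and `lookupPanics` is a helper invented for the illustration. None of them are OPA types.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a pointer-backed value: two
// objects with equal contents are still two distinct pointers.
type object struct{ key, value string }

// ref is a hypothetical stand-in for a slice-backed value.
type ref []string

// lookupPanics reports whether using k as a map key panics at runtime.
func lookupPanics(k interface{}) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m := map[interface{}]bool{}
	m[k] = true // panics if k's dynamic type is not hashable
	return false
}

func main() {
	// Pointer keys are compared by identity, not by structure, so a
	// structurally equal but freshly allocated key is never found.
	m := map[interface{}]bool{&object{"foo", "bar"}: true}
	_, ok := m[&object{"foo", "bar"}]
	fmt.Println(ok) // false

	// A slice stored in an interface compiles as a map key, but the
	// insertion panics at runtime because slices are not hashable.
	fmt.Println(lookupPanics(ref{"y", "z"})) // true

	// A plain string key is comparable and hashable.
	fmt.Println(lookupPanics("z")) // false
}
```

Rewriting `p[y.z]` to `p[__local0__]` sidesteps both cases, since the key that ends up in the tree is then a simple var.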