Skip to content

Commit

Permalink
ast: support dotted heads (open-policy-agent#4660)
Browse files Browse the repository at this point in the history
This change allows rules to have string prefixes in their heads -- we've
come to call them "ref heads".

String prefixes means that where before, you had

    package a.b.c
    allow = true

you can now have

    package a
    b.c.allow = true

This allows for more concise policies, and different ways to structure
larger rule corpuses.

Backwards-compatibility:

- There are code paths that accept ast.Module structs that don't necessarily
  come from the parser -- so we're backfilling the rule's Head.Reference
  field from the Name when it's not present.
  This is exposed through (Head).Ref() which always returns a Ref.

  This also affects the `opa parse` "pretty" output:

  With x.rego as

    package x
    import future.keywords
    a.b.c.d if true
    e[x] if true

  we get

    $ opa parse x rego
    module
     package
      ref
       data
       "x"
     import
      ref
       future
       "keywords"

     rule
      head
       ref
        a
        "b"
        "c"
        "d"
       true
      body
       expr index=0
        true
     rule
      head
       ref
        e
        x
       true
      body
       expr index=0
        true

  Note that

    Name: e
    Key: x

  becomes

    Reference: e[x]

  in the output above (since that's how we're parsing it, back-compat edge cases aside)

- One special case for backcompat is `p[x] { ... }`:

    rule                    | ref   | key | value | name
    ------------------------+-------+-----+-------+-----
    p[x] { ... }            | p     | x   | nil   | "p"
    p contains x if { ... } | p     | x   | nil   | "p"
    p[x] if { ... }         | p[x]  | nil | true  | ""

  For interpreting a rule, we now have the following procedure:

  1. if it has a Key, it's a multi-value rule; and its Ref defines the set:

     Head{Key: x, Ref: p} ~> p is a set
     ^-- we'd get this from `p contains x if true`
         or `p[x] { true }` (back compat)

  2. if it has a Value, it's a single-value rule; its Ref may contain vars:

     Head{Ref: p.q.r[s], Value: 12} ~> body determines s, `p.q.r.[s]` is 12
     ^-- we'd get this from `p.q.r[s] = 12 { s := "whatever" }`

     Head{Key: x, Ref: p[x], Value: 3} ~> `p[x]` has value 3, `x` is determined
                                          by the rule body
     ^-- we'd get this from `p[x] = 3 if x := 2`
         or `p[x] = 3 { x := 2 }` (back compat)

     Here, the Key isn't used, it's present for backwards compatibility: for ref-
     less rule heads, `p[x] = 3` used to be a partial object: key x, value 3,
     name "p"

- The destinction between complete rules and partial object rules disappears.
  They're both single-value rules now.

- We're now outputting the refs of the rules completely in error messages, as
  it's hard to make sense of "rule r" when there's rule r in package a.b.c and
  rule b.c.r in package a.

Restrictions/next steps:

- Support for ref head rules in the REPL is pretty poor so far. Anything that
  works does so rather accidentally. You should be able to work with policies
  that contain ref heads, but you cannot interactively define them.
  
  This is because before, we'd looked at REPL input like

      p.foo.bar = true

  and noticed that it cannot be a rule, so it's got to be a query. This is no
  longer the case with ref heads.

- Currently vars in Refs are only allowed in the last position. This is expected
 to change in the future.

- Also, for multi-value rules, we can not have a var at all -- so the following
  isn't supported yet:

      p.q.r[s] contains t if { ... }

-----

Most of the work happens when the RuleTree is derived from the ModuleTree -- in
the RuleTree, it doesn't matter if a rule was `p` in `package a.b.c` or `b.c.p`
in `package a`.

As such, the planner and wasm compiler hasn't seen that many adaptations:

- We're putting rules into the ruletree _including_ the var parts, so

  p.q.a = 1
  p.q.[x] = 2 { x := "b" }

  end up in two different leaves:

  p
  `-> q
       `-> a = 1
       `-> [x] = 2`

- When planing a ref, we're checking if a rule tree node's children have
  var keys, and plan "one level higher" accordingly:

  Both sets of rules, p.q.a and p.q[x] will be planned into one function
  (same as before); and accordingly return an object {"a": 1, "b": 2}

- When we don't have vars in the last ref part, we'll end up planning
  the rules separately. This will have an effect on the IR.

  p.q = 1
  p.r = 2

  Before, these would have been one function; now, it's two. As a result,
  in Wasm, some "object insertion" conflicts can become "var assignment
  conflicts", but that's in line with the now-new view of "multi-value"
  and "single-value" rules, not partial {set/obj} vs complete.
* planner: only check ref.GroundPrefix() for optimizations

In a previous commit, we've only mapped

    p.q.r[7]

as p.q.r;  and as such, also need to lookup the ref

    p.q.r[__local0__]

via p.q.r

(I think. Full disclosure: there might be edge cases here that are unaccounted
for, but right now, I'm aiming for making the existing tests green...)


New compiler stage:

In the compiler, we're having a new early rewriting step to ensure that the
RuleTree's keys are comparible. They're ast.Value, but some of them cause us
grief:

- ast.Object cannot be compared structurally; so

      _, ok := map[ast.Value]bool{ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")}): true}[ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")})]

  `ok` will never be true here.

- ast.Ref is a slice type, not hashable, so adding that to the RuleTree would
  cause a runtime panic:

      p[y.z] { y := input }

  is now rewritten to

    p[__local0__] { y := input; __local0__ := y.z }

This required moving the InitLocalVarGen stage up the chain, but as it's still
below ResolveRefs, we should be OK.

As a consequence, we've had to adapt `oracle` to cope with that rewriting:

1. The compiler rewrites rule head refs early because the rule tree expects
   only simple vars, no refs, in rule head refs. So `p[x.y]` becomes
   `p[local] { local = x.y }`
2. The oracle circles in on the node it's finding the definition for based
   on source location, and the logic for doing that depends on unaltered
   modules.

So here, (2.) is relaxed: the logic for building the lookup node stack can
now cope with generated statements that have been appended to the rule bodies.


There is a peculiarity about ref rules and extents:

See the added tests: having a ref rule implies that we get an empty object
in the full extent:

    package p
    foo.bar if false

makes the extent of data.p: {"foo": {}}

This is somewhat odd, but also follows from the behaviour we have right now
with empty modules:

    package p.foo
    bar if false

this also gives data.p the extent {"foo": {}}.

This could be worked around by recording, in the rule tree, when a node was
added because it's an intermediary with no values, but only children.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
  • Loading branch information
srenatus authored Oct 14, 2022
1 parent b41fc6b commit 965301f
Show file tree
Hide file tree
Showing 49 changed files with 4,423 additions and 904 deletions.
112 changes: 111 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,117 @@ project adheres to [Semantic Versioning](http://semver.org/).

## Unreleased

### Refs in Rule Heads

With this version of OPA, we can use a shorthand for defining deeply-nested structures
in Rego:

Before, we had to use multiple packages, and hence multiple files to define a structure
like this:
```json
{
"method": {
"get": {
"allowed": true
}
"post": {
"allowed": true
}
}
}
```

```rego
package method.get
default allowed := false
allowed { ... }
```


```rego
package method.post
default allowed := false
allowed { ... }
```

Now, we can define those rules in single package (and file):

```rego
package method
import future.keywords.if
default get.allowed := false
get.allowed if { ... }
default post.allowed := false
post.allowed if { ... }
```

Note that in this example, the use of the future keyword `if` is mandatory
for backwards-compatibility: without it, `get.allowed` would be interpreted
as `get["allowed"]`, a definition of a partial set rule.

Currently, variables may only appear in the last part of the rule head:

```rego
package method
import future.keywords.if
endpoints[ep].allowed if ep := "/v1/data" # invalid
repos.get.endpoint[x] if x := "/v1/data" # valid
```

The valid rule defines this structure:
```json
{
"method": {
"repos": {
"get": {
"endpoint": {
"/v1/data": true
}
}
}
}
}
```

To define a nested key-value pair, we would use

```rego
package method
import future.keywords.if
repos.get.endpoint[x] = y if {
x := "/v1/data"
y := "example"
}
```

Multi-value rules (previously referred to as "partial set rules") that are
nested like this need to use `contains` future keyword, to differentiate them
from the "last part is a variable" case mentioned just above:

```rego
package method
import future.keywords.contains
repos.get.endpoint contains x if x := "/v1/data"
```

This rule defines the same structure, but with multiple values instead of a key:
```json
{
"method": {
"repos": {
"get": {
"endpoint": ["/v1/data"]
}
}
}
}
}
```

## 0.45.0

This release contains a mix of bugfixes, optimizations, and new features.
Expand Down Expand Up @@ -319,7 +430,6 @@ This is a security release fixing the following vulnerabilities:
Note that CVE-2022-32190 is most likely not relevant for OPA's usage of net/url.
But since these CVEs tend to come up in security assessment tooling regardless,
it's better to get it out of the way.

## 0.43.0

This release contains a number of fixes, enhancements, and performance improvements.
Expand Down
106 changes: 105 additions & 1 deletion ast/annotations_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -421,6 +421,37 @@ p[v] {v = 2}`,
},
},
},
{
note: "overlapping rule paths (different modules, rule head refs)",
modules: map[string]string{
"mod1": `package test.a
# METADATA
# title: P1
b.c.p[v] {v = 1}`,
"mod2": `package test
# METADATA
# title: P2
a.b.c.p[v] {v = 2}`,
},
expected: []AnnotationsRef{
{
Path: MustParseRef("data.test.a.b.c.p"),
Location: &Location{File: "mod1", Row: 4},
Annotations: &Annotations{
Scope: "rule",
Title: "P1",
},
},
{
Path: MustParseRef("data.test.a.b.c.p"),
Location: &Location{File: "mod2", Row: 4},
Annotations: &Annotations{
Scope: "rule",
Title: "P2",
},
},
},
},
}

for _, tc := range tests {
Expand Down Expand Up @@ -737,6 +768,78 @@ p = 1`,
},
},
},
{
note: "multiple subpackages, refs in rule heads", // NOTE(sr): same as above, but last module's rule is `foo.bar.p` in package `root`
modules: map[string]string{
"root": `# METADATA
# scope: subpackages
# title: ROOT
package root`,
"root.foo": `# METADATA
# title: FOO
# scope: subpackages
package root.foo`,
"root.foo.bar": `# METADATA
# scope: subpackages
# description: subpackages scope applied to rule in other module
# title: BAR-sub
# METADATA
# title: BAR-other
# description: This metadata is on the path of the queried rule, but shouldn't show up in the result as it's in a different module.
package root.foo.bar
# METADATA
# scope: document
# description: document scope applied to rule in other module
# title: P-doc
p = 1`,
"rule": `# METADATA
# title: BAR
package root
# METADATA
# title: P
foo.bar.p = 1`,
},
moduleToAnalyze: "rule",
ruleOnLineToAnalyze: 7,
expected: []AnnotationsRef{
{
Path: MustParseRef("data.root.foo.bar.p"),
Location: &Location{File: "rule", Row: 7},
Annotations: &Annotations{
Scope: "rule",
Title: "P",
},
},
{
Path: MustParseRef("data.root.foo.bar.p"),
Location: &Location{File: "root.foo.bar", Row: 15},
Annotations: &Annotations{
Scope: "document",
Title: "P-doc",
Description: "document scope applied to rule in other module",
},
},
{
Path: MustParseRef("data.root"),
Location: &Location{File: "rule", Row: 3},
Annotations: &Annotations{
Scope: "package",
Title: "BAR",
},
},
{
Path: MustParseRef("data.root"),
Location: &Location{File: "root", Row: 4},
Annotations: &Annotations{
Scope: "subpackages",
Title: "ROOT",
},
},
},
},
{
note: "multiple metadata blocks for single rule (order)",
modules: map[string]string{
Expand Down Expand Up @@ -824,6 +927,7 @@ p = true`,
chain := as.Chain(rule)

if len(chain) != len(tc.expected) {
t.Errorf("expected %d elements, got %d:", len(tc.expected), len(chain))
t.Fatalf("chained AnnotationSet\n%v\n\ndoesn't match expected\n\n%v",
toJSON(chain), toJSON(tc.expected))
}
Expand Down Expand Up @@ -1022,7 +1126,7 @@ func TestAnnotations_toObject(t *testing.T) {
}

func toJSON(v interface{}) string {
b, _ := json.Marshal(v)
b, _ := json.MarshalIndent(v, "", " ")
return string(b)
}

Expand Down
Loading

0 comments on commit 965301f

Please sign in to comment.