Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ast: support dotted heads #4660

Merged
merged 72 commits into from
Oct 14, 2022

Conversation

srenatus
Copy link
Contributor

@srenatus srenatus commented May 4, 2022

With this change, we'll be able to define substructures using refs in rules. Now, we can do this in one file,

foo.rego:

package basket
small_numbers := {1, 2, 3}

fruit.apple.color = "green" { input in small_numbers }

and it will be the same as these two files:

foo.rego:

package basket

small_numbers := {1, 2, 3}

and apple.rego:

package basket.fruit.apple

color = "green" { input in data.basket.small_numbers }

Fixes #4653.

new TODOs 🏗️

  • resolve refs to dotted rules
  • formatter tests and adaptations
  • more docs

@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch 5 times, most recently from 5ab50c2 to dfdd89e Compare May 10, 2022 09:42
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch 2 times, most recently from d82a13a to bc01cd0 Compare May 12, 2022 12:34
@srenatus srenatus marked this pull request as ready for review May 12, 2022 12:51
carabasdaniel
carabasdaniel previously approved these changes May 13, 2022
Copy link
Contributor

@carabasdaniel carabasdaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice addition 👍

johanfylling
johanfylling previously approved these changes May 17, 2022
Copy link
Contributor

@johanfylling johanfylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ast/compile.go Outdated
} else if v, ok := ref[0].Value.(Var); ok {
ref = Ref{StringTerm(string(v))}.Concat(ref[1:])
}
if childNode := node.find(ref); childNode != nil {
for _, childMod := range childNode.Modules {
msg := fmt.Sprintf("%v conflicts with rule defined at %v", childMod.Package, rule.Loc())
c.err(NewError(TypeErr, mod.Package.Loc(), msg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not point to where the conflicting package was declared? I.e.: childMod.Package.Loc().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct: the compiler is complaining about mod, not about childMod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mod part of the complaint at all? Is it not rule that is in conflict with childMod, and rule just happens to be declared in mod? If so, then shouldn't the error location be either that of rule or childMod.Package, and if we complain about rule, then won't all errors be exactly the same on multiple conflicts?
I might be confused, though 😅 .

ast/compile.go Outdated Show resolved Hide resolved
"data.p.a.b.c": {"r", "s", "y"},
"data.p.a.b": {"x"},
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a test case where the rule path's overlap

package a.b.c
r = 1
package a
b.c.r = 2

Or perhaps this is covered somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time linking stuff in this giant PR, but look for this below:

note: "simple: two modules, one using ref head, one package path",

ast/compile.go Outdated
}

return rules
}

func hashMapAdd(rules *util.HashMap, ref Ref, rvs []Var) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also drop duplicate rule names (rvs entries) here, or that is dealt with elsewhere (or doesn't matter at all)?

ast/compile.go Outdated Show resolved Hide resolved
} else if r, ok := lhs.Value.(Ref); ok { // groundness ?
if len(r) == 2 {
// NOTE(sr): interpret this as dot-less partial set rule
return nil, fmt.Errorf("TODO clean up error handling")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work still to be done here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. It'll never surface, but I'll clean up the name.

return nil, fmt.Errorf("%vs cannot be used as rule name", TypeName(r[0].Value))
}

// Ref: set only if we've got 2+ dots in the ref
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit fuzzy on why we require more than 2 dots.

Copy link
Contributor Author

@srenatus srenatus May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's only one dot, we've got to interpret it as set rule,

foo.bar { true }

is a "shortcut" for

foo["bar"] { true }

for backwards-compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense. For some reason, I read this as there needs to be more than 2 dots .. which doesn't track with the subsequent logic 🙄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong here:

foo.bar

is the only shortcut we've got to support.

foo.bar { true }

this never worked. However, it might be the case that we aren't able to disambiguate that at this code location, I'm still getting back into this...

assertHeadsEqual(t, &Head{Name: Var("p")}, &Head{Name: Var("p")})
assertHeadsEqual(t, &Head{Ref: Ref{VarTerm("p"), StringTerm("r")}}, &Head{Ref: Ref{VarTerm("p"), StringTerm("r")}}) // TODO: string for first section
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is StringTerm something we still want to support for the first term?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed my mind 😅


p {
q == 0
data.test.a.b.c.p == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that q can be accessed directly, without the data.test. prefix, within the same module; it's easy to assume that a.b.c.p could be too. But this is not the case, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is not the case, right?

Exactly. I think this is in line with how we deal with multiple modules: when there's a mod with package a.b.c and one with in package a, the latter's rego can't use b.c.d to reference rule d in package a.b.c either. But we could discuss this decision!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it makes sense to be able to directly reference something in the same module/file as where it was declared. I can't say for certain, but I think a lot of people would try to ref a.b.c.p before they tried data.test.a.b.c.p.

Someone could of course declare an object in that file a.b := {"c": {"p": 42}}, which would make q := a.b.c.p ambiguous .. But I don't think the data.<module package> prefix of calling q := data.test.a.b.c.p instead actually helps with that, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double-check tomorrow, but some cases are captured by the conflict check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would be able to do that if you did

package test
import data.test.a

a.b.c.p = 1
p {
  a.b.c.p == 1
}

...but this never occurred to me before. I'll have to see if it works or just breaks. 🔍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as-is:

$ opa eval -fpretty -d x.rego data               
{
  "test": {
    "a": {
      "b": {
        "c": {
          "p": 1
        }
      }
    },
    "p": true
  }
}```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My expectation was that a would be visible inside of package test just like any other rule/variable. I realize that the implementation means it doesn't work this way but I think we should change that.

package test

# these should behave the same
x := {"foo": {"bar": {"baz": 7}}}
y["foo"] := {"bar": {"baz": 7}}
z.foo.bar.baz := 7

p {
  x.foo.bar.baz == 7
  y.foo.bar.baz == 7
  z.foo.bar.baz == 7  # currently you can't say this, but IMO, that's confusing
}

I'd rather not require users add an import for this to work. I'll take a look at the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense. Also the same is true for x.foo.bar and x.foo and x (y, z respectively), where the scope of resolving z.foo or z.foo.bar would be broader, as it might include other files' and modules' contributions to the tree.

@srenatus srenatus dismissed stale reviews from johanfylling and carabasdaniel via 7824d1e May 18, 2022 09:34
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch from bc01cd0 to 7824d1e Compare May 18, 2022 09:34
@tsandall
Copy link
Member

The format package seems to be mishandling rule names that are refs:

torin:~/scratch/dotted-heads$ cat ~/x.rego
package x

foo.bar.baz := 7
torin:~/scratch/dotted-heads$ opa fmt ~/x.rego
package x

baz := 7

Comment on lines 281 to 309
For reasons of backwards-compatibility, partial sets containing _strings_ need
to use variables, i.e.

```live:rules/ref_heads/ps_complete:module:read_only
fruit.box["apples"] { true }
```

defines a _complete document rule_ `fruit.box.apples` with value `true`. To define
a partial set rule with a reference head, use

```live:rules/ref_heads/ps_set_of_strings:module:read_only
fruit.box[x] { x := "apples" }
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought about the ambiguity between partial sets and the new syntax. I think we need to come up with a different solution to this because I would not want to preclude us from supporting variables in the refs in the future... e.g., foo.bar[x].baz[y] { ... } should define a nested object structure:

foo = {
  "bar": {x0: {"baz": {y0: true, y1: true, ..., yN: true}, x1: {...}, ..., xM: {...}}}
}

@srenatus srenatus marked this pull request as draft June 9, 2022 12:31
@srenatus
Copy link
Contributor Author

E_TOOMANYPRS 😉 I'll re-open this one when the preliminary stuff is in: #4776.

@srenatus srenatus closed this Jun 21, 2022
@srenatus srenatus reopened this Jun 22, 2022
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch 4 times, most recently from 422bbc9 to f3328bf Compare June 28, 2022 20:00
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch 3 times, most recently from d5e1e19 to 5102cff Compare July 1, 2022 09:10
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch from 5102cff to 9d9b856 Compare July 14, 2022 07:50
Nothing consequential here, just changing the structure of the tests. Now, they
each only have the imports they need, and locals for each expectation start
from zero.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
To know that it's the right path, we need to look up the rule refs before
consulting the indexer. We'll do that by using evalVirtual, and using the rule
tree for checking head ref groundness.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
@srenatus srenatus force-pushed the sr/ast/dots-in-the-head branch from f9fa5f6 to b5e6177 Compare October 10, 2022 09:05
@srenatus srenatus merged commit 965301f into open-policy-agent:main Oct 14, 2022
@srenatus srenatus deleted the sr/ast/dots-in-the-head branch October 14, 2022 08:16
byronic pushed a commit to byronic/opa that referenced this pull request Oct 17, 2022
This change allows rules to have string prefixes in their heads -- we've
come to call them "ref heads".

String prefixes means that where before, you had

    package a.b.c
    allow = true

you can now have

    package a
    b.c.allow = true

This allows for more concise policies, and different ways to structure
larger rule corpuses.

Backwards-compatibility:

- There are code paths that accept ast.Module structs that don't necessarily
  come from the parser -- so we're backfilling the rule's Head.Reference
  field from the Name when it's not present.
  This is exposed through (Head).Ref() which always returns a Ref.

  This also affects the `opa parse` "pretty" output:

  With x.rego as

    package x
    import future.keywords
    a.b.c.d if true
    e[x] if true

  we get

    $ opa parse x rego
    module
     package
      ref
       data
       "x"
     import
      ref
       future
       "keywords"

     rule
      head
       ref
        a
        "b"
        "c"
        "d"
       true
      body
       expr index=0
        true
     rule
      head
       ref
        e
        x
       true
      body
       expr index=0
        true

  Note that

    Name: e
    Key: x

  becomes

    Reference: e[x]

  in the output above (since that's how we're parsing it, back-compat edge cases aside)

- One special case for backcompat is `p[x] { ... }`:

    rule                    | ref   | key | value | name
    ------------------------+-------+-----+-------+-----
    p[x] { ... }            | p     | x   | nil   | "p"
    p contains x if { ... } | p     | x   | nil   | "p"
    p[x] if { ... }         | p[x]  | nil | true  | ""

  For interpreting a rule, we now have the following procedure:

  1. if it has a Key, it's a multi-value rule; and its Ref defines the set:

     Head{Key: x, Ref: p} ~> p is a set
     ^-- we'd get this from `p contains x if true`
         or `p[x] { true }` (back compat)

  2. if it has a Value, it's a single-value rule; its Ref may contain vars:

     Head{Ref: p.q.r[s], Value: 12} ~> body determines s, `p.q.r.[s]` is 12
     ^-- we'd get this from `p.q.r[s] = 12 { s := "whatever" }`

     Head{Key: x, Ref: p[x], Value: 3} ~> `p[x]` has value 3, `x` is determined
                                          by the rule body
     ^-- we'd get this from `p[x] = 3 if x := 2`
         or `p[x] = 3 { x := 2 }` (back compat)

     Here, the Key isn't used, it's present for backwards compatibility: for ref-
     less rule heads, `p[x] = 3` used to be a partial object: key x, value 3,
     name "p"

- The destinction between complete rules and partial object rules disappears.
  They're both single-value rules now.

- We're now outputting the refs of the rules completely in error messages, as
  it's hard to make sense of "rule r" when there's rule r in package a.b.c and
  rule b.c.r in package a.

Restrictions/next steps:

- Support for ref head rules in the REPL is pretty poor so far. Anything that
  works does so rather accidentally. You should be able to work with policies
  that contain ref heads, but you cannot interactively define them.

  This is because before, we'd looked at REPL input like

      p.foo.bar = true

  and noticed that it cannot be a rule, so it's got to be a query. This is no
  longer the case with ref heads.

- Currently vars in Refs are only allowed in the last position. This is expected
 to change in the future.

- Also, for multi-value rules, we can not have a var at all -- so the following
  isn't supported yet:

      p.q.r[s] contains t if { ... }

-----

Most of the work happens when the RuleTree is derived from the ModuleTree -- in
the RuleTree, it doesn't matter if a rule was `p` in `package a.b.c` or `b.c.p`
in `package a`.

As such, the planner and wasm compiler hasn't seen that many adaptations:

- We're putting rules into the ruletree _including_ the var parts, so

  p.q.a = 1
  p.q.[x] = 2 { x := "b" }

  end up in two different leaves:

  p
  `-> q
       `-> a = 1
       `-> [x] = 2`

- When planing a ref, we're checking if a rule tree node's children have
  var keys, and plan "one level higher" accordingly:

  Both sets of rules, p.q.a and p.q[x] will be planned into one function
  (same as before); and accordingly return an object {"a": 1, "b": 2}

- When we don't have vars in the last ref part, we'll end up planning
  the rules separately. This will have an effect on the IR.

  p.q = 1
  p.r = 2

  Before, these would have been one function; now, it's two. As a result,
  in Wasm, some "object insertion" conflicts can become "var assignment
  conflicts", but that's in line with the now-new view of "multi-value"
  and "single-value" rules, not partial {set/obj} vs complete.
* planner: only check ref.GroundPrefix() for optimizations

In a previous commit, we've only mapped

    p.q.r[7]

as p.q.r;  and as such, also need to lookup the ref

    p.q.r[__local0__]

via p.q.r

(I think. Full disclosure: there might be edge cases here that are unaccounted
for, but right now, I'm aiming for making the existing tests green...)

New compiler stage:

In the compiler, we're having a new early rewriting step to ensure that the
RuleTree's keys are comparible. They're ast.Value, but some of them cause us
grief:

- ast.Object cannot be compared structurally; so

      _, ok := map[ast.Value]bool{ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")}): true}[ast.NewObject([2]*ast.Term{ast.StringTerm("foo"), ast.StringTerm("bar")})]

  `ok` will never be true here.

- ast.Ref is a slice type, not hashable, so adding that to the RuleTree would
  cause a runtime panic:

      p[y.z] { y := input }

  is now rewritten to

    p[__local0__] { y := input; __local0__ := y.z }

This required moving the InitLocalVarGen stage up the chain, but as it's still
below ResolveRefs, we should be OK.

As a consequence, we've had to adapt `oracle` to cope with that rewriting:

1. The compiler rewrites rule head refs early because the rule tree expects
   only simple vars, no refs, in rule head refs. So `p[x.y]` becomes
   `p[local] { local = x.y }`
2. The oracle circles in on the node it's finding the definition for based
   on source location, and the logic for doing that depends on unaltered
   modules.

So here, (2.) is relaxed: the logic for building the lookup node stack can
now cope with generated statements that have been appended to the rule bodies.

There is a peculiarity about ref rules and extents:

See the added tests: having a ref rule implies that we get an empty object
in the full extent:

    package p
    foo.bar if false

makes the extent of data.p: {"foo": {}}

This is somewhat odd, but also follows from the behaviour we have right now
with empty modules:

    package p.foo
    bar if false

this also gives data.p the extent {"foo": {}}.

This could be worked around by recording, in the rule tree, when a node was
added because it's an intermediary with no values, but only children.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
srenatus added a commit to srenatus/opa that referenced this pull request Nov 28, 2022
With the introduction of ref heads in open-policy-agent#4660, the planned IR
still mostly worked, but it was bypassing the CallDynamic
optimization when it shouldn't have.

This commit re-works some of the rule planning to more robustly
handle ref heads.

Also adds a few test cases to get a grip on what should and
should not happen.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
srenatus added a commit that referenced this pull request Nov 28, 2022
With the introduction of ref heads in #4660, the planned IR
still mostly worked, but it was bypassing the CallDynamic
optimization when it shouldn't have.

This commit re-works some of the rule planning to more robustly
handle ref heads.

Also adds a few test cases to get a grip on what should and
should not happen.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
philipaconrad added a commit to philipaconrad/opa that referenced this pull request Dec 21, 2022
This commit fixes a panic from a utility function in the `opa test`
codepath. After the ref-heads change in open-policy-agent#4660, this particular
function could be fed a ref that it didn't know how to work with,
such as the innocuous line `a[0] := 1`, and it would then panic.

This was fixed by returning a dummy value instead of panic'ing.

Fixes: open-policy-agent#5496

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
philipaconrad added a commit that referenced this pull request Dec 21, 2022
This commit fixes a panic from a utility function in the `opa test`
codepath. After the ref-heads change in #4660, this particular
function could be fed a ref that it didn't know how to work with,
such as from the innocuous line `a[0] := 1`, and it would then panic.

This was fixed by returning a dummy value instead of panic'ing.

Fixes: #5496

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
philipaconrad added a commit to philipaconrad/opa that referenced this pull request Dec 21, 2022
…gent#5497)

This commit fixes a panic from a utility function in the `opa test`
codepath. After the ref-heads change in open-policy-agent#4660, this particular
function could be fed a ref that it didn't know how to work with,
such as from the innocuous line `a[0] := 1`, and it would then panic.

This was fixed by returning a dummy value instead of panic'ing.

Fixes: open-policy-agent#5496

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
(cherry picked from commit a08934d)
philipaconrad added a commit that referenced this pull request Dec 21, 2022
This commit fixes a panic from a utility function in the `opa test`
codepath. After the ref-heads change in #4660, this particular
function could be fed a ref that it didn't know how to work with,
such as from the innocuous line `a[0] := 1`, and it would then panic.

This was fixed by returning a dummy value instead of panic'ing.

Fixes: #5496

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
(cherry picked from commit a08934d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ast: support using refs as rules (or "dots in rule heads")
5 participants