Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add print function #3319

Closed
anderseknert opened this issue Mar 26, 2021 · 15 comments
Closed

Add print function #3319

anderseknert opened this issue Mar 26, 2021 · 15 comments

Comments

@anderseknert
Copy link
Member

anderseknert commented Mar 26, 2021

One of the most asked questions on the OPA Slack is how to print values to console. While there are many, often better ways, of debugging Rego policies, the quick feedback loop and the zero ceremony offered by a print/println function is sometimes just the right tool for the job. Since such a function exists in most regular programming languages this is a development/debugging style adopted by many, and naturally they'll want to bring that to OPA and Rego. We should let them!

While there is some support for doing this via the trace built-in already, using it simply for printing values is kind of a hack, and having to learn all the peculiarities like required flags and output format around it as someone who is just learning Rego isn't a great introduction to the language or its tooling.

Some things to consider:

  • Name: print vs println, echo, others? 😄
  • Overloading (see Investigate whether function overloading can support different arities #1921), like print("test", x, 1, y).
  • Output: stdout vs. stderr and how to play nice with OPA logging / decision logging.
  • Allow client application to override behavior when using OPA as a library.
  • Deprecating trace.
  • (ecosystem) Rego playground routing print calls to console.log.
  • (possibly) how to deal with undefined values. Common source of confusion for beginners.

Related: #2957

@srenatus
Copy link
Contributor

💭 Does this issue supersede #2957?

@srenatus
Copy link
Contributor

(ecosystem) Rego playground routing print calls to console.log.

Hrrm I'd rather have a "log output" pane in the playground, I think. Including the browser's dev tools as part of the UI doesn't seem right 😄 I suppose it could be collapsed by default, same as the data pane.

@anderseknert
Copy link
Member Author

Hrrm I'd rather have a "log output" pane in the playground, I think. Including the browser's dev tools as part of the UI doesn't seem right 😄 I suppose it could be collapsed by default, same as the data pane.

Perhaps you are right :) As long as it's dealt with in some way!

@yashtewari
Copy link
Contributor

yashtewari commented Mar 30, 2021

Hi! Since the goal here is debugging, I thought it might be relevant to share a patch for this basic builtin our team has been using for debugging Rego running on our dev/staging OPA servers: yashtewari@040285a

log(object) simply adds an object to the decision log whenever it is evaluated, along with a timestamp and a code location. While this is different from the stated objective of printing to the console, it serves the use case of finding problems in Rego code that is "live" in a staging environment, for example.

The style/usage is analogous to the "print statement" style that you've described. Do you think an option to enrich the decision log can be added in a future print function?

We're happy to help and contribute!

@tsandall
Copy link
Member

@yashtewari welcome back!

log(object) simply adds an object to the decision log whenever it is evaluated, along with a timestamp and a code location.

That's an interesting idea!

Do you think an option to enrich the decision log can be added in a future print function?

I think this relates to #1514 in that we need a way for values generated during evaluation to be included in the decision log.

We'll need to think about this a bit more.

@anderseknert
Copy link
Member Author

anderseknert commented Mar 30, 2021

Yeah, I like the idea of being able to enrich decisions with arbitrary data and I think a log function like that would be valuable. I don't think it necessarily needs to compete with the print built-in described here though. Ideally the print function should be as dumb as possible, requiring no configuration whatsoever and work in all kinds of environments where Rego runs (terminal, web browser, wasm, playground, etc).

@yashtewari
Copy link
Contributor

yashtewari commented Mar 31, 2021

Thanks a lot @tsandall!! And congratulations for OPA being cooler than ever 😁

I think this relates to #1514 in that we need a way for values generated during evaluation to be included in the decision log.

You mean in terms of implementation? Would be nice to see what you guys think about how log is done, maybe not in this thread. As I understand after looking at #1514, these are separate cases to include in the decision log -- 1) data generated during runtime that informs the decision and 2) arbitrary data that the policy author wants to specifically look at.

Ideally the print function should be as dumb as possible

Ah you're right, doing one thing well makes perfect sense. My thinking was that as a print debugger myself, I use log in just the same way that I would use print. Great to know that you like the idea, it's bare bones for now but we'll make it better over time. For example right now we haven't addressed caching behaviour, so there can be unnecessary evaluation and the decision log can get too big.

Thanks for your responses!

@tsandall tsandall changed the title Built-ins: add print function Add print function Sep 14, 2021
@tsandall
Copy link
Member

tsandall commented Sep 14, 2021

Thinking about print support a bit more...

For the sake of progress:

  • Call the function/statement print.
  • Ignore formatting specifications for now (we could add another printf statement later if needed.)
  • Assume that embedders will append newlines to output (i.e., print isn't intended to be used to write arbitrary bytes to a stream... it's intended to be used to output/echo values so that humans can read them).
  • Assume that users will expect variadic argument support.
  • Ignore embedding concerns (stdout/stderr, etc.) for now (i.e., let's only deal with how print is handled by type system, compiler, planner, evaluator, etc. for now.)
  • Ignore trace() for now.

The main issue I see with print is how to handle failures. For example:

p {
  a := 1
  b := 2
  a == b  # if the body is evaluated in-order, this statement will FAIL and the next will never be reached
  print(a, b)
}

If you look at the rule body as an imperative sequence of statements (which is incorrect, but common) then you would expect to see some output because there are no conditional statements. Of course, in Rego, statements in the body of the rule are ANDed together and there is no ordering, so there is no need to execute subsequent statements once a failure is encountered.

If we assume that print arguments are treated the same way as other function calls, then rewriting could make this a common occurence:

p {
  print(input.foo)
}

Gets rewritten to:

p {
  _1 = input.foo
  print(_1)
}

If input.foo is undefined, print(_1) will never execute. Surely users will expect to see output from a rule like the second to last one above.

To deal with this situation, I propose we make print execute on FAIL and EXIT evaluation events. In other words, anytime the rule succeeds or fails, the print statement will be executed. Duplicate messages can be condensed for clarity. This leaves the treatment of undefined values. To deal with that, I propose we treat print like any other built-in function except after plugging the arguments, any unbound variables will be replaced with some value, for example:

p {
  print("the value of input.foo is", input.foo)
}

Given input as {} (i.e., "foo" is missing), the output would be:

the value of input.foo is <undefined>

@timothyhinrichs
Copy link
Member

Seems like we're saying we want

  • no flattening of complex terms that are arguments to print()
  • print() to be a special function that evaluates even if its arguments are undefined. But the return value is undefined if any of its args are undefined.

Do we need some awareness of print() during body reordering so that the body can be reordered between print() statements but not across print() statements? Often people will want to use print() to know the location in the rule body that succeeded during execution. E.g. what do we do with this?

p {
   b := a.foo
   print("got here 1")
   a := input.bar
   print("got here 2")
}

@anderseknert
Copy link
Member Author

@tsandall I totally agree on the problem of undefined values, and really like the idea of letting print still print on fail before exiting.
@timothyhinrichs under what conditions do we currently reorder the body? Is it safe to assume print is used for debugging only, and we can skip reordering if it's encountered in the body, to accommodate that use case you're describing?

@timothyhinrichs
Copy link
Member

Example above is one where reordering will happen. We can't compute b until we compute a. OPA will reorder those rules so that it computes a and then b.

This is likely a corner-case, as I think more about it, because most people will write rules sensibly so they do not require reordering. Though if there are automated tools generating Rego code this could happen. Conceptually it would be fine if we ignored this case for now; however, if we later want to add restrictions that say reject rules like the one above then it would be a backwards incompatible change. Easier to relax restrictions than to add them.

@anderseknert
Copy link
Member Author

Ah, gotcha! Completely missed the dependency there. At least to me it seems fine to just print in the order of evaluation in this case.

@tsandall
Copy link
Member

Do we need some awareness of print() during body reordering so that the body can be reordered between print() statements but not across print() statements? Often people will want to use print() to know the location in the rule body that succeeded during execution. E.g. what do we do with this?

@timothyhinrichs this is interesting and it's not what I had in mind... I think there are two options:

  1. As you described, print arguments can be undefined and print statements are undefined if any of the arguments are undefined. This way, evaluation stops after the print statement if any arguments are undefined. Ordering of the print statement is important. print statements can affect the outcome of evaluation.

  2. print arguments can be undefined but print statements never affect evaluation. In other words, print statements are always executed when the rule exits or a failure occurs. Ordering of the print statement is not important.

If we go with (1) then print statements can be used to determine where evaluation stops. However, it can also cause evaluation to fail somewhat unexpectedly:

p {
  input.x == input.y
  printf("blah blah blah (z=%v)", input.z) # if input.z is undefined, p is undefined
}

Using print statements to discover where evaluation is stopping sounds useful but it's not exactly what I had in mind. What I had in mind was (2) because:

A) It gives users a quick-and-dirty alternative to inspect values during policy evaluation. Users don't have to care where the print statement is placed in the rule. Moreover, print statements can be completely ignored in production environments because they never affect the outcome of policy evaluation.

B) The implementation is quite simple because (i) arguments can be rewritten the same way as other calls (ii) there's no special reordering requirement for safety (iii) the evaluator just has to evaluate print statements on FAIL and EXIT events.

Any thoughts on this alternative option?

@timothyhinrichs
Copy link
Member

Now I understand the original proposal. Love the idea of being able to ignore print()s semantically for prod. I suspected I was missing something. I'd guess that most users will be surprised that print ordering isn't respected. If we want to do that, it feels like we'd want a different syntax that's attached to the rule head somehow--perhaps something in the metadata for a rule? That would indicate more accurately what's going to happen.

Maybe the higher level bit is that over time we've added features that make reordering the rule body more surprising: := and == come to mind immediately. I would bet that most people would even expect to get compile errors if the rule body needs to be reordered for top-down eval. Reordering now feels like a perf optimization that's hidden from users, rather than something that users know about.

@tsandall
Copy link
Member

Fixed by #3868

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants