Handlers refactoring #185

Merged
merged 14 commits into from
Jul 1, 2024

hadley (Member) commented Jun 26, 2024

This is the culmination of all the evaluate() refactoring I've been working on: we can now define the handlers once (instead of once per top-level expression), and evaluate_tle() becomes simple enough that we can inline it, which makes the double-loop strategy clearer.
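For readers who have not seen the refactored code, the "handlers once, two loops" shape can be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the package's implementation: `evaluate_sketch()`, `seen`, and the message-only handler are hypothetical names invented for this example.

```r
# Hypothetical sketch: handlers are set up once, around both loops,
# instead of once per top-level expression.
evaluate_sketch <- function(inputs, envir = new.env()) {
  seen <- character()

  withCallingHandlers(
    {
      # Outer loop: one iteration per input string
      for (input in inputs) {
        exprs <- parse(text = input)
        # Inner loop: one iteration per expression parsed from that input
        for (i in seq_along(exprs)) {
          eval(exprs[[i]], envir)
        }
      }
    },
    message = function(cnd) {
      # Record the message and stop it from reaching the console
      seen <<- c(seen, conditionMessage(cnd))
      invokeRestart("muffleMessage")
    }
  )

  seen
}

evaluate_sketch(c("message('a'); message('b')", "message('c')"))
```

The single `withCallingHandlers()` call wraps both loops, so the handler closure is created once no matter how many inputs or expressions are evaluated.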

@hadley hadley requested review from lionel- and cderv June 26, 2024 21:40
@@ -1,5 +1,7 @@
# evaluate (development version)

* The `source` output handler is now passed the entire top-level expression, not just the first component.
* `evaluate()` will now terminate on the first error in a top-level expression. This matches R's own behaviour more closely.
Member Author

I think I might have actually changed this behaviour when I started using restarts, but it's now tested and documented. This is a bug that umpire works around in https://github.com/rstudio/umpire/blob/main/R/evaluate.R#L6-L11.
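The "terminate on the first error" behaviour described in the NEWS entry can be illustrated with a small base R sketch. It assumes nothing about evaluate's internals: `run_input()` is a hypothetical helper that evaluates the expressions parsed from one input and stops at the first error, which is what R itself does when you type `stop('boom'); 2` at the console.

```r
# Hypothetical helper: evaluate each expression in one input,
# stopping as soon as one of them signals an error.
run_input <- function(text, envir = new.env()) {
  exprs <- parse(text = text)
  results <- list()

  for (i in seq_along(exprs)) {
    res <- tryCatch(eval(exprs[[i]], envir), error = function(e) e)
    results[i] <- list(res)  # list() wrapper so NULL results are stored too
    if (inherits(res, "error")) break
  }

  results
}

length(run_input("stop('boom'); 2"))  # 1: the `2` is never evaluated
length(run_input("1; 2"))             # 2: both expressions are evaluated
```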

Member

I find this terminology quite confusing because to me 1\n2 and 1;2 both contain two top-level expressions. I.e. 1;2 is not a single expression.

parse(text = "1; 2")
#> expression(1, 2)

parse(text = "1\n 2")
#> expression(1, 2)

Can we improve the terms used for this? Maybe "parser inputs"? Parser inputs are broken down by line by the R REPL, so 1;2 is one input containing two TLEs and 1\n2 is two inputs, each containing one TLE?

To put it another way, a top-level expression should correspond to one iteration of the evaluation loop rather than multiple iterations. Each TLE produces one piece of printed output.

Member Author

I'd say 1;2 is one top-level expression consisting of two expressions. In your definition, what's the difference between a TLE and an expression?

I'd say each TLE generates one source statement.

Member

Whether an expression is top-level is a property of where it's evaluated, at the top-level evaluation loop. It certainly makes sense to call 1;2 "top-level" but I find it confusing to also call it an "expression" because it's not an R expression stricto sensu. An expression is something that can be evaluated and thus must be representable as an AST node or leaf.

You could argue that 1;2 is parsed as an EXPRSXP vector and that you can evaluate it with the R-level eval() function, but I think it's the C-level function that should guide meaning here. And for the C-level function, EXPRSXP is a literal.

From this point of view foo(bar) consists of two expressions with bar nested in foo(bar). Whereas 1; 2 is not an expression but a sequence of two expressions managed by a top-level evaluation loop.
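The distinction drawn here can be checked from the R prompt. The snippet below is a small illustration using only base R; the variable names `exprs` and `nested` are arbitrary.

```r
# "1; 2" parses to an expression vector holding two separate expressions
exprs <- parse(text = "1; 2")
typeof(exprs)   # "expression"
length(exprs)   # 2

# foo(bar) is a single call with the symbol `bar` nested inside it
nested <- quote(foo(bar))
typeof(nested)  # "language"
nested[[2]]     # the symbol bar

# A top-level loop evaluates and prints each element in turn
for (i in seq_along(exprs)) {
  print(eval(exprs[[i]]))
}
```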

> I'd say each TLE generates one source statement.

Sorry I'm not sure what that means.

Member Author

Ok, I see where you're coming from. I'm going to merge this PR but I'll keep thinking about the vocab.

@@ -17,8 +17,23 @@ watchout <- function(handler = new_output_handler(),
push <- function(value) {
output[i] <<- list(value)
i <<- i + 1

switch(output_type(value),
Member Author

The watcher is now in charge of calling the handler when we push an output onto the stack.
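As a rough picture of what "the watcher calls the handler on push" could look like, here is a self-contained sketch. It borrows the `output[i] <<- list(value)` idiom from the diff above, but everything else is an assumption made for illustration: `new_watcher_sketch()`, the two-way type check, and the `text`/`condition` handler slots are hypothetical and simpler than the package's `output_type()` dispatch.

```r
# Hypothetical watcher: stores each output and immediately
# forwards it to the matching handler function.
new_watcher_sketch <- function(handler) {
  output <- list()
  i <- 1

  push <- function(value) {
    output[i] <<- list(value)
    i <<- i + 1

    # Simplified stand-in for output_type(): conditions vs. everything else
    type <- if (inherits(value, "condition")) "condition" else "text"
    switch(type,
      condition = handler$condition(value),
      text = handler$text(value)
    )
    invisible()
  }

  list(push = push, get = function() output)
}

calls <- character()
w <- new_watcher_sketch(list(
  text = function(x) {
    calls <<- c(calls, "text")
  },
  condition = function(x) {
    calls <<- c(calls, "condition")
  }
))
w$push("hello")
w$push(simpleWarning("careful"))
calls
```

Keeping the dispatch inside `push()` means callers only ever push values; they never need to know which handler applies.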

@@ -67,6 +80,22 @@ watchout <- function(handler = new_output_handler(),
capture_output()
}

print_value <- function(value, visible) {
Member Author

It feels a little weird to have this here, but the watcher is the one object that has all the details to handle this correctly.
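The diff only shows the signature `print_value(value, visible)`, so the following is a guess at the general idea rather than the actual body: print a value only when it is visible, and capture what was printed. `print_value_sketch()` is a hypothetical name, and using `capture.output()` here is an assumption standing in for the watcher's own capture machinery.

```r
# Hypothetical sketch: print visible values and return the printed lines;
# invisible values produce no output.
print_value_sketch <- function(value, visible) {
  if (!visible) {
    return(character())
  }
  utils::capture.output(print(value))
}

# withVisible() reports both the value and whether it would auto-print
res <- withVisible(eval(quote(1 + 1)))
print_value_sketch(res$value, res$visible)

hidden <- withVisible(eval(quote(invisible(1))))
print_value_sketch(hidden$value, hidden$visible)
```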

lionel- (Member) left a comment

I love seeing this getting simpler and simpler!

hadley (Member, Author) commented Jun 29, 2024

A little info about speed. I estimate that the CRAN version of evaluate adds ~700µs of overhead to each TLE. (For reference, eval + parse takes about 3µs.) The current main branch brings that down to ~500µs, and this branch brings it down to ~400µs. That's unlikely to make much difference in practice, but it's nice that these changes also make evaluate a bit faster.

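library(evaluate)

# Estimate per-TLE overhead: time n copies of `x` with evaluate() and with
# plain eval(parse()), then divide the timings by n.
overhead <- function(x, n) {
  x <- rep(x, n)

  df <- bench::mark(
    evaluate(x),
    eval(parse(text = x)),
    time_unit = "us",
    check = FALSE  # the two expressions return different objects
  )[1:3]  # keep the expression, min, and median columns
  df[2:3] <- df[2:3] / n
  df
}

overhead("1 + 1", 10)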

@hadley hadley merged commit d8f00ea into main Jul 1, 2024
13 checks passed
@hadley hadley deleted the handlers-update branch July 1, 2024 13:59