Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a RewriteControlFlow phase (return/continue/break) #393

Closed
1 of 2 tasks
W95Psp opened this issue Dec 20, 2023 · 14 comments · Fixed by #907 or #988
Closed
1 of 2 tasks

Add a RewriteControlFlow phase (return/continue/break) #393

W95Psp opened this issue Dec 20, 2023 · 14 comments · Fixed by #907 or #988
Assignees
Labels
engine Issue in the engine P1 Max priority

Comments

@W95Psp
Copy link
Collaborator

W95Psp commented Dec 20, 2023

We want a phase that eliminates explicit control flow (say a return or a break) by moving or duplicating code.

We want to rewrite the control flow so that every return, break or continue always appears in a return position in our AST. Returns in return positions can be dropped: e.g. if ... then {return 3;} else {return 5;} can be simplified in if ... then {3} else {5}.

  • rewrite control flow for returns appearing outside of a loop (this was PR Add RewriteControlFlow phase. #907);
  • rewrite control flow for return, break and continue within a loop.

See Details below for a lengthy explanation.

Details

Question marks (but also return and friends) used to be transformed in a monadic style.

In this style, computations are promoted to a Result or Option monad, and pure and bind nodes are inserted heavily.

For instance, let's consider the following function:

fn test(x: Option<u8>) -> Option<u8> {
    Some(f(x?) + 4)
}

This would be transformed to this monadic F* function, where (let*) is basically a bind node in the option monad:

let test (x: option u8) : option u8 =
  run_monad (
    let* hoist1:u8 = x in
      Some
      (let hoist2:u8 = f hoist1 in
        let hoist3:u8 = hoist2 + 4 in
        Some hoist3 <: option u8)
      <:
      option (option u8))

While this is a correct translation, lifting everything to a monad and running the monad is a bit heavy.

Instead, the control flow of the Rust function test can be rewritten into:

match x {
   Some(x) => Some(f(x) + 4),
   None => None
}

Rewriting control flow is about pushing ?s to return positions. When ? appears deep in a if/match node itself nested in a sequential expression (e.g. a sequence or a function call), then such rewriting can result in code duplication. Example:

fn fun(cond: bool, x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    let a = if b > 20 { x? + long_code } else { 0 };
    let b = other_long_code;
    Some(a + b + 3)
}

fn fun_rewrite_step(x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    if b > 20 {
        let a = x? + long_code;
        let b = other_long_code;
        Some(a + b + 3)
    } else {
        let b = other_long_code;
        Some(a + b + 3)
    };
}

fn fun_rewrited(x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    if b > 20 {
        match x {
            Some(x) => {
                let a = x? + long_code;
                let b = other_long_code;
                Some(a + b + 3)
            }
            None => None,
        }
    } else {
        let b = other_long_code;
        Some(a + b + 3)
    }
}
@W95Psp W95Psp added the engine Issue in the engine label Dec 20, 2023
@W95Psp W95Psp added the P1 Max priority label Dec 21, 2023
@W95Psp W95Psp changed the title Better support for question marks Add a "RewriteControlFlow" phase (?/return/continue/break...) Jan 29, 2024
@W95Psp W95Psp self-assigned this Jan 29, 2024
@W95Psp
Copy link
Collaborator Author

W95Psp commented Apr 30, 2024

PR #559 made some progress toward this issue, but we need to improve it to close this.

@maximebuyse
Copy link
Contributor

maximebuyse commented Sep 19, 2024

So if I understand well, to complete this issue we need this RewriteControlFlow phase that would come before DropNeedlessReturn and essentially deal with returns inside ifs (potentially duplicating some code).

Let's consider (a simplified version of) the example from the issue description:

fn question_mark(c: bool, x: Option<i32>) -> Option<i32> {
    let a = if c {x? + 10} else {0};
    let b = 20;
    Some(a + b)
}

And an example written with a return inside two ifs:

fn test(c1: bool, c2: bool) -> i32 {
    if c1 {
        if c2 {
            return 1;
        }
    }
    2
}

They should be rewritten respectively as:

fn question_mark_equivalent(c: bool, x: Option<i32>) -> Option<i32> {
    if c {
      let a = x? + 10;
      let b = 20;
      Some(a + b)
    } else {
       let a = 0;
       let b = 20;
       Some(a + b)
    };
    
}
fn test_equivalent(c1: bool, c2: bool) -> i32 {
    if c1 {
        if c2 {
            return 1;
        } else {
            2
        }
    } else {
        2
    }
}

The crucial rule to rewrite like this would be recognizing patterns like:

let lhs = if c then a (else b);
e

and rewrite it to:

if c {
  let lhs = a;
  e
} else {
 let lhs =  b; 
  e
}

In case there is no else originally we would insert it.
And this would be done recursively, only if there is a return statement somewhere in one of the branches of the if.

To avoid recursing twice, we would need to pass the following expression top-down in the recursive calls (in an argument) and propagate the information of whether there is a return statement bottom-up (in the return value of the recursive function). We should also take care to avoid inserting an expression after a return statement (otherwise it would not be simplified by drop_needless_returns.

@maximebuyse
Copy link
Contributor

We also need another phase (or a second visitor inside drop_needless_returns) to rewrite

let a = return b;
c

as

return b

As they can be nested it needs its own visitor.

@franziskuskiefer
Copy link
Member

I thought #907 is only partially implementing this?

@maximebuyse
Copy link
Contributor

I thought #907 is only partially implementing this?

We have a lot of issues around this. My understanding is that #907 implements this one. But it is only a part of this meta-issue: #15 . And the main part that is now missing is #196

@franziskuskiefer
Copy link
Member

ok, @W95Psp please clean up and close what's done 🙏🏻

@W95Psp W95Psp changed the title Add a "RewriteControlFlow" phase (?/return/continue/break...) Add a "RewriteControlFlow" phase (return/continue/break) Oct 2, 2024
@W95Psp W95Psp changed the title Add a "RewriteControlFlow" phase (return/continue/break) Add a RewriteControlFlow phase (return/continue/break) Oct 2, 2024
@maximebuyse
Copy link
Contributor

I am designing the part about loops. First attempt:

  • Apply the current RewriteControlFlow phase inside loops bodies but treat break and continue like return. After that, all of them should be in "return position" in the loop body.
  • Add an enum in Rust_primitives.Hax with 3 variants (break, continue and return) each carrying a value (the enum should be polymorphic to allow carrying values of any type)
  • Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of break and continue the value they should contain is the accumulator (loops are translated as a fold). A value in return position without any of these statements should be wrapped in a continue
  • Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a continue, and just pass along a break or return
  • Wrap the whole fold inside a pattern matching to extract the result. in case we find the result wrapped in a break or continue we should just unwrap it. In case we find a return we should return, that means we should also treat this new pattern matching with the current RewrtieControlFlow phase to make sure this return is treated like the others outside loops (of course we can analyze first to know if the body of the loop contains a return or not).

For all this to work well, I suspect that we should treat the body of the loop after phase LocalMutation. It might be better also for the rest of RewriteControlFlow. This means we should also move DropNeedlessReturns but there might be a reason they come in the other order.

@maximebuyse
Copy link
Contributor

I am designing the part about loops. First attempt:

* Apply the current `RewriteControlFlow` phase inside loops bodies but treat `break` and `continue` like `return`. After that, all of them should be in "return position" in the loop body.

* Add an enum in `Rust_primitives.Hax` with 3 variants (`break`, `continue` and `return`) each carrying a value (the enum should be polymorphic to allow carrying values of any type)

* Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of `break` and `continue` the value they should contain is the accumulator (loops are translated as a `fold`). A value in return position without any of these statements should be wrapped in a `continue`

* Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a `continue`, and just pass along a `break` or `return`

* Wrap the whole `fold` inside a pattern matching to extract the result. in case we find the result wrapped in a `break` or `continue` we should just unwrap it. In case we find a `return` we should return, that means we should also treat this new pattern matching with the current `RewrtieControlFlow` phase to make sure this `return` is treated like the others outside loops (of course we can analyze first to know if the body of the loop contains a `return` or not).

For all this to work well, I suspect that we should treat the body of the loop after phase LocalMutation. It might be better also for the rest of RewriteControlFlow. This means we should also move DropNeedlessReturns but there might be a reason they come in the other order.

A few additions:

  • I think we should rewrite the control flow in RewriteControlFlow, and also remove everything that may come after a break or continue (like we already do for return). Then LocalMutation will possibly introduce expressions after them that we can reuse to know the value we should pass.
  • We might want, instead of inserting a pattern matching in the loop body, to define a specific fold that would do the pattern matching. We will still need the pattern matching on the result of the loop (in case there is a return, otherwise we could also have a specific fold that would extract the result out of its break or continue wrapper).
  • In case there are continue but no break or return we don't need any wrapping
  • In the case of nested loops, break and continue may contain a label that allows to choose which loop they refer to. I think we should not support that (at least for now).

@maximebuyse
Copy link
Contributor

Here is an example of the rewriting:

fn f (v: &[i32]) -> i32 {
    let mut sum = 0;
    for i in v {
        if *i < 0 {
            return 0
        }
        sum += *i;
    }
    sum *= 2;
    sum
}

After the transformation (here with a classic fold but we will define a specific fold that will do this internal pattern matching on the accumulator, and we will also use std::ops::ControlFlow instead of this CF enum):

fn double_sum_eq (v: &[i32]) -> i32 {
    let sum = 0;
    match (
        v.fold(|sum, i| {
            match sum {
                CF::Continue(sum) => {
                    if *i < 0 {
                        CF::Return(0)
                    } else {
                        let sum = sum + *i;
                        CF::Continue(sum)
                    }
                },
                _ => sum
            }
            
        },
        sum
        )
    ) {
        CF::Return(v) => v,
        CF::Break(v) | CF::Continue(v) =>
         {
            let sum = v;
            let sum = sum * 2;
            sum
         }
    }
}

@franziskuskiefer
Copy link
Member

@maximebuyse @W95Psp what's the state here?

@W95Psp
Copy link
Collaborator Author

W95Psp commented Oct 17, 2024

We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).

@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.

Thus there is two things to do:

  • clean up / make sure the PR is up-to-date and ready in terms of engine phases (@maximebuyse I think you were not certain about the order of phases?);
  • write F*.

@maximebuyse
Copy link
Contributor

We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).

@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.

Thus there is two things to do:

* [ ]  clean up / make sure the PR is up-to-date and ready in terms of engine phases ([@maximebuyse](https://github.com/maximebuyse) I think you were not certain about the order of phases?);[ ]  write F*.

I am quite happy with the order of phases now. But for sure there will be some cleanup to do, and we might change a bit the code we emit while writing the f*.

@franziskuskiefer
Copy link
Member

@franziskuskiefer
Copy link
Member

#1056

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
engine Issue in the engine P1 Max priority
Projects
None yet
3 participants