Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do-while results in redundant branching instructions when compiling via IR #15442

Open
Philogy opened this issue Sep 19, 2024 · 3 comments
Open
Labels
low impact Changes are not very noticeable or potential benefits are limited. medium effort Default level of effort nice to have We don’t see a good reason not to have it but won’t go out of our way to implement it. optimizer

Comments

@Philogy
Copy link

Philogy commented Sep 19, 2024

Description

When using a do-while control flow statement such as in the following code:

contract Sum {
    function sum(uint256[] calldata nums) public pure returns (uint256 total) {
        if (nums.length == 0) return 0;
        uint256 i = 0;
        do {
            assembly ("memory-safe") {
                total := add(calldataload(add(nums.offset, shl(5, i))), total)
                i := add(i, 1)
            }
        } while (i < nums.length);
    }
}

I expect solidity to generate the relatively straight-forward & maximally efficient assembly:

tag_doWhileBodyEntry:
    <doWhileBody>
    <whileCondition>
    tag_doWhileBodyEntry
    jumpi    

This is the case when compiling with the legacy pipeline but using the IR pipeline gives you roughly the following structure:

    0x01
tag_doWhileEntry:
    iszero
    tag_whileCondition
    jumpi
tag_doWhileBody:
    <doWhileBody>
    0x00
    tag_doWhileEntry
    jump
tag_whileCondition:
    <whileCondition>
    tag_doWhileBody
    jumpi

This structure, while correct is not efficient. Digging deeper the origin of this structure becomes more apparent when looking at the generated IR:

function fun_sum(var_nums_offset, var_nums_length) -> var_total {
    var_total := 0
    if iszero(var_nums_length) {
        var_total := 0
        leave
    }
    let var_i := 0
    let _1 := 1
    for { } 1 { } {
        if iszero(_1) {
            if iszero(lt(var_i, var_nums_length)) { break }
        }
        _1 :=  0
        var_total := add(calldataload(add(var_nums_offset, shl(5, var_i))), var_total)
        var_i := add(var_i, 1)
    }
}

Environment

  • Compiler version: 0.8.27
  • Target EVM version (as per compiler settings): cancun
  • Framework/IDE (e.g. Truffle or Remix): foundry
  • EVM execution environment / backend / blockchain client: viaIR
  • Operating system: macOS

Steps to Reproduce

Compile the following contract with viaIR

contract Sum {
    function sum(uint256[] calldata nums) public pure returns (uint256 total) {
        if (nums.length == 0) return 0;
        uint256 i = 0;
        do {
            assembly ("memory-safe") {
                total := add(calldataload(add(nums.offset, shl(5, i))), total)
                i := add(i, 1)
            }
        } while (i < nums.length);
    }
}
@Philogy Philogy changed the title do-while results in redundant branching instructions when compiling with viaIR do-while results in redundant branching instructions when compiling via IR Sep 19, 2024
@cameel
Copy link
Member

cameel commented Sep 20, 2024

Yes, I can confirm that this is how it works, but unfortunately I don't think we can do much about it, at least in the short term. This a case where Yul not having jumps makes it impossible to implement it more efficiently in a general case.

Since Yul has no native do...while loop, we simulate it with a for loop. Naively one could do it like this:

for {} true {} {
    <body>
    if iszero(<condition>) { break }
}

This should give you the assembly you want, however, it will break if you put continue in the body. The codegen translates it to Yul's continue, which will not jump to the right condition here. The legacy codegen does not have this problem, since it can use arbitrary jumps and just inserts a tag before the right condition, making continue jump there.

To deal with this problem, we generate this instead, which is the reason for the less efficient assembly you get via IR:

let firstRun := 1
for {} true {} {
    if iszero(firstRun) {
        if iszero(<condition>) { break }
    }
    firstRun := 0
    <body>
}

Since this affects only code that contains continue, one option would be to detect its presence and generate the simpler code when it's not there. But I'm not convinced that the extra special-casing is worth it as it goes hard against the general philosophy of keeping the IR codegen simple. It is technically an option, as we did it allow one such case in the past (Optimizer > Unchecked Loop Increment), but this one does not seem to be on the same level in terms of how common it is and what impact it has.

We'd rather do things like this in the optimizer, and we may eventually do it, but we have many potential optimizations in the pipeline and this one does not seem like it would be high on the priority list.

@cameel cameel added optimizer medium effort Default level of effort low impact Changes are not very noticeable or potential benefits are limited. nice to have We don’t see a good reason not to have it but won’t go out of our way to implement it. and removed bug 🐛 labels Sep 20, 2024
@Philogy
Copy link
Author

Philogy commented Sep 27, 2024

but unfortunately I don't think we can do much about it

Why couldn't Yul be expanded/improved?

If the goal of Yul is to be simple to compile then forcing the addition of such specialized passes in the backend seems like it'd be going against that motto? Maybe wrong venue but I'd love to learn more about how the solc team thinks about language & compiler design.

@cameel
Copy link
Member

cameel commented Oct 16, 2024

It could be extended, but at this point we're not too keen on doing that unless there is no other way to address the issue. We want Yul to be a stable target for intermediate compilation. In this particular case the benefits are not that clear-cut. It would help only this single case, and at a relatively high cost of adding a completely new construct that has to be accounted for by every tool that processes the language. Yul is very simple and we want it to stay that way. It's actually already sub-optimal that we ended up with for as the only looping construct rather than much simpler while.

Addressing it in the optimizer is the much preferred option. After talking it over internally, it looks like it might be doable even in the general case - if done as control flow graph simplification rather than as a transformation of Yul code. I added it to our list of Assorted Optimizer Ideas.

Having said that, there's actually a different potential Yul feature we've been considering recently, which might allow us to fix this issue as well: multi-level break statement. This came up in our language research group in the context of translating from some other language to Yul. Something like this has many more useful applications so it might be actually worth it. But we'll see. For now it's just a fresh idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low impact Changes are not very noticeable or potential benefits are limited. medium effort Default level of effort nice to have We don’t see a good reason not to have it but won’t go out of our way to implement it. optimizer
Projects
None yet
Development

No branches or pull requests

2 participants