Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the constant optimizer to make use of PUSH0 #14117

Merged
merged 4 commits into from
Aug 5, 2024

Conversation

axic
Copy link
Member

@axic axic commented Apr 13, 2023

Part of #14073.

This change only affects large copies using codecopy (which are supposedly rare).

(It will be fun changing this code for EOF, i.e. use DATACOPY or replace it entirely with a single DATALOADN.)

@axic axic force-pushed the push0-optimisations branch from b62a72e to 2c1d785 Compare April 13, 2023 00:25
AssemblyItems static copyRoutine{
// back up memory
// mload(0)
Instruction::PUSH0,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use u256(0) in these places, but the explicit instruction felt better for readability and ensuring the actualCopyRoutine[3] position is fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should use u256(0) in these places. The constructor of AssemblyItem should technically really assert against the instruction being any push or forward to the Push constructor.
Currently, the convention is "never use push as instruction in AssemblyItem", which was kind of hard to violate since all pushes had push data - but now we have AssemblyItem(u256(0)) != AssemblyItem(Instruction::PUSH0) - and should only ever use the former for consistency and for the other optimization steps to work properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, changed it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I changed to u256(0) a few test cases fail because codecopy does not seem to be selected. That suggest the cost is more than expected, i.e. u256(0) generates larger code than push0. Need to investigate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekpyron I won't have time to debug this for the next 2 weeks. Feel free to take it over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright - even though it'd be nicer to have it, we don't absolutely need to change this in the immediate release with Shanghai support anyways, so we'll just postpone it for now (we aim to release tomorrow or Wednesday).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood this right, codecopy cost is more than expected and thus is not being selected because when computing the byte size, PUSH0 is not taken in account.
At AssemblyItem::bytesRequired, the return value for any PUSHN is at minimum 2 but for PUSH0 should be always 1:

case Push:
return 1 + max<size_t>(1, numberEncodingSize(data()));

libevmasm/ConstantOptimiser.cpp Outdated Show resolved Hide resolved
if (m_params.evmVersion.hasPush0())
{
// This costs ~29 gas.
AssemblyItems static copyRoutine{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I had a bug in this version some tests failed, so the CodeCopyMethod gets triggered, but someone should double check this before merging.

@axic axic force-pushed the push0-optimisations branch 2 times, most recently from 25ac63f to 6c756d2 Compare April 17, 2023 16:44
@axic axic mentioned this pull request Apr 17, 2023
12 tasks
@NunoFilipeSantos NunoFilipeSantos added this to the 0.8.20 milestone Apr 26, 2023
@ekpyron ekpyron modified the milestones: 0.8.20, 0.8.21 May 8, 2023
Copy link
Member

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix does look like the right direction to go here. AssemblyItem:::bytesRequired should generally be fixed to account for PUSH0 and we should make sure it gets a correct (or reasonable) EVM version passed in at all times.

libevmasm/AssemblyItem.cpp Outdated Show resolved Hide resolved
libevmasm/ConstantOptimiser.cpp Outdated Show resolved Hide resolved
libevmasm/AssemblyItem.h Outdated Show resolved Hide resolved
Comment on lines 161 to 160
if (m_params.evmVersion.hasPush0())
actualCopyRoutine[3] = _assembly.newData(data);
else
actualCopyRoutine[4] = _assembly.newData(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it would be relatively easy for someone to change copyRoutine() and either forget the number or use a wrong one. I'd add an assert that the item we're replacing is really PushData just to be safe.

Actually, it looks like it would be even better to pass the number via an argument (with the u256(1) << 16 placeholder being the default) and avoid this brittle construct altogether. I wonder if there's any good reason it was done this way.

Copy link
Collaborator

@matheusaaguiar matheusaaguiar Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a parameter to copyRoutine. Since AssemblyItem is forward declared in ConstantOptimiser.h, we need to use a pointer.

libevmasm/ConstantOptimiser.cpp Outdated Show resolved Hide resolved
@matheusaaguiar matheusaaguiar self-assigned this Jun 22, 2023
@r0qs r0qs force-pushed the push0-optimisations branch from 3d2fb1f to d6792cf Compare July 17, 2023 08:47
@NunoFilipeSantos NunoFilipeSantos modified the milestones: 0.8.21, 0.8.22 Jul 17, 2023
@matheusaaguiar matheusaaguiar force-pushed the push0-optimisations branch 2 times, most recently from 9df26c1 to 0831f1b Compare August 24, 2023 18:34
Comment on lines 45 to 47
0x00
dup1
revert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing is that this is no longer de-duplicated with tag_2 below for some reason (may be that the 0x00 is a PUSH0 in one case and a PUSH1 0x00 in the other?). Still a bit strange how the changes in this PR are causing this.

But another thing is: I'd have assumed that the libevmasm/CommonSubexpressionEliminator should actually be clever enough to transform 0x00 dup1 revert to 0x00 0x00 revert, if it gets the correct push0 pricing.

Copy link
Collaborator

@matheusaaguiar matheusaaguiar Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here's what happens in this case. The inliner sees that the cost of inlining is less than the pushtag 2, which was not the case before push0. So the code of tag_2 gets inlined. After that, the peephole optimizer finds that it can replace the code with revert dup1 push0, because its method DoublePush is not prepared to handle push0. Another problem is that the peephole optimizer was able to remove the pushtag 2 , jump next to tag 2, but it cannot do it now, because of the inlining.
Finally, the CSE is not able to transform dup1 push0 to push0 push0. I guess it needs a rule for that in SimplificationRules.
Not sure about what to do regarding the inliner situation.
I will try to make the CSE and peephole optimizer consider push0, although for the latter that depends on getting the evm version in a static function of the optimizer (DoublePush::applySimple).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, that explains things - I hadn't considered that inlining would kick in since assuming a tag size of two bytes PUSH0 PUSH0 REVERT actually now is already the same size as PUSH<tag> JUMP! I mean, letting the inlining happen is just fine then...

The peephole optimizer should definitely consider push0 - and it also looks like some deduplication still doesn't happen correctly (but in cases in which it also didn't happen before), so we don't necessarily need to fix it here directly.

@ekpyron ekpyron modified the milestones: 0.8.22, 0.8.23 Oct 16, 2023
@ekpyron ekpyron removed this from the 0.8.24 milestone Dec 20, 2023
@ekpyron ekpyron mentioned this pull request Dec 20, 2023
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Jun 26, 2024
@matheusaaguiar
Copy link
Collaborator

matheusaaguiar commented Jun 27, 2024

I investigated the last 6 problematic cases of the hardhat tests and all cases are related to the introduction of the DeduplicateNextTagSizeX methods in 9b1792f.
They open up some possible optimizations that result in a different enough generated code that causes the failings in the hardhat stack trace tests since they rely on certain patterns being produced.
The first one that appears on the logs, for example, checks function test with b=false in the following solidity code:

pragma solidity ^0.8.0;

contract C {

  modifier m2(bool b)  {
    require(b);
    _;
  }

  function test(bool b) m1(b) m2(b) public {
    revert();
  }

  modifier m1(bool b)  {
    _;
  }

}

The stack trace expectation shows it expected the revert error raised from the require(b); at line 6:

{
  "transactions": [
    {
      "file": "c.sol",
      "contract": "C"
    },
    {
      "to": 0,
      "params": [false],
      "function": "test",
      "stackTrace": [
        {
          "type": "CALLSTACK_ENTRY",
          "sourceReference": {
            "contract": "C",
            "file": "c.sol",
            "function": "test",
            "line": 10
          }
        },
        {
          "type": "REVERT_ERROR",
          "sourceReference": {
            "contract": "C",
            "file": "c.sol",
            "function": "m2",
            "line": 6
          }
        }
      ]
    }
  ]
}

The code generated by the compiler before this PR contains the part related to that:

    tag_7:
        /* "c.sol":119:120  b */
      dup1
        /* "c.sol":125:126  b */
      dup2
        /* "c.sol":76:77  b */
      dup1
        /* "c.sol":68:78  require(b) */
      tag_10
      jumpi
      0x00
      dup1
      revert
    tag_10:
        /* "c.sol":141:149  revert() */
      0x00
      dup1
      revert

But the code generated with the introduction of this PR is optimized further and removes the asm related to the require, according to these steps:
(Already assuming all revert dup1 0x00 transformed to revert 0x00 0x00 because it is cheaper than dup1)

  1. Remove revert 0x00 0x00 before tag_10 which does exactly the same (PeepholeOptimiser::DeduplicateNextTagSize3)
  2. Remove jumpi tag_10 because now it is right before tag_10. Also note that as a consequence dup1 is also removed since it is the condition of the jumpi (PeepholeOptimiser::JumpToNext)
  3. Remove tag_10 (only the tag itself, not the instructions following it) because its only reference was removed in step 2 (JumpdestRemover::optimise)
  4. The remaining dup2 dup1 before revert 0x00 0x00 are removed (PeepholeOptimiser::OpReturnRevert)

The final result contains only the code corresponding to the revert in line 11:

    tag_7:
        /* "c.sol":141:149  revert() */
      revert(0x00, 0x00)

The require was completely removed from the asm, which makes sense from an optimization point of view, and that causes a mismatch with the expected stack trace.
The other failing tests are similar cases where a specific snippet of code is optimized out while that was not the case before.

Copy link
Member

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this comment a soft-approval. The logic looks sound now, all testing issues seem to be resolved. Missing is only changelog entries (for the constant optimiser taking into account push0 and for the peephole optimizer optimizing identical code snippets that terminate if they occur one after the other), and potentially a bit of commit cleanup.

@matheusaaguiar matheusaaguiar force-pushed the push0-optimisations branch 2 times, most recently from 7924c39 to d7fb2a3 Compare July 3, 2024 23:24
ekpyron
ekpyron previously approved these changes Jul 4, 2024
@matheusaaguiar
Copy link
Collaborator

Gas cost benchmarks

ir-no-optimize

project bytecode_size deployment_gas method_gas
brink +0%
colony +0.04% ❌
elementfi +0%
ens +0%
euler
gnosis
gp2 -0.02% ✅
pool-together 0%
uniswap 0%
yield_liquidator +0% +0% 0%
zeppelin

ir-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink 0%
colony +0.02% ❌
elementfi -0%
ens -0% 0% -0%
euler -0%
gnosis
gp2 -0.03% ✅
pool-together +0%
uniswap -0%
yield_liquidator +0.27% ❌ +0.3% ❌ -0.01% ✅
zeppelin -0.01% ✅ -0.01% ✅ +0.13% ❌

ir-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink +0.01% ❌
colony +0.01% ❌
elementfi -0.01% ✅
ens -0% +0.01% ❌ 0%
euler
gnosis
gp2 +0.01% ❌
pool-together +0.01% ❌
uniswap -0.01% ✅
yield_liquidator +0% +0% 0%
zeppelin -0.01% ✅

legacy-no-optimize

project bytecode_size deployment_gas method_gas
brink +0.03% ❌
colony +0.04% ❌
elementfi +0.1% ❌
ens +0.04% ❌
euler +0.04% ❌
gnosis +0.05% ❌
gp2 -0%
pool-together +0.06% ❌
uniswap +0.02% ❌
yield_liquidator +0.04% ❌ +0.04% ❌ -0.01% ✅
zeppelin +0.06% ❌ +3.36% ❌ +0.01% ❌

legacy-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink 0%
colony +0.02% ❌
elementfi +0.01% ❌
ens +0.02% ❌ +0% -0%
euler +0.02% ❌
gnosis 0%
gp2 -0.06% ✅
pool-together +0.01% ❌
uniswap -0%
yield_liquidator 0% +0% -0%
zeppelin -0.02% ✅ -0% +0.07% ❌

legacy-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink 0%
colony +0.01% ❌
elementfi +0.01% ❌
ens +0.02% ❌ -0% -0%
euler +0.04% ❌
gnosis 0%
gp2 -0.05% ✅
pool-together -0%
uniswap -0%
yield_liquidator 0% -0% -0%
zeppelin -0.01% ✅ -0% -0%

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

Copy link

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Jul 26, 2024
@matheusaaguiar matheusaaguiar removed the stale The issue/PR was marked as stale because it has been open for too long. label Jul 26, 2024
matheusaaguiar
matheusaaguiar previously approved these changes Aug 1, 2024
@ekpyron ekpyron merged commit 5dbaa13 into develop Aug 5, 2024
72 checks passed
@ekpyron ekpyron deleted the push0-optimisations branch August 5, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

7 participants