[codegen] Enforce Re-Entrancy Protections by Default (Solidity `0.9.X`) #13845

alex-ppg · 2023-01-04T18:33:53Z

Abstract

As the security space around EVM-based smart contracts matures, we can observe a recurring pattern in security vulnerabilities across all EVM spectrums; a significant portion of them arise from re-entrancy vulnerabilities,

With this issue, we aim to introduce a built-in re-entrancy check across all functions of a generated contract by default, permitting programmers to explicitly mark their functions as re-entrant via the newly introduced reentrant keyword, being declared akin to override and co.

Motivation

Rationale

Re-entrancy attacks are one of the most common root causes of multi-million dollar exploits we can observe by going through historical exploits. Additionally, the concept of a re-entrancy is very hard to grasp when coming from a traditional programming background due to the unique nature of the EVM.

Solidity is an evolving language that attempts to cater to the wider EVM development community and has historically introduced tools to aid in developing using the language more securely, such as built-in arithmetic checks introduced in 0.8.X.

With this change, we aim to introduce built-in re-entrancy protections by default with the ability to bypass these protections explicitly, empowering seasoned developers with maximum flexibility while protecting newcomers from EVM-related caveats they may not be aware of.

Proposal

The proposed keyword (reentrant) is meant to mark a function explicitly re-entrant. As a result, code generation of Solidity would need to introduce a breaking change that will cause the "entrypoint" of the bytecode to evaluate the reentrancy flag.

For the proposal to function properly, a NON_REENTRANT_FLAG_OFFSET compiler-literal would need to be introduced that signifies a storage slot's offset that is meant to indicate the re-entrant flag that is validated. This offset should be preferrably located in the upper-half of the type(uint256).max range that a smart contract's storage slot space supports to ensure no conflicts with existing implementations.

Keyword vs. Existing Syntax

Upon additional feedback from @pcaversaccio and issue #12996, I would like to add some additional insight as to why a new keyword was chosen over the existing syntax. The original issue revolved around the concept of a new keyword and switched over to the idea of using unchecked to perform external calls without triggering any re-entrancy safety checks.

The unchecked keyword is meant to be utilized in the locale it is declared in (i.e. an upper-most unchecked block will not affect the statements of internal calls it makes), as such, such a solution is not viable if we want the new re-entrancy feature to be compatible with existing programming paradigms such as inheritance.

We are faced with either breaking the existing behaviour of unchecked to apply to internal call chains or introducing a new keyword. The latter appears to be more explicit and easier to grasp for security auditors and developers alike, however, feedback is appreciated.

Example Showcase

To illustrate how the generated bytecode would be altered, let us take a subset of the WETH9 contract:

contract MockWETH9 {
    event  Deposit(address indexed dst, uint wad);

    mapping (address => uint) public  balanceOf;
    
    fallback() public payable {
        deposit();
    }
    
    function deposit() public payable {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}

The above contract contains two functions that do mutate the state (fallback & deposit) as well as one function that does not mutate the state (balanceOf).

Its compilation with current tools would result in the following bytecode in pseudo-code format:

contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;
        
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            var var0 = 0x00b7;
            deposit();
            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;

            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()
                var1 = 0x03d2;
                deposit();
                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)

                if (msg.value) { revert(memory[0x00:0x00]); }
            
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }

    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }

    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}

Given that the compiler can detect which parts of the generated bytecode will mutate the state and which will not (based on the view / pure keywords), we have two cases of code generation:

No reentrant Functions Defined
n > 1 reentrant Functions Defined

No `reentrant` Function Case

For the first case, the bytecode generation module would inject a blanket check at the beginning of main that would validate the re-entrant state of the contract. Afterwards, the bytecode generation module would introduce an assignment to the re-entrant flag solely in the if-else clauses that execute code mutating the state.

Identifying the correct points of injection should be trivial as the compiler is already aware of which functions mutate the state via the view and pure keywords. To illustrate how the bytecode generated would look like, let us take the first case with the original MockWETH9 smart contract code:

contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;

        // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy
        if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }
        
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;

            var var0 = 0x00b7;
            deposit();

            // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;

            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;

            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()

                // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;

                var1 = 0x03d2;
                deposit();

                // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;

                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)
                // Code Injection not necessary as function cannot mutate state

                if (msg.value) { revert(memory[0x00:0x00]); }
            
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }

    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }

    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}

The bytecode generator can further optimize the gas cost of the injection by performing the re-entrant flag assignment conditionally via a temporary variable which will hold the value of storage[NON_REENTRANT_FLAG_OFFSET] that is evaluated at the very start of the main block.

`reentrant` Function Case

This case would simply require the blanket check in the main code block showcased above to be relocated to all if-else bodies that do NOT have the reentrant modifier set. To illustrate how the reentrant keyword would be used, let us adjust the original MockWETH9 code to now permit re-entrancy solely for the deposit function:

contract MockWETH9 {
    event  Deposit(address indexed dst, uint wad);

    mapping (address => uint) public  balanceOf;
    
    fallback() public payable {
        deposit();
    }
    
    function deposit() public payable reentrant {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}

The bytecode generation would look like the following:

contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;
        
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy, else set flag
            if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }
            else { storage[NON_REENTRANT_FLAG_OFFSET] = 0x02; }

            var var0 = 0x00b7;
            deposit();

            // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;

            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;

            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()

                // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;

                var1 = 0x03d2;
                deposit();

                // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;

                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)

                // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy
                if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }

                if (msg.value) { revert(memory[0x00:0x00]); }
            
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }

    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }

    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}

A yet-to-be defined behaviour arises if we declare the fallback function as reentrant when it invokes the deposit function which has not been declared so. To ensure maximal compatibility with existing programming paradigms, we believe that the reentrant keyword should mark a function as re-entrant regardless of its internal call chain. As a result, if we have the following code:

contract MockWETH9 {
    event  Deposit(address indexed dst, uint wad);

    mapping (address => uint) public  balanceOf;
    
    fallback() public payable reentrant {
        deposit();
    }
    
    function deposit() public payable {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}

The function fallback will be re-entrant even if it invokes deposit which we have not marked so. This ensures compatibility with libraries / smart contract dependencies as otherwise users who wish to set their functions as reentrant deliberately would have to reflect that modifier to the full call-chain. Additionally, given that the introduction of the keyword is a concious and deliberate choice by the developer(s), we consider them to be fully aware of the implications of the reentrant keyword.

Advanced Usage

We are aware that re-entrancy is indeed desirable in a set of limited use cases, the most common being proxy implementations that follow a fragmented logic pattern and thus invoke themselves externally (i.e. Diamond standard cross-facet invocations). To accommodate for these implementations, we propose the introduction of an argument to the reentrant keyword similarly to how arguments are present for the override keyword.

In detail, we advise the introduction of a single, optional address argument which marks a function as reentrant but solely for a particular address. In the case of the Diamond standard, for example, we can introduce the reentrant(address(this)) syntax to ensure that the facets of the Diamond can invoke each other without compromising the wider security guarantees of the system.

Additionally, this syntax permits complex smart contract systems that are meant to invoke one-another mid-execution to still function post-0.9.X securely. Multi-address support can be introduced, however, it should be delayed until a sufficient use-case is illustrated by the development community that cannot be solved by better programming practices.

Specification

While the specification of how the new reentrant keyword will operate can be extracted from the above text, we would also like to highlight which sections of the official Solidity documentation would require adjustments to accommodate for this change. Reference specification can be produced upon request for all chapters outlined below should this feature request gain traction.

Contracts Section

A new "Re-Entrancy" chapter would need to be introduced that describes how re-entrancy behaves post-0.9.X (in that it is prohibited) and how developers can make use of the reentrant keyword to bypass this security measure. A warning chapter should be introduced as well ensuring that the developers are well aware of the security guarantees they are nullifying by using the keyword.

Cheatsheet

The Modifiers section would need to be expanded with the new reentrant keyword and how it is meant to be used.

Language Grammar

An identical rule to the override specifier would need to be introduced specifying how the new reentrant keyword is meant to be parsed when reading Solidity code using machines.

Layout of State Variables In Storage

This chapter should specify the newly reserved NON_REENTRANT_FLAG_OFFSET as a matter of specification. Overwriting the storage area of the flag via overlap or a storage slot hash collision due to the usage of upgradeable patterns and standards such as EIP-1967 should be of negligible concern with a likelihood akin to that of general hash collisions.

Solidity v0.9.0 Breaking Changes

This chapter should, as its namesake indicates, highlight the breaking change of how reentrant behaves and how contracts compiled in pragma solidity ^0.9.0 will have re-entrancy protections enforced by default.

Backwards Compatibility

As the code generation's behaviour will change to enforce re-entrancy protection by default, this is a breaking change requiring a minor semver bump to prompt developers to get accustomed to the new security measure.

General Concerns

Security Bypass

Given that the change illustrated by this issue would rely on the storage space of the smart contract, developers will be able to explicitly unset the re-entrant flag via assembly blocks that access the low-level nature of the EVM and write to the NON_REENTRANT_FLAG_OFFSET storage slot. Such code is considered malicious by nature and should be flagged by auditors as well as potential static analyzers that aid them.

Bytecode & Gas Increase

The issue attempts to explain the proposed change in the Solidity language in a way that minimizes the gas footprint as well as bytecode size impact. Nevertheless, both of these numbers will increase for all contracts compiled beyond 0.9.X.

We believe the security guarantees achieved by this change to be worth the extra units of gas and size, which is evidenced by the developer community itself via the common usage of libraries implementing this trait such as ReentrancyGuard by OpenZeppelin.

Resources

The pseudo-code of the bytecode was generated by the ethervm decompiler and was consequently manually edited to illustrate the smaller subset of MockWETH9 as well as the adjustments that reentrant and post-0.9.X compilation would cause.

The text was updated successfully, but these errors were encountered:

pcaversaccio · 2023-01-04T22:12:09Z

Linking my issue here for obvious reasons #12996.

alex-ppg · 2023-01-04T23:51:53Z

Thanks @pcaversaccio for the additional context, updated original issue with additional background.

cameel · 2023-01-06T21:12:36Z

Unfortunately, I don't think this is the right approach at the language level. Reentrancy is an important problem but there are multiple ways to deal with it and I think that this particular solution is too opinionated towards only one of them. For example:

It mandates a single reentrancy lock for the whole contract, even though in some use cases more granularity might be needed.
It makes assumptions about how much flexibility is needed - i.e. specifying an address but only a single one, set at compilation time.
It defines one specific place to put the flag for all contracts (NON_REENTRANT_FLAG_OFFSET), which is bound to create conflicts.
Storage may not necessarily end up being the best location to store the flag. If we implemented it now, we'd have to use it but there are several pending EIPs (EIP-1153, EIP-5283) that might provide better and cheaper locations and changing it after the fact would be problematic.

Making these decisions for the user is perfectly fine for a library like ReentrancyGuard but I don't think it's acceptable for a generic language mechanism. We're generally moving away from enshrining these kinds of opinionated mechanisms in the language. Some of them may become part of the standard library but this is really too high-level even for the one we're currently designing.

In fact, I'd say that for use cases where this mechanism would be a good fit using ReentrancyGuard is a better approach. It does pretty much what you specified but is also a part of a library rather than the language itself so you can extend it or swap it for something completely different if it does not match your use case. The biggest difference is that it cannot make functions non-reentrant by default. If there's anything to address at the language level it's this. We actually have #4990, with which you'd be able to apply a modifier to a whole contract to make sure it's not omitted by accident. What do you think about solving it this way?

I'm going to close the issue since I don't think we'll implement it the way presented there. This of course does not close the discussion on the topic, though the forum is probably a better place for that. Overall, I don't think we have consensus to go for any particular solution right now so we need more discussion and feedback from the community. Feel free to also drop in on one of our team calls if you want to discuss possible solutions with the team.

alex-ppg · 2023-01-06T22:09:04Z

Thanks a lot for the detailed feedback @cameel. Here are a couple of thoughts:

Multiple reentrancy locks being utilized within the same transaction that are separate and required seems like a very limited use case (if at all present as it seems moreso a design flaw). Additionally, nonReentrant modifiers (which is the most common application method of re-entrancy guards) do utilize a single mutex per contract. The new language would also help clearly define areas of the code that are meant to be reentrant (i.e. onERC721Received hooks that are marked reentrant).
As address(this) is an instruction (rather than literal), the intention was to use a dynamic evaluation rather than a literal, which could include input addresses, storage addresses, and any such feature. Its syntax is similar to how modifier implementations can accept arguments. Additionally, this feature was solely introduced to cater to Diamond patterns and other proxy-related re-entrancy patterns which are one of the few real use cases of re-entrancy being deliberate. This type of flexibility can be omitted from the first introduction of the keyword as it affects the complexity of introducing it.
This is indeed a significant flaw I do see in the EIP. However, it could be solved by simply utilizing the last available storage slot or any other similar "flag" whose conflict would be low (users would be well aware of how the flag works and as such would not deliberately use a hash resulting in the 0xFF...FF key)
While I do see how these EIPs could affect it (transient storage being the best way to actually implement this), I fail to see how it could become problematic. The EVM is an evolving standard and as such opcodes are bound to be introduced and/or changed which in turn would require changes to the Solidity compiler. Moving on from SLOAD / SSTORE instructions to a different paradigm does not seem like a breaking change to me (even more so it would free up one additional slot for this particular implementation).

After inspecting issue #4990, its proposed solution would be inadequate. The issue lies in that we need conditional behavior for re-entrancy guards to work properly across both non-mutating and mutating functions of a contract, as showcased in the exhibit above. As such, I think this exhibit is separate from #4990 for which it was closed for.

Read-only re-entrancy vulnerabilities are real and have occurred in the past, including a now-patched vulnerability in Curve Finance as well as the Market.XYZ hack which resulted from code suffering from the same flaw.

I understand the aversion to this change, however, it would benefit the developer community significantly and increase the collective bar of security, in turn increasing the adoption of the Solidity language (as well as smart contract development) even more.

cameel closed this as not planned Won't fix, can't repro, duplicate, stale Jan 6, 2023

cameel added breaking change ⚠️ language design Any changes to the language, e.g. new features labels Jan 6, 2023

cameel mentioned this issue Jan 6, 2023

Feature to apply modifier to all functions #4990

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codegen] Enforce Re-Entrancy Protections by Default (Solidity `0.9.X`) #13845

[codegen] Enforce Re-Entrancy Protections by Default (Solidity `0.9.X`) #13845

alex-ppg commented Jan 4, 2023 •

edited

Loading

pcaversaccio commented Jan 4, 2023

alex-ppg commented Jan 4, 2023

cameel commented Jan 6, 2023 •

edited

Loading

alex-ppg commented Jan 6, 2023 •

edited

Loading

[codegen] Enforce Re-Entrancy Protections by Default (Solidity 0.9.X) #13845

[codegen] Enforce Re-Entrancy Protections by Default (Solidity 0.9.X) #13845

Comments

alex-ppg commented Jan 4, 2023 • edited Loading

Abstract

Motivation

Rationale

Proposal

Keyword vs. Existing Syntax

Example Showcase

No reentrant Function Case

reentrant Function Case

Advanced Usage

Specification

Contracts Section

Cheatsheet

Language Grammar

Layout of State Variables In Storage

Solidity v0.9.0 Breaking Changes

Backwards Compatibility

General Concerns

Security Bypass

Bytecode & Gas Increase

Resources

pcaversaccio commented Jan 4, 2023

alex-ppg commented Jan 4, 2023

cameel commented Jan 6, 2023 • edited Loading

alex-ppg commented Jan 6, 2023 • edited Loading

[codegen] Enforce Re-Entrancy Protections by Default (Solidity `0.9.X`) #13845

[codegen] Enforce Re-Entrancy Protections by Default (Solidity `0.9.X`) #13845

alex-ppg commented Jan 4, 2023 •

edited

Loading

No `reentrant` Function Case

`reentrant` Function Case

cameel commented Jan 6, 2023 •

edited

Loading

alex-ppg commented Jan 6, 2023 •

edited

Loading