Allow specifying storage locations #597

Open
pipermerriam opened this issue May 25, 2016 · 82 comments
Labels: language design :rage4:, must have eventually

@pipermerriam
Member

Inline assembly has now made fully upgradable contracts possible. One of the main hangups with this is that the storage locations have to stay the same across upgrades. Would it be possible to introduce support for specifying the storage locations for storage variables?

@VoR0220
Member

VoR0220 commented May 25, 2016

not so!

@VoR0220
Member

VoR0220 commented May 25, 2016

See Nick Johnson's Library on upgradeability :)

https://gist.github.com/Arachnid/4ca9da48d51e23e5cfe0f0e14dd6318f#file-upgradeable-sol

@chriseth
Contributor

Especially with contract upgrades in mind, wouldn't it be better to copy the storage layout and "disable" unused state variables by e.g. prefixing them? Otherwise I don't see how you would practically verify that the storage layout is consistent between upgrades.

@pipermerriam
Member Author

Is there documentation on how storage layout is determined?

@chriseth
Contributor

@pipermerriam
Member Author

Ok, so after reading up on storage layouts...

contract MyContractV1 {
    uint a;
    bytes32 b;
}

In this example, a should be stored in slot 0 and b in slot 1.

Now, consider I upgrade it to the following.

contract MyContractV2 {
    int c;
    uint a;
    bytes32 b;
}

This would end up with c stored in slot 0, a in 1, and b in 2 which would break things.

So, instead, I propose being able to do the following.

contract MyContractV2 {
    int c;
    uint a @ 0x0;
    bytes32 b @ 0x1;
}

The solidity compiler would see that a and b are designated for storage slots 0 and 1 respectively, and would then place c at the next available location, slot 2.

Does that make sense? Is this possible?

@axic
Member

axic commented May 25, 2016

I was looking for a complementary/similar feature: the ability to disable packing. (i.e. currently if two storage parameters are each < 256 bits and together they fit into one slot, they are packed together.)

Ultimately the compiler could optimise the packing based on the frequency of changes to one or more variables within.

With your suggestion this is a given, each marked variable gets its own slot. I would use a different markup though:

storage(0) int a;
storage(1) bytes32 b;

@pipermerriam
Member Author

I would use a different markup though

the @ was just the first thing that came to mind. I like storage(...) better.

@chriseth
Contributor

I think the tradeoff between introducing errors and decreasing readability is much better when just adding int c at the end. If you want, you can also use inheritance (let the upgraded contract inherit from the old contract).
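For reference, a minimal sketch of both workarounds in present-day Solidity. The contract names `MyContractV2Appended` and `MyContractV2Inherited` are made up for illustration, and the slot comments assume the default layout rules (state variables in declaration order, base contracts first).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MyContractV1 {
    uint a;      // slot 0
    bytes32 b;   // slot 1
}

// Option 1: copy the old layout verbatim and append the new variable at the end.
contract MyContractV2Appended {
    uint a;      // slot 0, unchanged
    bytes32 b;   // slot 1, unchanged
    int c;       // slot 2, new
}

// Option 2: inherit from the old contract; inherited variables are laid out first.
contract MyContractV2Inherited is MyContractV1 {
    int c;       // slot 2, placed after the inherited a and b
}
```

In both variants `a` and `b` keep the slots they had in `MyContractV1`, which is the property an upgrade needs.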

@VoR0220
Member

VoR0220 commented May 25, 2016

^ 👍 for the inheritance structure...it overall is cheaper and more cost effective to do it that way. I envision a lot of modularity around dapps in the future in regards to storage to better handle updates and save gas.

@chriseth chriseth closed this as completed Aug 5, 2016
@axic
Member

axic commented May 25, 2018

This came up again as a discussion with @federicobond and I think a good middle ground could be to have an annotation (as proposed in #597 (comment)), but instead of marking a storage slot, it would rather have a string literal as a key, which is hashed to produce a 256-bit key for storage.

This would be more expensive (it uses 32-byte constants, and one couldn't combine multiple variables into a single slot), but might be justified for some.

When this annotation is missing, it would default to the current behaviour.

For syntax I propose:

int256 a storage_key("this_is_my_variable");
bytes32 b storage_key("and_this_too");
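What such an annotation would stand for can be approximated today with inline assembly. This is only a sketch under assumptions: the contract name `HashedKeyStorage`, the accessor names, and the choice of a plain `keccak256` of the key string are illustrative, not part of the proposal.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract HashedKeyStorage {
    // Hypothetical stand-in for: int256 a storage_key("this_is_my_variable");
    bytes32 private constant A_SLOT = keccak256("this_is_my_variable");

    // Reads the value stored at the hashed slot.
    function getA() public view returns (int256 value) {
        bytes32 slot = A_SLOT;
        assembly {
            value := sload(slot)
        }
    }

    // Writes the value to the hashed slot; the variable always takes a full slot.
    function setA(int256 newValue) public {
        bytes32 slot = A_SLOT;
        assembly {
            sstore(slot, newValue)
        }
    }
}
```

Because the slot depends only on the key string, reordering or adding other state variables does not move `a`.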

@axic axic reopened this May 25, 2018
@axic axic added the feature label May 25, 2018
@pipermerriam
Member Author

@axic

  1. I like the hashed key approach
  2. reasoning for not allowing specific slot to be specified? foot gun?

@chriseth
Contributor

I really don't think solidity should have such low-level impact on the storage location. If you want to dislocate storage variables, why not use structs or a mapping to structs?

@gnidan
Member

gnidan commented Jul 20, 2018

One more possible solution:

contract MyContract {
  storage("some-collection") {
    uint foo;
    uint bar;
  }

  storage("other-collection") {
    mapping (uint => bool) qux;
    MyStruct baz;
  }
}

The advantage of this is that contracts could define blocks of variables that are colocated in storage, while leaving gaps between the blocks, e.g. to extend structs later.

@axic axic added the language design :rage4: label Jul 28, 2018
@spalladino

spalladino commented Sep 5, 2018

Just throwing this as an idea: given that this need arises from avoiding clashes when working with upgradeability, wouldn't it make sense to just avoid clashing by storing all variables in a hashed location, similar to how a mapping works? We could either store all variables from the same contract/struct together (the hash being a contract identifier, and variables are stored at offsets of that hash), or all individual variables in sparse hashed locations.

The remaining issue is how to generate an identifier for a contract that ensures there are no clashes between different contracts and is more robust than a simple name. Maybe requiring a special constant with a random value for every contract that will use this approach, similar to old Java's serialVersionUID?

axic pushed a commit that referenced this issue Nov 20, 2018
Adds an EIP for getting logs by a block hash.
@axic
Member

axic commented Oct 29, 2019

There was also a lengthy related discussion in #4017.

@ekpyron
Member

ekpyron commented Jan 14, 2020

This came up again with #7891 .

If we want to expose really general control we need three components:

  • storage slot
  • offset in the storage slot
  • number of bytes reserved

Natural restrictions would apply (violating would result in compile time errors):

  • offset + sizeOfType <= 32
  • numberOfBytesReserved >= sizeOfType
  • we could in a first version have (offset + numberOfBytesReserved) % 32 == 0 and only later decide whether to lift that
  • no overlap with a previously declared variable is possible

I would suggest making all such specifiers optional. Variables without specifiers before any variables with specifiers will be assigned slots as before.
For variables without specifiers after any variables with specifiers there are two options:

  • continue to put them after the last variable without specifier unless this is in conflict with another variable - if that's the case, move past it (I dislike this)
  • continue assigning storage locations after the last occupied storage location so far (including variables with specifiers) (I prefer this)

For the purpose of inheritance: locations are assigned just as if it was one flat contract containing all variables in the order of C3 linearization.

Example (we can always decide on a different syntax):

contract A {
  uint256 a; // will occupy full slot 0
  // slots 1 and 2 will remain unused
  storage{slot: 3, offset: 0, reserved: 32} bool b; // will occupy full slot 3

  storage{slot: 4, offset: 1} bool c; // will occupy the second byte in slot 4
  storage{slot: 4, offset: 0} bool d; // will occupy the first byte in slot 4
  storage{slot: 4, offset: 16} uint128 d2; // will occupy the second half of slot 4

  uint128 e; // will occupy the first half of slot 5

  storage{slot: 5, offset: 16} uint128 f; // will occupy the second half of slot 5

  storage{slot: 6, offset: 0} bool g; // will occupy first byte in slot 6
  bool h; // will occupy second byte in slot 6
  storage{slot: 6, offset: 2, reserved: 2} bool i; // will occupy third byte in slot 6
  bool j; // will occupy fifth byte in slot 6
  storage{slot: 6, offset: 16, reserved: 48} uint128 k; // will occupy second half of slot 6
  // slot 7 will remain unused
  uint128 l; // will use the first half of slot 8
}

An alternative notation-wise would be to merge slot and offset into a single byte offset that is then split into slot = byteOffset/32 and offset = byteOffset%32 (to which the same restrictions would apply). A copy of the example above using this notation:

contract A {
  uint256 a; // will occupy full slot 0
  // slots 1 and 2 will remain unused
  storage{offset: 96, reserved: 32} bool b; // will occupy full slot 3

  storage{offset: 129} bool c; // will occupy the second byte in slot 4
  storage{offset: 128} bool d; // will occupy the first byte in slot 4
  storage{offset: 144} uint128 d2; // will occupy the second half of slot 4

  uint128 e; // will occupy the first half of slot 5

  storage{offset: 176} uint128 f; // will occupy the second half of slot 5

  storage{offset: 192} bool g; // will occupy first byte in slot 6
  bool h; // will occupy second byte in slot 6
  storage{offset: 194, reserved: 2} bool i; // will occupy third byte in slot 6
  bool j; // will occupy fifth byte in slot 6
  storage{offset: 208, reserved: 48} uint128 k; // will occupy second half of slot 6
  // slot 7 will remain unused
  uint128 l; // will use the first half of slot 8
}
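The arithmetic behind the single-byte-offset notation, together with the first two restrictions listed above, can be sketched as a small helper library. The names `StorageSpecifier`, `split` and `isValid` are hypothetical and exist only for illustration; nothing like this is part of the compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library StorageSpecifier {
    // Splits a flat byte offset into (slot, offset within the slot).
    // A storage slot is 32 bytes wide.
    function split(uint256 byteOffset)
        internal
        pure
        returns (uint256 slot, uint256 offset)
    {
        slot = byteOffset / 32;
        offset = byteOffset % 32;
    }

    // Checks two of the proposed restrictions:
    //   offset + sizeOfType <= 32   (the value must not cross a slot boundary)
    //   reserved >= sizeOfType      (the reservation must cover the value)
    function isValid(uint256 offset, uint256 sizeOfType, uint256 reserved)
        internal
        pure
        returns (bool)
    {
        return offset + sizeOfType <= 32 && reserved >= sizeOfType;
    }
}
```

For example, a byte offset of 144 splits into slot 4 and offset 16, and 194 splits into slot 6 and offset 2, matching the comments in the example.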

@ekpyron
Member

ekpyron commented Jan 14, 2020

Another alternative would be to require specifying the location for all variables, if the location is specified for any variable.

Also we could at a later point allow compile time evaluated expressions in the specifier, i.e.:

storage{slot: keccak256("some_key")} uint256 some_key;

Although we'd need to consider that one could construct those to specifically collide with some mapping key, so this would be dangerous.

Although that's also true for choosing some specific value for slot: that happens to be the location of some mapping element.

@chriseth
Contributor

Maybe we should gather some data about how this feature would be used. One use is avoiding clashes during upgrades, another is having more efficient use of storage by combining small variables in a certain way. I think just providing full flexibility all the time might not be the way to go as it is too easy to get wrong. So it could already be enough to only allow hashed locations and another way to specify which variables to combine (without specifying the offset exactly) or when to insert "start a new slot here".

@ekpyron
Member

ekpyron commented Jan 14, 2020

What can "go wrong"? Or in particular, what can go wrong that we can't easily detect at compile time?
I'd argue that it makes more sense to provide a general solution and, if deemed necessary, restrict it to simple cases (as in restrict to some particular kinds of values for slot, etc. - e.g. restricting to only supporting "start a new slot here" would be to require slot to be the "current slot" plus one and require offset to be zero).

That way we can always extend the very same solution to support more cases, instead of needing breaking changes and new language features...

@spalladino

One use is avoiding clashes during upgrades

For the sake of upgrades, it'd seem that the only requirement is to be able to assign an immutable id to a variable, which should be deterministically mapped to a slot (like the storage{slot: keccak256("some_key")} proposed above). It's not really important where in the storage the variable is kept.

As for EIP2330 linked above, the requirements are pretty much the same. As long as there is a deterministic process for calculating the storage slot, the actual slot can then be just exposed in the ABI for any consumers.

@ekpyron
Member

ekpyron commented Jun 10, 2024

An alternative, maybe more complex, approach we've been wondering about would be to already move further towards generally decoupling storage from C3 linearization.

In particular, the idea is based on the fact that conceptually the https://eips.ethereum.org/EIPS/eip-7201 pattern could be adjusted to

abstract contract Example {
    struct MainStorage {
        uint256 x;
        uint256 y;
    }

    function _getExampleMainStorage() internal virtual view returns (MainStorage storage);

    function _getXTimesY() internal view returns (uint256) {
        MainStorage storage $ = _getExampleMainStorage();
        return $.x * $.y;
    }
}

contract FinalContract is Example {
    // keccak256(abi.encode(uint256(keccak256("example.main")) - 1)) & ~bytes32(uint256(0xff));
    bytes32 private constant EXAMPLE_MAIN_STORAGE_LOCATION =
        0x183a6125c38840424c4a85fa12bab2ab606c4b6d0e7cc73c0c06ba5300eab500;

    function _getExampleMainStorage() internal override view returns (MainStorage storage $) {
        assembly {
            $.slot := EXAMPLE_MAIN_STORAGE_LOCATION
        }
    }
}

Now we're not suggesting to do just that, since it has the same issues of the compiler being oblivious to the actual storage layout.
But, minimally, that could be alleviated by allowing explicit storage locations to be specified only in the most derived contract and only if no base contracts have any storage variables, as in:

contract FinalContract is Example {
    storage{slot: keccak256(abi.encode(uint256(keccak256("example.main")) - 1)) & ~bytes32(uint256(0xff))}
    Example.MainStorage exampleStorage;

    function _getExampleMainStorage() internal override view returns (MainStorage storage) {
        return exampleStorage;
    }
}

However, having to access storage through internal functions is still generally inconvenient. The concept could be developed further into a compiler-builtin mechanism by introducing virtual storage variables, as in:

abstract contract ExampleA {
    struct MainStorage {
        uint256 x;
        uint256 y;
    }
    // A virtual storage variable does *not* get a storage location itself directly.
    // It forces the contract to become `abstract` and requires an override
    // providing the actual location in the most-derived contract
    MainStorage virtual $;

    function _getXTimesY() internal view returns (uint256) {
        return $.x * $.y;
    }
}
// Alternatively even:
abstract contract ExampleB {
    uint256 virtual x;
    uint256 virtual y;

    function _getXTimesY() internal view returns (uint256) {
        return x * y;
    }
}

In turn, we could allow and require the most-derived (and - at least for now - only the most derived contract) to "override" the inherited state variables, providing them with a particular location, while allowing to specify explicit storage locations. As in:

function erc7201_slot(string memory _id) pure returns (bytes32) {
    return keccak256(abi.encode(uint256(keccak256(bytes(_id))) - 1)) & ~bytes32(uint256(0xff));
}
contract FinalContractA is ExampleA {
    storage{ slot: erc7201_slot("example.main") }
    MainStorage override ExampleA.$;
}
// or in the alternative version:
contract FinalContractB is ExampleB {
    storage{ slot: erc7201_slot("example.main") }
    uint256 override ExampleB.x;
    uint256 override ExampleB.y; // would be laid out after `ExampleB.x` following the usual rules
}

To make this safer, the logic could be that if any inherited state variable is virtual, all of them have to be virtual (except in the most-derived contract) and that explicit storage slot specifiers are only allowed if all inherited state variables are virtual.

However, this would mean that each and every state variable would have to be laid out explicitly in the most derived contract, which may be overly verbose.

Conversely, we could consider doing something similar on the contract-level instead of the level of state variables, along the lines of

contract FinalContractB2 is ExampleB {
    storage{ slot: erc7201_slot("example.main") }
    storage(ExampleB) override; // good syntax is a bit tricky here, though
    /// storage ExampleB.* override; // there may be better options or variants syntactically
    // TODO: what about bases of bases?
}

This would lead to something similar as already suggested: contracts could declare their entire storage "virtual" (pending syntax - whether reusing virtual is a good choice is debatable, especially for an entire contract), and the most derived contract could explicitly lay out storage for each base contract (that has any state variables). That'd be close to @frangio's proposal in #597 (comment) (modulo syntax and some restrictions, e.g. that base contracts are required to be explicit about deferring the storage layout, which in turn forces the derived contract to specify the locations).
However, the issue there is: what to do about base contracts that themselves inherit state variables? Especially since the C3-linearization of the base contract is not a good basis in multi-inheritance settings. Some options here:

  • Specifying an entire contract's storage in bulk could only be allowed if that contract doesn't itself inherit state variables - or if its C3-linearization occurs linearly in the C3-linearization of the most-derived contract - but then the rules are quickly getting complicated there.
  • Simpler may be: specifying storage for a base contract only extends to the storage of that contract itself, not its own bases. So specifying the location for a base contract, requires also specifying the locations for all its bases (that have state variables). (So specifying storage for a base contract in the most derived contract is literally just shorthand for listing/overriding its own state variables in order.)
  • For the simplest case there could be, as suggested before, catch-all syntax for the entire C3 linearization of the most-derived contract (while still requiring all storage in the bases to be explicitly virtual or similar)

So, in particular, we're wondering about more opinions on the following:

  • Is finer-grained control over the layout (as in per-state-variable instead of per-contract) desirable?
  • Should we require a state variable in a base contract - or the base contract itself - to explicitly state that the storage locations are to be defined elsewhere (by use of virtual or similar) to avoid incorrect assumptions on that contract's storage layout in isolation (not too big of a danger, since inheritance will potentially move state variables anyways), but more importantly to force to specify explicit locations in the derived contract?
  • Would you consider having to lay out the entirety of storage in a most-derived contract a feature in the sense that it makes the layout unquestionably explicit, or a burden due to its necessary verbosity compared to a global base storage position for the entire C3 linearization? Especially if done per state-variable, but also if just done per base-contract (probably then requiring to also do it for bases-of-bases if they have state variables)?

@frangio
Contributor

frangio commented Jun 10, 2024

Is finer-grained control over the layout (as in per-state-variable instead of per-contract) desirable?

Yes it is desirable to declare a variable and specify where it should be placed in storage. ERC-1967 is the simplest example where this would be used.

Ideally the compiler should be able to check for clashes though, and I don't know if this is possible in the general case. It is possible if you restrict to locations of the form of ERC-1967 (for single-slot values) or ERC-7201, and I think these are general enough for EIP-7702 purposes, but there are other conventions that people might want to use.

One point about per-state-variable control of layout is that it should not be applicable to private state variables, so it is not sufficient as a mechanism in itself.

to avoid incorrect assumptions on that contract's storage layout in isolation (not too big of a danger, since inheritance will potentially move state variables anyways)

I agree on the comment in parentheses. This incorrect assumption should not be a motivation for any changes.

Should we require a state variable in a base contract - or the base contract itself - to explicitly state that the storage locations are to be defined elsewhere (by use of virtual or similar) [...] to force to specify explicit locations in the derived contract

Should virtual storage variables be available? Sure, I can see how they might be useful. Should they be a prerequisite to be able to override the storage location of base contracts? Probably not. I think the use case for EIP-7702 is to take a contract off the shelf and deploy it at a storage offset other than 0, and I think this should not require changes to the base contract source code.

or a burden due to its necessary verbosity compared to a global base storage position for the entire C3 linearization?

I would consider it a burden, but mainly because it leaks implementation details of base contracts. Authors of reusable contracts should be able to ship changes to their code, including to storage variables, without breaking downstream code!

also if just done per base-contract (probably then requiring to also do it for bases-of-bases if they have state variables)

This also leaks implementation details (especially bases-of-bases), although it is something library authors may already be familiar with (via forced override in functions, see Thoughts on override(A, B) syntax).


  • Specifying an entire contract's storage in bulk could only be allowed if that contract doesn't itself inherit state variables - or if its C3-linearization occurs linearly in the C3-linearization of the most-derived contract - but then the rules are quickly getting complicated there.

Not sure if this is what you meant by "occurs linearly" but it may be intuitive to say that if a base contract and its own base contracts form an independent subtree in the inheritance DAG (i.e., there are no edges into this subtree other than to its root) it should be possible to remove it from storage linearization and put it elsewhere in storage.

You could refine this further by saying that you only consider contracts that have linear/non-relocated storage. I've written too much at this point but I can elaborate later if this sounds interesting.

@ekpyron
Member

ekpyron commented Jun 11, 2024

Yeah, I definitely understand the desire of not leaking implementation details - there is some friction, though, between having 1. fine-grained control over individual state variables, 2. safely ensuring that all state variables are at "safe" locations and 3. hiding implementation details. E.g. if two bases, which are supposed to be located at distinct storage areas, add a shared base with a state variable (as an implementation detail), I don't see a choice other than being explicit about the location of that shared base (resp. its state variables) similarly to the existing disliked override logic for virtual functions (or just disallowing the relocation in that case). Similarly, relocating individual state variables while keeping others private may be dangerous - there may be sane solutions to this (just as an example, relocation of specific storage variables could only be allowed if simultaneously being explicit about the storage location of the containing structure, i.e. contract or inheritance subtree).

However, we don't need to fully solve this for an initial version that allows for EIP-7702 safety (for which relocating the entire inheritance graph in bulk in (only) the most-derived contract should be enough initially), so for the time being this is mainly a concern for ensuring that our method/syntax ideally remains future-proof.

I think the use case for EIP-7702 is to take a contract off the shelf and deploy it at a storage offset other than 0, and I think this should not require changes to the base contract source code.

Here the main question is whether we should make this a hard requirement or whether the expectation is rather to ship explicitly adjusted EIP-7702-versions of contracts. I'd assume that for use in the context of 7702 contract code would need to be at least reinspected (e.g. against potential clashes due to storage use via inline assembly, resp. the 7201 pattern) in any case - however, if there is a strong desire to be able to use fully unmodified base contracts that are merely relocated via inheritance, we can take that into account.

if a base contract and its own base contracts form an independent subtree in the inheritance DAG

It's clear what you mean there and yes, under that condition relocating a full subtree is safe.

So for the time being, I'd summarize our state of discussion as follows:

  • We should allow relocating the entirety of storage of a most-derived contract (including inherited state variables) now (i.e. at the latest in time for Prague) for initial EIP-7702-safety.
  • Ideally, we should have a method that can be extended to also allow specifically relocating particular bases or inheritance subtrees in the future.
  • Ideally, the latter method is robust against changes in implementation details as much as safely possible.
  • Ideally, a further extension should allow for even finer-grained control over the locations of specific state variables.
  • Potentially, the method should allow for simply reusing existing contracts and merely relocating them e.g. via inheritance

@Zer0dot

Zer0dot commented Jun 11, 2024

Hey all, I'm not sure how wildly incorrect I might be (fairly new to this part of ethdev), but I guess it's worth floating this idea I had, although it breaks account storage persisting through different code delegations. Let me know if that's an issue.

Would it be possible to slightly modify what SLOAD/SSTORE does for accounts with set code (7702), such that instead of affecting slot X for the account, it affects slot X for a fictional account at address bytes20(uint160(keccak256(account . 7702_address_field)))?

This way, we maintain the benefit that contracts don't need to be rewritten with namespaced storage, and app devs don’t have to worry about it. All the while maintaining guarantees that whatever contract a user delegates control to won't have any storage collisions regardless of how other contracts the user may have delegated to managed storage (i.e. you don't have to be aware of previous contracts' potential messups or users accidentally corrupting their storage).
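The address derivation in that idea can be written out in Solidity purely to illustrate the arithmetic; it does not change what SLOAD/SSTORE do. The sketch assumes that "." means byte concatenation and that `7702_address_field` is the address the account delegates to. `FictionalAccount`, `derive` and `delegate` are hypothetical names.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library FictionalAccount {
    // Derives the fictional storage account for (account, delegate):
    // the low 160 bits of keccak256(account ++ delegate).
    function derive(address account, address delegate)
        internal
        pure
        returns (address)
    {
        bytes32 digest = keccak256(abi.encodePacked(account, delegate));
        return address(uint160(uint256(digest)));
    }
}
```

Under this assumption, delegating to a different contract yields a different fictional account, which is why storage would no longer persist across delegations.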

@nlordell

Would you consider having to lay out the entirety of storage in a most-derived contract a feature in the sense that it makes the layout unquestionably explicit, or a burden due to its necessary verbosity compared to a global base storage position for the entire C3 linearization?

I think there are pros and cons to both ways of implementing it:

  • global base storage position: means that existing contracts in commonly used libraries can also use this feature without having to change each slot in each contract to be virtual (which could cause a lot of code to change everywhere).
  • lay out the entirety of storage in a most-derived contract: This is the most expressive of the two options (as you can have arbitrary layouts and are not limited to allocating storage slots from a specific offset like in the other example). You could also argue that the number of contracts that want to make use of this feature, and the number of storage slots that want to make use of this feature is low enough that it will not cause a large cascade of code changes.

The global storage position feels "easier", but it also feels a bit more like a temporary solution to a very specific use case rather than an end goal, which makes the engineer in me feel icky.

@zerosnacks

Given that ERC-7201 is now well defined, in the standards track and has accessible resources on it (https://www.rareskills.io/post/erc-7201), developers expect that their tooling has support for it. It would be great if this could be added as part of the artifacts generated by the compiler.

Ref: foundry-rs/foundry#7662

@cameel
Member

cameel commented Jul 10, 2024

Thank you everyone for the input so far. Looks like we have more clarity on the possible use cases so it's time to start moving things forward on getting the syntax finalized. The biggest obstacle here is still choosing something that will be easy to extend in the right direction without making contract definitions misleading, unreadable or too repetitive. We discussed it at length and all the options so far have drawbacks. Since we need to decide something anyway, here's my proposal that I wanted to put up as a more concrete basis for further discussion:

https://notes.ethereum.org/@solidity/explicit-storage-layout-syntax

This is by no means final, but I think it reconciles our own concerns with many of the requirements posted here so far. Not all of them could be incorporated though and we're still open to alternative proposals or amendments.

The proposal presents 3 variants, with the first one being only just enough to cover EIP-7702 and the other two trying to accommodate other use cases. Note that in any case we'd only implement enough to allow specifying the location for the most derived contract, which would basically be this, respectively:

  1. contract C at <location> {}
  2. contract C layout at <location> {}
  3. contract C { layout at <location>; }

@frangio
Contributor

frangio commented Jul 10, 2024

Thanks for sharing the proposal! I like variant 3, specifically layout blocks, but layout at as a shorthand seems fine too.

What do you think is the subset that should be implemented initially?

One thing that I see listed as a potential requirement (great summary, by the way) but not addressed in the proposals is the issue of collisions. Ideally the compiler would be able to guarantee that a layout is collision-free. I don't think this can be done for arbitrary expressions, but it should be possible if restricted to known formulas, e.g. native Solidity formulas for arrays and mappings, or ERC-7201. Constants pose a problem because they could be hash outputs. Do others agree that this is an important requirement?

@sakulstra

I think collision detection would be quite valuable.

It's not clear to me why 1) would not support ERC-7201?
Shouldn't:

contract FinalContract
    at keccak256(abi.encode(uint256(keccak256("FinalContract")) - 1)) & ~bytes32(uint256(0xff))
    is BaseA, BaseB
{
    // ...
}

be exactly enough to conform to 7201? Could someone elaborate?

@frangio
Contributor

frangio commented Jul 11, 2024

All of the variants support ERC-7201 but also support other expressions and that's one way that collisions could come in. But I don't think we'd want to restrict to ERC-7201 only, for example there is ERC-1967, and there may be other schemes people want to use.

That said, this is something that was considered in ERC-7201 for the design of the annotation, whose format is @custom:storage-location <FORMULA_ID>:<NAMESPACE_ID>, where FORMULA_ID can be erc7201 or something else.
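For readers who have not seen the manual pattern, here is a minimal sketch of how a contract can follow ERC-7201 today, without any of the syntax proposed in this thread. The contract name Counter, the namespace id example.counter and the helper _counterStorage() are made up for illustration; only the formula and the annotation format come from ERC-7201.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Illustrative sketch only: names and namespace id are hypothetical.
contract Counter {
    /// @custom:storage-location erc7201:example.counter
    struct CounterStorage {
        uint256 count;
    }

    // ERC-7201 formula applied to the namespace id "example.counter".
    // Written as an expression here; libraries often hard-code the resulting literal.
    bytes32 private constant COUNTER_STORAGE_LOCATION =
        keccak256(abi.encode(uint256(keccak256("example.counter")) - 1)) & ~bytes32(uint256(0xff));

    // Returns a storage pointer rooted at the namespaced location.
    function _counterStorage() private pure returns (CounterStorage storage $) {
        // Copy to a local first: inline assembly can only reference
        // constants that are initialized with a plain literal.
        bytes32 location = COUNTER_STORAGE_LOCATION;
        assembly {
            $.slot := location
        }
    }

    function increment() external {
        _counterStorage().count += 1;
    }

    function count() external view returns (uint256) {
        return _counterStorage().count;
    }
}
```

Because the root of the struct is derived from a hash rather than from declaration order, the slot does not depend on what the contract inherits from, which is the property the collision discussion above is about.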

@cameel
Member

cameel commented Jul 11, 2024

What do you think is the subset that should be implemented initially?

Just specifying the location for the whole hierarchy.

If you mean beyond that, then as @ekpyron said above, the next step would be to allow relocating specific base contracts or subtrees. But how quickly we proceed with that strongly depends on how certain we feel that this design is robust and extensible enough. For now we need anyone interested to try to poke holes in it and bring up stuff that we may be overlooking :)

Ideally the compiler would be able to guarantee that a layout is collision-free.

TBH we assumed that people would actually be pushing for more expressivity here rather than for restricting it. This is exactly the kind of feedback we need here :)

This is something we could consider if there's clear consensus that this is what everyone wants, but for now it seems to me like it might be a bit too restrictive. I'd be worried about missing some common formulas from pre-ERC-7201 schemes being in active use.

So far we were not planning to enshrine any particular method of selecting locations, rather leaving that up to ERCs/conventions/libraries. The idea was to allow any expression that can be evaluated at compilation time (as long as the resulting locations don't overlap in an obvious way within a single contract hierarchy). Currently that set is pretty limited, but still includes constants. With time it would be extended to include enough things to cover ERC-7201 (keccak, conversions, some subset of ABI encoding), and eventually we'd get full compile-time evaluation support (but that might only ship on top of the new type system). With that, common formulas for locations could be considered for inclusion in the standard library.

It's not clear to me why 1) would not support ERC-7201?

Because it will relocate the whole inheritance hierarchy, including BaseA and BaseB, not just FinalContract. While ERC-7201 does not mention inheritance at all, it seems to me that part of the motivation behind it was to give the contract a storage location completely independent of the inheritance hierarchy. So if you do happen to inherit from a contract that follows ERC-7201, its storage should not impact yours in any way. Variant 1) does not let you do that. You could say it technically works if you do not use inheritance, but if you do use it, the storage of all contracts in the hierarchy is still coupled.
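To make the coupling concrete, here is a minimal sketch in standard Solidity that uses none of the proposed syntax. The state variables a, b, c and the slots() helper are made up for illustration. Under the default layout the hierarchy occupies one consecutive run of slots; as described above, variant 1 would move that whole run to a new base rather than give FinalContract an area of its own.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Illustrative sketch only: variable names are hypothetical.
contract BaseA {
    uint256 internal a;
}

contract BaseB {
    uint256 internal b;
}

contract FinalContract is BaseA, BaseB {
    uint256 internal c;

    // Reports the slot the compiler assigned to each variable.
    // With the default layout this returns (0, 1, 2): bases first,
    // the most derived contract last, all in a single run.
    function slots() external pure returns (uint256 sa, uint256 sb, uint256 sc) {
        assembly {
            sa := a.slot
            sb := b.slot
            sc := c.slot
        }
    }
}
```

Adding a state variable to BaseA shifts b and c by one slot, so the position of FinalContract's own data still depends on its bases. Moving the start of the run does not change that.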

@cameel
Member

cameel commented Jul 17, 2024

@frangio We discussed the idea of limiting the available expressions on the design call today, but so far we're not really convinced this is something that should be enforced by the compiler. We're open to being convinced otherwise, but for now we'd still rather leave that up to convention.

Guaranteeing safety against collisions may be hard to reconcile with allowing more than one scheme, and we really don't want to limit this feature to a single convention, even if it's ERC-7201. Also, this would have to work nicely not only with the current scheme used by the compiler but also with the future one, which is likely to be introduced in response to Verkle Trees. The current scheme will need to change in certain ways because, under Verkle, data that is not colocated (as is the case for dynamic types now) will become more expensive to access.


I also updated the proposal to address your comments there. I removed the override from the ERC-7201 example. In place of an annotation I added an erc7201() function, which could potentially end up in stdlib in the future (meant to be evaluated at compilation time). I also updated the detailed rules with what I already said here about allowed expressions and collision prevention.

@cameel
Member

cameel commented Sep 19, 2024

We decided to go with variant 2. Here's the spec for what exactly will be implemented: Initial syntax for explicit storage locations. As I said before, it's just the minimal version of it, though we did include a compile-time helper for the ERC-7201 formula to nudge people towards using the established conventions rather than arbitrary expressions.

This is the new syntax in a nutshell:

contract C layout at 0x1234 {}
contract FinalContract
    layout at erc7201("FinalContract")
    is BaseA, BaseB
{}

From the feedback we got here so far, it seems to me that it should be a generally acceptable base to extend later as needed. We're planning to start working on it after we're done with some other features which have kept us busy so far. If anyone has some reservations and can provide strong arguments for why this may be bad or insufficient, this is the moment to speak up.

@frangio
Contributor

frangio commented Jan 2, 2025

A potential issue is that contracts that manually implement custom storage locations will be unaffected by Solidity's native layout rerooting.

For example: https://github.com/OpenZeppelin/openzeppelin-contracts-upgradeable/blob/dfe0973e94e137277cae220ef54eb66df60cbf92/contracts/access/OwnableUpgradeable.sol#L27-L34

In a sense this should be expected because it's a very low-level operation. But it would be good if Solidity provided some way for this kind of logic to adapt to a potentially rerooted layout, for example by making the base of the layout available as a constant. Maybe something like type(Contract).layoutBase that could be used in constants.

@ekpyron moved this to ❄️ Q1 2025 in Solidity Roadmap on Jan 7, 2025
@cameel
Member

cameel commented Jan 13, 2025

@frangio Sounds like a reasonable feature. We can add something like this, though I can't promise it will make its way into the initial implementation (which is almost ready at this point).

There are also some things that need to be clarified about how it would work in corner cases related to possible future extensions:

  • If we allow relocating individual state variables and all of them get moved away from the base location for the contract, should it keep pointing at that empty spot? Or move with the first variable?
  • If C is B and B is A, does type(C).layoutBase point at the start of the storage area of only C or its whole inheritance hierarchy (i.e. A)?

@frangio
Contributor

frangio commented Jan 14, 2025

If we allow relocating individual state variables and all of them get moved away from the base location for the contract, should it keep pointing at that empty spot? Or move with the first variable?

I think it should keep pointing at the empty spot.

If C is B and B is A, does type(C).layoutBase point at the start of the storage area of only C or its whole inheritance hierarchy (i.e. A)?

My intuition is it makes more sense to point at the storage of only C. This would also be more compatible with separately relocating parts of the inheritance graph.

Projects
Status: ❄️ Q1 2025