Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intrinsics for the VM's storage opcodes. #2508

Merged
merged 24 commits into from
Aug 16, 2022

Conversation

vaivaswatha
Copy link
Contributor

@vaivaswatha vaivaswatha commented Aug 10, 2022

This PR provides the following intrinsics:

  1. __state_load_word (key: b256) -> u64;
  2. __state_store_word (key: b256, value: u64);
  3. __state_load_quad (key: b256, addr: u64);
  4. __state_store_quad (key: b256, addr: u64);

All of them compile to equivalent IR instructions. The last two (quad versions) required some changes to asm generation since they use int_to_ptr as inputs to the IR instructions for quad storage.

While adding the tests, a flaw I discovered was that: If the intrinsics are called from a script, passing an address that belongs to the script, we hit a runtime error (because the address is now referenced in the contract, not the calling script). This indicates that we'll need some static analysis around passing addresses - flagging such issues. Another issue would be if we take the address of a const, if at all that is allowed - I haven't checked. These are issues with the __addr_of intrinsic rather than this one, so I suppose I can file a separate Issue on that?

Closes #2514

@vaivaswatha vaivaswatha self-assigned this Aug 10, 2022
@mohammadfawaz
Copy link
Contributor

a flaw I discovered was that: If the intrinsics are called from a script

We should certainly error out at compile-time if storage intrinsics are used in a script!

@vaivaswatha
Copy link
Contributor Author

a flaw I discovered was that: If the intrinsics are called from a script

We should certainly error out at compile-time if storage intrinsics are used in a script!

I think that already happens. The problem is if you take a local variable's address in a script, then call a (wrapper) function in the contract that in-turn calls one of these intrinsics using that address, then we have a runtime error.

@vaivaswatha vaivaswatha marked this pull request as ready for review August 10, 2022 16:22
@vaivaswatha vaivaswatha requested a review from a team August 10, 2022 16:22
@mohammadfawaz
Copy link
Contributor

mohammadfawaz commented Aug 10, 2022

This is great! Thanks Vaivas :)

Three small requests:

  • Do you mind creating a short issue that explains the motivation behind this change and link the PR to it? It's probably good to mention that this is one task towards implementing the intrinsics RFC that you proposed.
  • Will you also describe the source of the runtime error in another issue so that we don't forget about it?
  • Finally, can we now use the intrinsic in the standard library to replace asm blocks? Is it a good idea to this in this PR?

@vaivaswatha
Copy link
Contributor Author

This is great! Thanks Vaivas :)

Three small requests:

* Do you mind creating a short issue that explains the motivation behind this change and link the PR to it? It's probably good to mention that this is one task towards implementing the intrinsics RFC that you proposed.

* Will you also describe the source of the runtime error in another issue so that we don't forget about it?

* Finally, can we now use the intrinsic in the standard library to replace `asm` blocks? Is it a good idea to this in this PR?

I meant to do the 3rd one in this PR and forgot all about it 🤦‍♂️ . I'll do all three of these and update back here. Thanks Mo.

@vaivaswatha
Copy link
Contributor Author

This is great! Thanks Vaivas :)

Three small requests:

  • Do you mind creating a short issue that explains the motivation behind this change and link the PR to it? It's probably good to mention that this is one task towards implementing the intrinsics RFC that you proposed.
  • Will you also describe the source of the runtime error in another issue so that we don't forget about it?
  • Finally, can we now use the intrinsic in the standard library to replace asm blocks? Is it a good idea to this in this PR?

New issues: #2515 and #2514

@sezna
Copy link
Contributor

sezna commented Aug 11, 2022

One thing I am curious about is the portability of these intrinsics. Given that the EVM's word size is different, the quad word intrinsic would have to load 256 * 4 == 1024 bits in order for the name to make sense. Is there a naming scheme that could give us more freedom to implement these optimally? Perhaps state_load_256 or something?

@mohammadfawaz
Copy link
Contributor

mohammadfawaz commented Aug 11, 2022

One thing I am curious about is the portability of these intrinsics. Given that the EVM's word size is different, the quad word intrinsic would have to load 256 * 4 == 1024 bits in order for the name to make sense. Is there a naming scheme that could give us more freedom to implement these optimally? Perhaps state_load_256 or something?

+1 for this.

@otrho
Copy link
Contributor

otrho commented Aug 12, 2022

Aren't we going to have different intrinsics for different targets? So maybe since these are Fuel specific they should be __fuelvm_state_load_word() and we can have __evm_state_load_word() eventually too. And, then I guess different instructions in the IR... 🤔

I was 🤔 about this the other day actually -- platform specific stuff in the IR should probably be generalised with wrappers. We're going to have to do this with ASM blocks which are currently FuelVM specific in the IR, and so any platform specific stuff will need the same. Storage counts here -- it's very, very similar between FuelVM and EVM but not the same.

@sezna
Copy link
Contributor

sezna commented Aug 12, 2022

Aren't we going to have different intrinsics for different targets?

Hmm, my mental model has been that we will use the same intrinsics and accomplish portability by implementing the codegen differently. Otherwise we aren't increasing portability with intrinsics relative to asm blocks, since you'd have to rewrite both of them. Platform specific stuff can live in asm perhaps?

How do portable intrinsics work in e.g. rust? I guess they're platform specific too, thinking about things like the AVX intrinsics....maybe there's a middle ground where some are disallowed for certain backends, like the ones related to the fuel tx frame -- those would need to be disallowed for EVM..

Just having a stream of consciousness here...

@otrho
Copy link
Contributor

otrho commented Aug 12, 2022

I was thinking yeah, a certain subset of intrinsics will just not compile, with an error, if your target isn't right for them. Some intrinsics simply won't exist for certain targets, especially if we wrap basic ASM ops (in the ongoing effort to remove ASM blocks from the libs in the name of type safety).

ASM blocks themselves are going to have to change drastically, though this isn't really the place for that discussion. But they're going to have to be more opaque than they are now, and maybe intrinsics should be the same.

We could actually have just one 'intrinsic' which is parameterised with an enum. So __size_of<T>() becomes __builtin<T>(SizeOf), or we have e.g., __builtin<T>(Eq, lhs: T, rhs: T) -> bool and __builtin is very much special cased and parameterised based on target. It would have to magically allow any number of type parameters and args and return type.

I'm not sure if this is less work and/or easier to implement though.

@sezna
Copy link
Contributor

sezna commented Aug 12, 2022

We also need to standardize how we define what intrinsics are allowable for what backends, and it should be thoroughly considered..

We could actually have just one 'intrinsic' which is parameterised with an enum. So __size_of() becomes __builtin(SizeOf), or we have e.g., __builtin(Eq, lhs: T, rhs: T) and __builtin is very much special cased and parameterised based on target. It would have to magically allow any number of type parameters and args.

What would be the benefit of restructuring in this way?

@otrho
Copy link
Contributor

otrho commented Aug 12, 2022

There is one intrinsic whose behaviour is entirely data driven based on the target.

You could add or modify targets without having to change the actual compiler core/type checker.

@vaivaswatha vaivaswatha requested a review from otrho as a code owner August 12, 2022 06:06
@vaivaswatha
Copy link
Contributor Author

I've now updated storage.sw to use the new intrinsics. To workaround type limitations on the arguments and return type of the intrinsics, I'm using asm to do type casts (this isn't a problem at runtime because of the is_reference_type guard). Ideally, this should be solved with type constraints and not using is_reference_type as detailed here.

So that leaves only the question of names for these intrinsics (the discussion going on above related to different targets). I'll update the names once that is finalized, but the rest of the code can be reviewed now I think.

Copy link
Contributor

@otrho otrho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments, below, all good otherwise.

Comment on lines +1606 to +1608
let val_reg = if matches!(
&self.context.values[val.0].value,
ValueDatum::Instruction(Instruction::IntToPtr(..))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than special case IntToPtr here would it work instead to update resolve_ptr() to handle both GetPointer and IntToPtr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than special case IntToPtr here would it work instead to update resolve_ptr() to handle both GetPointer and IntToPtr?

I considered that, but there's some computation that's done later which isn't needed for IntToPtr (but needed if resolved as a stack pointer in resolve_ptr. So it wouldn't serve any purpose since I'll need to handle them differently here anyway.

@@ -0,0 +1,486 @@
contract {
fn main<ea1a0f91>() -> u64, !3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per #2505 we really need to move these tests away from here and use FileCheck. This is a bit ridiculous. 😃

Comment on lines +51 to 54
let loaded_word = __state_load_word(key);
asm (l: loaded_word) {
l: T
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not turbofish this one here? Was that also part of the discussion around this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not turbofish this one here? Was that also part of the discussion around this issue?

You mean using std::mem::read<T>? That wouldn't work as it ends up actually doing a load, assuming the value to be an address (in case of non-reference types).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asm block is just casting the loaded word to T -- if it's a copy type then we should be able to use __state_load_word::<T>(key) to get it in the right type from the start, no? I didn't check, is __state_load_word() just returning a u64 right now? Maybe it should return T.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is this just the problem we can't solve until we have trait constraints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really only returns u64, so having its type to return T would be incorrect. But if it's returning only u64, then when used with is_reference_type<T> in a generic function, we end up with this problem we have. If we could restrict it with trait contraints, then we wouldn't have this call in a non u64 context at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, OK. I kinda feel like we should only have two intrinsics -- read and write, and whether they need to use SRW/SWW or SRWQ/SWWQ can be decided by the back end. Then the read functions could just return T.

But I'm sure that won't work either for some reason. OK, this'll do for now but I can feel the technical debt growing weekly. 😄

Comment on lines +67 to 69
asm() {
cfei i32;
srwq v k;
};
Copy link
Contributor

@otrho otrho Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next is an __alloca() intrinsic to wrap CFEI since it's a bit weird to grab the stack pointer with a call but then allocate some space with an ASM block. I guess it could go in a library function for now...

@otrho otrho requested a review from a team August 12, 2022 06:57
pub fn alloca<T>() -> u64 {
let current_pointer = stack_ptr();
asm() {
cfei i32;
Copy link
Contributor

@otrho otrho Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocating 32 bytes, where it needs to allocate __size_of<T>().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocating 32 bytes, where it needs to allocate '__size_of()`.

Oops, yes. Let me fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping (and reverting related commits) this as it needs to be an intrinsic. The argument to cfei must be an immediate :-(

otrho
otrho previously approved these changes Aug 12, 2022
Copy link
Contributor

@otrho otrho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Approving and then I'm out of here. Have a good weekend!

@otrho otrho requested a review from a team August 12, 2022 08:04
@vaivaswatha
Copy link
Contributor Author

Alright. Approving and then I'm out of here. Have a good weekend!

Thank you and you too !

@otrho otrho requested a review from a team August 16, 2022 06:59
@vaivaswatha vaivaswatha enabled auto-merge (squash) August 16, 2022 13:34
@vaivaswatha vaivaswatha merged commit a5d5a26 into master Aug 16, 2022
@vaivaswatha vaivaswatha deleted the vaivaswatha/storage_intrinsics branch August 16, 2022 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement intrinsics for storage operations
4 participants